C# - Compress small strings, with what to create external dictionary?
I want to compress a lot of small strings (C# strings of about 75-100 characters). At the time the dictionary is created I already know all of the short strings (nearly a trillion). There will be no additional short strings in the future. I need to extract exactly one string without decompressing any other strings.
Now I am looking for a library or the best way to do the following:
- Create a dictionary using all the strings I have
- Use this dictionary to compress each string
- A way to decompress a single string using the dictionary from step 1.
I found a good related question, but it is not C# specific. Maybe there is something for C# that I do not know about, or a fancy library, or someone has already done this. That is the reason I ask this question.
Edit:
By dictionary I mean things like this: http://en.wikipedia.org/wiki/dictionary_coder. It helps to make the strings shorter. The strings are short text messages in various languages and URLs (30%/70%). There is no need for the compressed strings to be human readable. They will be stored in binary files.
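If there are a trillion strings and no more, then each can be represented in 40 bits (5 bytes), since 2^40 is about 1.1 trillion. All you need then is a way to use that 5-byte index into the trillion strings.
How do you know all trillion strings? If the compressor and decompressor both have access to all trillion strings, or if there is a way to order and recreate the strings, then all you need is the index.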
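A minimal sketch of the index idea, written in Python for brevity. The names `TOTAL_STRINGS`, `pack_index`, and `unpack_index` are hypothetical, and the sketch assumes both sides already agree on an ordering of the strings. It only shows that a position in that ordering fits in a fixed 5-byte field.

```python
# Illustrative only: encode a string's position in an agreed ordering as 5 bytes.
TOTAL_STRINGS = 10**12   # assumption: "nearly a trillion", per the question
INDEX_BYTES = 5          # 40 bits

# 2**40 = 1,099,511,627,776, which is enough to number a trillion strings.
assert 2 ** (8 * INDEX_BYTES) >= TOTAL_STRINGS


def pack_index(index: int) -> bytes:
    """Encode a position as a fixed-width, big-endian 5-byte value."""
    if not 0 <= index < TOTAL_STRINGS:
        raise ValueError("index out of range")
    return index.to_bytes(INDEX_BYTES, "big")


def unpack_index(blob: bytes) -> int:
    """Recover the position from its 5-byte encoding."""
    if len(blob) != INDEX_BYTES:
        raise ValueError("expected exactly 5 bytes")
    return int.from_bytes(blob, "big")
```

The same packing is straightforward in C#, for example by writing the low five bytes of a `ulong`.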
If you can't find a way to index the strings, you can take a subset of the strings and use them as a dictionary for the compressor. Take a representative sample (you will need to figure out what might make some of the strings more common than other strings, or more representative of other strings) and concatenate them into a 32K dictionary. That would be about 400 of your trillion strings. Then use zlib's deflateSetDictionary on the compress end and inflateSetDictionary on the decompress end, both using the exact same 32K dictionary. That should provide compression on your short strings.
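The preset-dictionary approach can be sketched as follows. Python is used only because its standard `zlib` module exposes the dictionary through the `zdict` parameter, which corresponds to deflateSetDictionary and inflateSetDictionary. A C# version would need a zlib binding that exposes the same calls. The helper names `build_dictionary`, `compress_one`, and `decompress_one` are hypothetical, and choosing a representative sample is left to the caller.

```python
import zlib

MAX_DICT_SIZE = 32 * 1024  # the deflate window size, hence the 32K limit


def build_dictionary(samples, max_size=MAX_DICT_SIZE):
    """Concatenate sample strings into a preset dictionary of at most max_size bytes."""
    blob = b"".join(s.encode("utf-8") for s in samples)
    # Keep the tail if the sample is too large for the window.
    return blob[-max_size:]


def compress_one(text, zdict):
    """Compress a single string on its own, primed with the shared dictionary."""
    compressor = zlib.compressobj(level=9, zdict=zdict)
    return compressor.compress(text.encode("utf-8")) + compressor.flush()


def decompress_one(blob, zdict):
    """Decompress a single string; needs the exact same dictionary bytes."""
    decompressor = zlib.decompressobj(zdict=zdict)
    return (decompressor.decompress(blob) + decompressor.flush()).decode("utf-8")
```

Each string is compressed independently, so any one of them can be extracted without touching the others. The dictionary itself must be stored once and kept byte-for-byte identical on both ends.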