Compressing small strings: what should I use to create an external dictionary?

I want to compress a large number of small strings (about 75-100 characters each) in C#. When the dictionary is created, I already know all of the short strings (almost a trillion). No further short strings will be added in the future. I need to extract exactly one string without decompressing any of the others.

I am looking for a library, or a better approach, that lets me do the following:

  • Create a dictionary from all of the strings I have
  • Use this dictionary to compress each string
  • Decompress any single string using the dictionary from step 1
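Step 1 (building the dictionary) can be sketched as a frequency count over candidate substrings. This is a minimal sketch, and `build_codebook` and its parameters are hypothetical. It picks the substrings that would save the most bytes if each occurrence were replaced by a one-byte code. With almost a trillion strings you would realistically run it on a random sample rather than the whole set; that is an assumption, not something the question states. The logic ports directly to C#.

```python
from collections import Counter

def build_codebook(strings, max_entries=254, min_len=2, max_len=8):
    """Hypothetical helper: choose substrings worth a one-byte code.

    Every substring of min_len..max_len bytes (of the UTF-8 encoding) is
    counted, then ranked by (len - 1) * count, i.e. the bytes saved if each
    occurrence were replaced by a single code byte. Overlapping occurrences
    are all counted, so the estimate is optimistic but fine for ranking.
    """
    counts = Counter()
    for s in strings:
        data = s.encode("utf-8")
        for n in range(min_len, max_len + 1):
            for i in range(len(data) - n + 1):
                counts[data[i:i + n]] += 1
    ranked = sorted(counts.items(),
                    key=lambda kv: (len(kv[0]) - 1) * kv[1],
                    reverse=True)
    return [sub for sub, _ in ranked[:max_entries]]

corpus = ["http://example.com/a", "http://example.com/b", "http://example.org/c"]
book = build_codebook(corpus, max_entries=3)
print(book)
```

Steps 2 and 3 then replace matched substrings with their codes and reverse that mapping. Because every string is encoded independently against the same fixed codebook, any one string can be decoded on its own.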

I found a good related question, but it does not apply to C#. Maybe there is something for C# that I don't know about, a great library, or someone has already done this. That is why I am asking.

EDIT:

By a dictionary I mean this kind of thing: http://en.wikipedia.org/wiki/Dictionary_coder However, anything that reduces the size of the strings helps. The strings are short text messages in various languages and URLs (30% / 70%). The compressed strings do not need to be human-readable; they will be stored in binary files.
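Extracting one string without touching the others mostly depends on how the binary file is laid out. One common layout, shown here as a minimal sketch, stores the compressed blobs back to back behind a table of offsets, so reading record `i` is one seek into the table and one seek into the data. The file format and the names `write_records` and `read_record` are hypothetical, and the blobs can come from any per-string compressor.

```python
import os
import struct
import tempfile

def write_records(path, blobs):
    """Hypothetical layout: [count: u64][(count + 1) offsets: u64][blob bytes...].

    Offsets are relative to the start of the blob area; record i spans
    offsets[i] .. offsets[i + 1].
    """
    offsets = [0]
    for b in blobs:
        offsets.append(offsets[-1] + len(b))
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(blobs)))
        f.write(struct.pack(f"<{len(offsets)}Q", *offsets))
        for b in blobs:
            f.write(b)

def read_record(path, i):
    """Read record i by seeking; no other record is read or decompressed."""
    with open(path, "rb") as f:
        (count,) = struct.unpack("<Q", f.read(8))
        if not 0 <= i < count:
            raise IndexError(i)
        f.seek(8 + 8 * i)                      # offsets[i] and offsets[i + 1]
        start, end = struct.unpack("<2Q", f.read(16))
        f.seek(8 + 8 * (count + 1) + start)    # blob area begins after the table
        return f.read(end - start)

path = os.path.join(tempfile.mkdtemp(), "strings.bin")
write_records(path, [b"\x01\x02", b"", b"hello"])
print(read_record(path, 2))
```

At a trillion records, an 8-byte offset per string is itself a large cost. In practice you might group strings into blocks and store offsets per block instead.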

2 answers

If there are a trillion strings and no more, then each one can be represented in 40 bits (5 bytes), since 2^40 is about 1.1 trillion. All you need is a way to use those 5 bytes as an index into the trillion strings.
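The 40-bit arithmetic can be checked directly. This minimal sketch (the helper names are hypothetical) packs an index below 2^40 into exactly 5 bytes and back.

```python
def pack40(index: int) -> bytes:
    """Store an index in [0, 2**40) in exactly 5 bytes (big-endian)."""
    if not 0 <= index < (1 << 40):
        raise ValueError("index does not fit in 40 bits")
    return index.to_bytes(5, "big")

def unpack40(data: bytes) -> int:
    """Inverse of pack40."""
    if len(data) != 5:
        raise ValueError("expected exactly 5 bytes")
    return int.from_bytes(data, "big")

# 2**40 = 1_099_511_627_776, so every index below one trillion fits in 5 bytes.
assert (1 << 40) > 10**12

last = 10**12 - 1          # 0-based index of the trillionth string
print(pack40(last).hex())  # 5 bytes
```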

How do you know all trillion strings in advance? If the compressor and the decompressor both have access to all trillion strings, or if there is a way to put the strings in a fixed order and regenerate them, then you only need the index.
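Here is a minimal sketch of the "you only need an index" idea. It assumes both sides hold, or can regenerate, the same sorted table. The string's position in that table is its entire compressed form, and decoding is a plain lookup. The table contents and function names are hypothetical.

```python
from bisect import bisect_left

def encode(table, s):
    """Return the position of s in the shared sorted table.

    For a trillion strings this position is below 2**40, so it fits in 5 bytes.
    """
    i = bisect_left(table, s)
    if i == len(table) or table[i] != s:
        raise KeyError(s)
    return i

def decode(table, i):
    """Recover the string from its position in the table."""
    return table[i]

# Hypothetical shared table: both sides must have exactly the same strings
# in exactly the same (sorted) order.
table = sorted(["https://example.org/", "hello", "привет"])
i = encode(table, "привет")
print(i, decode(table, i))
```

This only pays off if the table does not have to be shipped alongside the indices, which is the condition the answer raises.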

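Otherwise, you can build a dictionary by concatenating typical strings, up to 32K in total. That is about 400 of your strings. zlib's deflateSetDictionary and inflateSetDictionary let you use such a preset dictionary; deflate can use at most 32K of it.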
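Python's standard `zlib` module exposes deflateSetDictionary and inflateSetDictionary through the `zdict` argument, so the approach can be sketched in a few lines. The sample data and helper names are hypothetical. `wbits=-15` selects raw deflate, which drops the zlib header and checksum to save bytes per string. A C# program would likely need a zlib binding that exposes the same two calls.

```python
import zlib

def build_zdict(sample, limit=32 * 1024):
    """Concatenate typical strings into a preset dictionary of at most 32K.

    Deflate can only refer back 32K, so anything beyond that is wasted. The
    zlib manual advises putting the most common strings towards the end, so
    this assumes `sample` is ordered from least to most common.
    """
    data = b"".join(s.encode("utf-8") for s in sample)
    return data[-limit:]

def compress(s, zdict):
    # wbits=-15: raw deflate, no zlib header or Adler-32 checksum
    c = zlib.compressobj(level=9, wbits=-15, zdict=zdict)
    return c.compress(s.encode("utf-8")) + c.flush()

def decompress(blob, zdict):
    # The decompressor must be given the identical dictionary.
    d = zlib.decompressobj(wbits=-15, zdict=zdict)
    return (d.decompress(blob) + d.flush()).decode("utf-8")

# Hypothetical sample: 400 URLs, roughly the number the answer estimates for 32K.
sample = [f"https://www.example.com/products/item?id={n}" for n in range(400)]
zd = build_zdict(sample)

url = "https://www.example.com/products/item?id=98765"
blob = compress(url, zd)
assert decompress(blob, zd) == url
print(len(url.encode()), "->", len(blob), "bytes")
```

Each string is compressed in its own stream, so any one of them can be decompressed with only the shared dictionary.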


You could take a look at Smaz...

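Smaz is a simple compression library suited to very short strings. General-purpose compression libraries build the state they need for compression dynamically, so that they can compress any kind of data. That is a good idea in general, but not for this specific problem: it does not work for compressing small strings.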

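Smaz is not good for general-purpose data, but it can compress text by 40-50% on average (it works better with English), and it can compress HTML and URLs somewhat as well. The important point is that Smaz can compress even strings of two or three bytes!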

, "the" .

Smaz is written in C, and Bart De Smet has ported it to C#.

