Any way to reduce text size?

Description: I have a huge MySQL database table. The total size is about 10 terabytes, and it contains only text.

Sample text from this database table:

In other cases, some countries have gradually learned to produce the same products and services that previously only the United States and some other countries could produce. U.S. real income growth slowed.

There are about 50 billion different texts.

What have I tried?

I tried zipping them. It actually worked and reduced the overall size. However, I need to search the data, and I cannot search anything while it is inside a zip file.

I tried PHP's base64 encoding. It turned my text data into this:

SW4gb3RoZXIgY2FzZXMsIHNvbWUgY291bnRyaWVzIGhhdmUgZ3JhZHVhbGx5IGxlYXJuZWQgdG8gcHJvZHVjZSB0aGUgc2FtZSBwcm9kdWN0cyBhbmQgc2VydmljZXMgdGhhdCBwcmV2aW91c2x5IG9ubHkgdGhlIFUuUy4gYW5kIGEgZmV3IG90aGVyIGNvdW50cmllcyBjb3VsZCBwcm9kdWNlLiBSZWFsIGluY29tZSBncm93dGggaW4gdGhlIFUuUy4gaGFzIHNsb3dlZC4=

What would I like to do?

I want to reduce the size of the texts before sending them to MySQL. I don't know how to approach this. I am thinking about encrypting and decrypting the data.

Here is what I want to do:

I want to encrypt the text data before saving it. Later I want to fetch the encrypted data from MySQL and decrypt it.

How can I reduce the text size? Base64 doesn't work for me. Is there another way?

4 answers

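First of all, base64 is not encryption. It is an encoding, and it makes the data larger, not smaller. To reduce the size you need compression, for example gzcompress or gzdeflate: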

<?php
// Sample text (the original English wording that the question's base64 string encodes)
$original = "In other cases, some countries have gradually learned to produce the same products and services that previously only the U.S. and a few other countries could produce. Real income growth in the U.S. has slowed.";

// Baseline: plain base64, which is already larger than the raw text
$base64 = base64_encode($original);

// Compressed variants, base64-encoded so they are measured on the same terms as the baseline
$compressed = base64_encode(gzcompress($original, 9)); // zlib format
$deflate    = base64_encode(gzdeflate($original, 9));  // raw DEFLATE, without the zlib header and checksum

$base64Length     = strlen($base64);
$compressedLength = strlen($compressed);
$deflateLength    = strlen($deflate);

echo "<pre>";
echo "Using GZ Compress   =  ", round(100 - ($compressedLength / $base64Length) * 100, 2), "% improvement", PHP_EOL;
echo "Using Deflate       =  ", round(100 - ($deflateLength / $base64Length) * 100, 2), "% improvement", PHP_EOL;
echo "</pre>";

Using GZ Compress   =  32.86% improvement
Using Deflate       =  35.71% improvement
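Compression is lossless, so the exact original text comes back after decompression. Below is a minimal round-trip sketch. It assumes the raw binary output is stored in a BLOB-type column, which is an assumption and not part of the answer. In that case the base64 step can be skipped and the savings are larger. The helpers `packText` and `unpackText` are hypothetical names:

```php
<?php
// Hypothetical helpers: compress before storing, decompress after reading.
// Assumes the binary result goes into a BLOB column, so no base64 step is used.

function packText(string $text): string
{
    // Level 9 = best compression (slowest); gzcompress() returns false on failure
    $packed = gzcompress($text, 9);
    if ($packed === false) {
        throw new RuntimeException('gzcompress failed');
    }
    return $packed;
}

function unpackText(string $packed): string
{
    // gzuncompress() returns false if the data is not valid zlib output
    $text = gzuncompress($packed);
    if ($text === false) {
        throw new RuntimeException('gzuncompress failed: corrupt data?');
    }
    return $text;
}

$original = "In other cases, some countries have gradually learned to produce the same products and services that previously only the U.S. and a few other countries could produce. Real income growth in the U.S. has slowed.";

$packed   = packText($original);
$restored = unpackText($packed);

echo strlen($original), " bytes -> ", strlen($packed), " bytes", PHP_EOL;
var_dump($restored === $original); // bool(true): the round trip is lossless
```

Note that the stored value is opaque binary. As the question points out, MySQL cannot search inside it, so compression alone reduces storage but not search cost.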
+11

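Base64 is not compression: it actually makes the data larger. Compress the text with gzip instead (http://php.net/manual/en/function.gzcompress.php) and store the compressed result in MySQL.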
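Base64 cannot help because it maps every 3 input bytes to 4 output characters, so n bytes always become 4 * ceil(n / 3) bytes. The sketch below checks that rule with PHP's built-in `base64_encode`. The helper `base64Size` is a hypothetical name:

```php
<?php
// Base64 turns every 3 input bytes into 4 output characters (padding the last
// group with '='), so n bytes always become 4 * ceil(n / 3) bytes.
function base64Size(int $n): int
{
    return 4 * intdiv($n + 2, 3); // integer form of 4 * ceil(n / 3)
}

$text    = "Real income growth in the U.S. has slowed.";
$encoded = base64_encode($text);

echo strlen($text), " bytes of text -> ", strlen($encoded), " bytes of base64", PHP_EOL;
var_dump(strlen($encoded) === base64Size(strlen($text))); // bool(true)
var_dump(base64_decode($encoded) === $text);              // bool(true): anyone can decode it, so it is not encryption either
```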

+3

That is a lot of data: 10 TB of text in MySQL!


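Write a script that goes through all 50 billion texts and splits each one into words, keeping only the keywords. For example, take the text "I am piece of large text.":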

[1: piece][2: large][3: text]

And the next text, "I'm the next large part!":

[4: next][2: large][5: part]

Words such as I, am, of, I'm and the, plus punctuation such as . and !, are dropped, because the search is keyword-based.
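Here is a minimal sketch of the keyword-extraction step. It assumes a simple tokenizer that lowercases the text and splits on anything that is not a letter, digit or apostrophe. It also assumes the tiny stop-word list from the example. `extractKeywords` and `STOP_WORDS` are hypothetical names, and a real stop-word list would be much longer:

```php
<?php
// Hypothetical stop-word list taken from the example text; a real one is much longer
const STOP_WORDS = ['i', 'am', 'of', "i'm", 'the'];

function extractKeywords(string $text): array
{
    // Lowercase (ASCII) and split on runs of characters that are not letters,
    // digits or apostrophes; this also strips punctuation such as "." and "!"
    $words = preg_split("/[^\p{L}\p{N}']+/u", strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    // Drop stop words, then remove duplicates while keeping the original order
    $keywords = array_filter($words, fn(string $w): bool => !in_array($w, STOP_WORDS, true));

    return array_values(array_unique($keywords));
}

print_r(extractKeywords("I am piece of large text."));  // piece, large, text
print_r(extractKeywords("I'm the next large part!"));   // next, large, part
```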

. md5 . id -.

Store them in two tables, texts and keywords, linked many-to-many through a third table, like this:

[text_id][text]
1 -> I am piece of large text.
2 -> I'm the next large part!

[keyword_id][keyword]
1 -> piece
2 -> large
3 -> text
4 -> next
5 -> part

[keyword_id][text_id]
1 -> 1
2 -> 1
3 -> 1
4 -> 2
2 -> 2
5 -> 2
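The three tables are filled as follows: each new keyword gets the next id, and every (keyword, text) pair becomes one row in the link table. The sketch below uses plain PHP arrays in place of the MySQL tables. This is an assumption for illustration only, since a real load would INSERT into actual tables, and the variable names are hypothetical:

```php
<?php
// Plain arrays stand in for the MySQL tables (illustration only).
$texts = [1 => "I am piece of large text.", 2 => "I'm the next large part!"];

// Keywords per text, as produced by the extraction step (hard-coded here)
$textKeywords = [1 => ['piece', 'large', 'text'], 2 => ['next', 'large', 'part']];

$keywordIds = []; // keyword => keyword_id (each keyword is stored only once)
$links      = []; // rows of the link table: [keyword_id, text_id]

foreach ($textKeywords as $textId => $words) {
    foreach ($words as $word) {
        if (!isset($keywordIds[$word])) {
            $keywordIds[$word] = count($keywordIds) + 1; // mimics an auto-increment id
        }
        $links[] = [$keywordIds[$word], $textId];
    }
}

print_r($keywordIds); // piece => 1, large => 2, text => 3, next => 4, part => 5
print_r($links);
```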

Now you can search (even in MySQL!), for example for large text!
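Searching for several keywords then means finding the text_ids linked to every one of them. Below is a sketch over the same in-memory arrays. The names are hypothetical, and in MySQL this would typically be a join on the link table with GROUP BY text_id and HAVING COUNT(*) equal to the number of keywords:

```php
<?php
// Data from the tables above (in-memory stand-ins for the MySQL tables)
$keywordIds = ['piece' => 1, 'large' => 2, 'text' => 3, 'next' => 4, 'part' => 5];
$links      = [[1, 1], [2, 1], [3, 1], [4, 2], [2, 2], [5, 2]]; // [keyword_id, text_id]

// Hypothetical helper: text_ids that contain ALL of the query keywords
function searchAll(array $queryWords, array $keywordIds, array $links): array
{
    $result = null;
    foreach ($queryWords as $word) {
        if (!isset($keywordIds[$word])) {
            return []; // an unknown keyword means no text can contain all of them
        }
        $kid = $keywordIds[$word];

        // Collect the text_ids linked to this keyword
        $ids = [];
        foreach ($links as [$k, $t]) {
            if ($k === $kid) {
                $ids[] = $t;
            }
        }

        // Keep only texts that matched every keyword so far
        $result = $result === null ? $ids : array_values(array_intersect($result, $ids));
    }
    return $result ?? [];
}

print_r(searchAll(['large', 'text'], $keywordIds, $links)); // text_id 1 only
print_r(searchAll(['large'], $keywordIds, $links));         // text_ids 1 and 2
```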

, 50,000 60,000 600,000 - 700,000, . , , 50 000 , 10 TB .


+2

Although both answers address the question and offer ways to compress the text, I don't think compression will solve your problem. Searching large amounts of data has never been the goal of relational databases such as MySQL.

You already got a very good tip about Apache Lucene, and there are other options such as Sphinxsearch. Here is a thread comparing them:

Comparison of the full-text search engine - Lucene, Sphinx, Postgresql, MySQL?

+1
