Hamming distance optimization for MySQL or PostgreSQL?

I am trying to improve the search for similar images measured in a MySQL database. Right now I am comparing the pHash value, calculating the distance for hamming as follows:

SELECT * FROM images WHERE BIT_COUNT(hash ^ 2028359052535108275) <= 4

Results for selection (MyISAM engine)

  • 20,000 lines request time <20ms
  • 100,000 lines request time ~ 60 ms # it was just fine until it reached 150,000 lines
  • 300,000 lines request time ~ 150 ms

Thus, the query time depends on the number of rows in the table.


I am also trying to find the solutions found in stackoverflow. Hamming distance in binary strings in SQL

SELECT * FROM images WHERE 
BIT_COUNT(h1 ^ 11110011) + 
BIT_COUNT(h2 ^ 10110100) + 
BIT_COUNT(h3 ^ 11001001) + 
BIT_COUNT(h4 ^ 11010001) + 
BIT_COUNT(h5 ^ 00100011) + 
BIT_COUNT(h6 ^ 00010100) + 
BIT_COUNT(h7 ^ 00011111) + 
BIT_COUNT(h8 ^ 00001111) <= 4

lines 300,000; request time ~ 240 ms


PostgreSQL. MySQL PyGreSQL . 300000; ~ 18


- ? , .

() . MySQL , , Ruby . MsSQL qaru.site/questions/1145978/... ( ). , - , MySQL PostgreSQL.

, . hamming stackoverflow.com

!

+5
2

, , O (-), - n, , . , , :

  • O (1) -
  • O (log (n)) -
  • O (n) - ( )
  • O (n ^ 2) -
  • O (n ^ 3) - etc
  • O (2 ^ n) -
  • O (n!) -

2 n (80 +).

, n, n ^ 2 65 * n ^ 2 + 787 * n + 4656566 O (n ^ 2)

, , , (, O (n ^ 2) , O (n) ).

BIT_COUNT(hash ^ 2028359052535108275) <= 4. O (n).

- , b- - O (log (n)).

, , . 2 :

  • SQL-, , MySQL. BIT_COUNT(hash ^ 2028359052535108275) . , .
  • BIT_COUNT.
+3

. - , . , BIT_COUNT pHash, . 2,25 2,6 .

InnoDB, .

- , .

SELECT *, BIT_COUNT(pHash3 ^ 42597524) + BC2 AS BC3 
FROM ( 
    SELECT *, BIT_COUNT(pHash2 ^ 258741369) + BC1 AS BC2 
    FROM ( 
        SELECT *, BIT_COUNT(pHash1 ^ 5678910) + BC0 AS BC1 
        FROM ( 
            SELECT `Key`, pHash0, pHash1, pHash2, pHash3, BIT_COUNT(pHash0 ^ 1234567) as BC0 
            FROM files 
            WHERE  BIT_COUNT(pHash0 ^ 1234567) <= 3 
        ) AS BCQ0 
        WHERE BIT_COUNT(pHash1 ^ 5678910) + BC0 <= 3 
    ) AS BCQ1 
    WHERE BIT_COUNT(pHash2 ^ 258741369) + BC1 <= 3 
    ) AS BCQ2 
WHERE BIT_COUNT(pHash3 ^ 42597524) + BC2 <= 3

, . 3 .

SELECT `Key`, pHash0, pHash1, pHash2, pHash3 
FROM Files 
WHERE BIT_COUNT(pHash0 ^ 1234567) + BIT_COUNT(pHash1 ^ 5678910) + BIT_COUNT(pHash2 ^ 258741369) + BIT_COUNT(pHash3 ^ 42597524) <=3

, , .

+2

All Articles