I have a PHP script that checks the Hamming distance between two photos taken from a security camera.
The MySQL table has 2.4M rows and consists of a key and four INT(10) columns. The INT(10) columns were indexed individually and together with the key, but I have no substantial evidence that any combination was faster than the others. I can try again if you suggest it.
The Hamming weights are calculated by converting the image to 8x16 pixels (128 bits), and each quarter of the bits is stored in its own column: pHash0, pHash1, and so on.
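To make the storage scheme concrete, here is a minimal Python sketch (not the PHP script itself) that splits a 128-bit hash into four 32-bit values and computes the Hamming distance column by column, the way the queries below do with Bit_Count. The function names and the choice of pHash0 as the most significant quarter are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF

def split_hash(h128):
    """Split a 128-bit integer into four 32-bit parts (pHash0..pHash3).

    Assumption: pHash0 holds the most significant 32 bits.
    """
    return [(h128 >> shift) & MASK32 for shift in (96, 64, 32, 0)]

def hamming(parts_a, parts_b):
    """Sum the popcount of the XOR of each column pair,
    mirroring Bit_Count(pHashN ^ ?) summed over N = 0..3."""
    return sum(bin(a ^ b).count("1") for a, b in zip(parts_a, parts_b))

a = split_hash((1 << 127) | 0b1011)
b = split_hash(0b0001)
print(hamming(a, b))  # prints 3
```

Because the distance is a plain sum over the four columns, a partial sum that already reaches the threshold rules a row out without looking at the remaining columns.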
I wrote this in two ways. The first uses nested derived tables. In theory, each level should have less data to check than the one before it. The query is a prepared statement, and the ? fields are the pHash[0-3] values of the file I am checking.
Select
    `Key`,
    Bit_Count(T3.pHash3 ^ ?) + T3.BC2 As BC3
From
    (Select
        *,
        Bit_Count(T2.pHash2 ^ ?) + T2.BC1 As BC2
    From
        (Select
            *,
            Bit_Count(T1.pHash1 ^ ?) + T1.BC0 As BC1
        From
            (Select
                `Key`,
                pHash0,
                pHash1,
                pHash2,
                pHash3,
                Bit_Count(pHash0 ^ ?) As BC0
            From
                files
            Where
                Not pHash0 Is Null And
                Bit_Count(pHash0 ^ ?) < 4) As T1
        Where
            Bit_Count(T1.pHash1 ^ ?) + T1.BC0 < 4) As T2
    Where
        Bit_Count(T2.pHash2 ^ ?) + T2.BC1 < 4) As T3
Where
    Bit_Count(T3.pHash3 ^ ?) + T3.BC2 < 4
The second approach is a bit more direct. It simply does all the work at once.
Select
    `Key`
From
    files
Where
    Not pHash0 Is Null And
    Bit_Count(pHash0 ^ ?) + Bit_Count(pHash1 ^ ?) +
    Bit_Count(pHash2 ^ ?) + Bit_Count(pHash3 ^ ?) < 4
The first query is faster on large record sets and the second on smaller ones, but neither gets below about 1 1/3 seconds per comparison across 2.4M records.
Do you see a way to tweak this process to make it faster? Any suggestions can be tested quickly, for example changing data types or indexes.
Setup: Win7x64, MySQL/5.6.6 InnoDB, nginx/1.99, php-cgi/7.0.0 zend.
EDIT:
, 4 32- 1 (16), 4 , 4 128- , php . , .
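This gave a speedup of roughly 500%. The premise: the bit count of pHash "A" always lies within the bit count of pHash "B" +/- the Hamming distance between them.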
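The bitCount Between ? And ? condition in the query further down is safe as a pre-filter because flipping d bits can change the number of set bits by at most d, so two hashes within Hamming distance d have bit counts that differ by no more than d. A small Python sketch of that check follows; candidate_range is a hypothetical helper name, and the 128-bit width is taken from the 8x16 hash described above.

```python
import random

def popcount(x):
    """Number of set bits, the equivalent of MySQL's Bit_Count."""
    return bin(x).count("1")

def candidate_range(query_hash, max_distance, width=128):
    """Bounds for the stored bit count of any row that can still be
    within max_distance of query_hash (hypothetical helper)."""
    bc = popcount(query_hash)
    return max(0, bc - max_distance), min(width, bc + max_distance)

# Flip exactly 3 bits of random hashes and confirm that the bit count
# of the result always stays inside the range for distance 3.
rng = random.Random(0)
for _ in range(1000):
    a = rng.getrandbits(128)
    b = a
    for bit in rng.sample(range(128), 3):
        b ^= 1 << bit
    lo, hi = candidate_range(a, 3)
    assert popcount(a ^ b) == 3
    assert lo <= popcount(b) <= hi
```

The filter only narrows the candidates. Rows inside the range can still be far apart, which is why the full Bit_Count comparison stays in the Where clause.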
@duskwuff . @duskwuff!
:
Select
    files.`Key`,
    Bit_Count(? ^ pHash0) + Bit_Count(? ^ pHash1) +
    Bit_Count(? ^ pHash2) + Bit_Count(? ^ pHash3) As BC
From
    files Force Index (bitcount)
Where
    bitCount Between ? And ?
    And Bit_Count(? ^ pHash0) + Bit_Count(? ^ pHash1) +
        Bit_Count(? ^ pHash2) + Bit_Count(? ^ pHash3) <= ?
Order By
    Bit_Count(? ^ pHash0) + Bit_Count(? ^ pHash1) +
    Bit_Count(? ^ pHash2) + Bit_Count(? ^ pHash3)
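Each group of 4 "?" is the four 32-bit hash values of the file being checked, the 2 "?" after Between are that file's bit count +/- the allowed Hamming distance, and the remaining "?" is that Hamming distance. The ORDER BY is only needed to bring the closest matches to the top, so that LIMIT 1 would return the best match. The bitcount index is a B-TREE.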
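For reference, the query above has 15 placeholders, and they have to be bound in the order in which they appear. The Python sketch below shows one way to assemble that list. build_params is a hypothetical helper, and it assumes that the two Between bounds are the query hash's bit count minus and plus the maximum distance.

```python
def build_params(parts, max_distance):
    """Return the 15 values for the prepared statement, in placeholder order.

    parts: the four 32-bit values (pHash0..pHash3) of the file being checked.
    Assumption: the Between bounds are bit count -/+ max_distance.
    """
    parts = list(parts)
    bit_count = sum(bin(p).count("1") for p in parts)
    return (
        parts                                     # Select list: ... As BC
        + [max(0, bit_count - max_distance),      # bitCount Between ?
           bit_count + max_distance]              #          And ?
        + parts + [max_distance]                  # And ... <= ?
        + parts                                   # Order By ...
    )

params = build_params([0b1, 0b11, 0, 0xFFFFFFFF], 3)
print(len(params))  # prints 15
```

Repeating the four hash values three times is a consequence of positional placeholders. The order of the list is the only thing that ties each value to its place in the statement.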
2,4 , 3 4 , 70 000 . 64 ( ), 3 20% (490 000 ), 0 2,8% (70 000, ).