Search solution for similar images

Question

I have a really big problem with my image storage server.

It holds about 2 million product images and keeps growing, but many of them are nearly identical. For example, the same iPad photo is stored in many similar sizes: 120 * 120, 118 * 120, 131 * 125, and so on. This wastes a lot of disk space and makes for a poor user experience (similar images show up in the gallery).

The images are indexed in the database, so I can query them by certain conditions, for example by product or category. I need a way to mark the similar images in the database and delete them.

What I have tried: I found a library called pHash that can calculate the similarity of two images, so I could use it to compare the images pair by pair. But done that way, it would take a very long time to find the similar images, and I do not know how to make the process faster.

Any ideas?

+3

comparison image

opps Mar 10 '11 at 9:18

2 answers

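Comparing every image against every other image is O(n^2), where n is the number of images.

[Most of this answer's text did not survive extraction. The remaining fragments mention canopy clustering and a window of size w, with w smaller than n, inside which "similar" images are compared, giving a cost of O(w * n).]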

+3

Mahmoud Abdelkader Mar 10 '11 at 9:52

Bernd elkemann · Accepted Answer · 2011-03-10T09:29:12+0000

Use pHash to calculate the perceptual hash of each of your images once (rather than cross-comparing every combination),
then sort the hashes (keeping a link from each hash to its image),
then decide on a threshold for the difference between perceptual hashes below which you consider images equivalent,
then replace the links to equivalent images with a link to the single image you want to keep.
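
The steps in this answer could be sketched in Python roughly as follows. This is only an illustration under several assumptions: the hashes are taken to be 64-bit integers already computed by pHash and loaded into a dict, and the names `map_duplicates`, `threshold`, and `window` are hypothetical, with arbitrary default values. Note that a plain numeric sort only puts hashes next to each other when they differ in low-order bits, so near-duplicates that differ in a high bit can fall outside the window and be missed.

```python
from typing import Dict, List, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def map_duplicates(
    hashes: Dict[str, int], threshold: int = 4, window: int = 8
) -> Dict[str, str]:
    """Map every image id to the id of the image kept for its group.

    `hashes` maps an image id to a precomputed perceptual hash (assumed).
    `threshold` and `window` are illustrative values, not recommendations.
    """
    # Sort (hash, image id) pairs so similar hashes tend to sit close together.
    ordered: List[Tuple[int, str]] = sorted(
        (h, image_id) for image_id, h in hashes.items()
    )

    # Union-find parent table: every image starts as its own group.
    parent = {image_id: image_id for image_id in hashes}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, (h_i, id_i) in enumerate(ordered):
        # Compare only against the next `window` entries, not all pairs.
        for h_j, id_j in ordered[i + 1 : i + 1 + window]:
            if hamming(h_i, h_j) <= threshold:
                root_i, root_j = find(id_i), find(id_j)
                if root_i != root_j:
                    # Keep the lexicographically smaller id as the survivor.
                    keep, drop = sorted((root_i, root_j))
                    parent[drop] = keep

    return {image_id: find(image_id) for image_id in hashes}


if __name__ == "__main__":
    sample = {"a.jpg": 0b11110000, "b.jpg": 0b11110001, "c.jpg": 0b00001111}
    for image_id, kept in sorted(map_duplicates(sample).items()):
        print(image_id, "->", kept)
```

The returned mapping can then be used to update the database rows: every image whose mapped id differs from its own id is a candidate for deletion, with its references repointed to the kept image. Which image of a group to keep (here simply the smallest id) is a placeholder choice; in practice one might prefer the largest resolution.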
