Search solution for similar images

I have a really big problem with my image storage server.

It has about 2 million product images and continues to grow, but many of them are very similar. For example: iPad photo with many similar sizes 120 * 120, 118 * 120, 131 * 125 ... etc. My site had a lot of unnecessary disk space and a poor user interface (similar images in the gallery).

These images are indexed in the database, I can find them with some conditions, for example, by product, category, etc. I need to find a way to mark these similar images in the database and delete them.

What I did: a found library called pHash can calculate two similarities of an image, I can use it to calculate images one by one. But thus, it will take a long time to find these images. Now I do not know how to make this process faster.

Any ideas?

+3
source share
2 answers
  • Use pHash to calculate the perceptual hash of all of your images (and not to cross-play each combination),
  • then sort the hash (keeping the link to the images),
  • then determine the critical value of this perceptual hash, which you define as “images are equivalent”,
  • then replace links to equivalent images with a link to the single image you want to save.
+3
source

, O(n^2), n-size.

, , canopy , , "", .

, ( , ).

w, , w < .

, , w . "" .

O(w * n), w.

, , .

similar .

: .

+3

All Articles