I'm trying to understand the Viola-Jones method, and I've mostly got it.
It uses simple Haar-like features, boosted into strong classifiers and organized into layers (a cascade) to achieve better performance (so that it does not waste time on obvious “non-object” regions).
I think I understand the integral image, and I understand how the feature values are computed.
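To check that I have this part right, here is a minimal Python sketch of how I understand the integral image and a two-rectangle feature. The function names, the top/bottom layout of the feature, and the sign convention are my own assumptions for illustration, not taken from the paper.

```python
def integral_image(img):
    """Build a summed-area table with one extra row and column of zeros,
    so that ii[y][x] is the sum of all pixels in img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left corner (x, y),
    width w and height h, using four table lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Hypothetical two-rectangle feature: top half minus bottom half.
    The sign convention is an arbitrary choice for this sketch."""
    half = h // 2
    top = rect_sum(ii, x, y, w, half)
    bottom = rect_sum(ii, x, y + half, w, half)
    return top - bottom


# Tiny 3-wide, 4-high "image" just to exercise the functions.
img = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12],
]
ii = integral_image(img)
whole = rect_sum(ii, 0, 0, 3, 4)            # sum of every pixel
inner = rect_sum(ii, 1, 1, 2, 2)            # 5 + 6 + 8 + 9
feature = two_rect_feature(ii, 0, 0, 3, 4)  # (rows 0-1) minus (rows 2-3)
```

If this is right, any rectangle sum costs four lookups regardless of the rectangle's size.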
The only thing I can't figure out is how the algorithm deals with variations in face size.
As far as I know, they use a 24x24 subwindow that slides across the image, and inside it the algorithm runs through the classifiers and tries to decide whether it contains a face/object or not.
And my question is: what if one face is 10x10 and another is 100x100? What happens then?
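To make the question concrete, this is roughly how I picture the scan, as a Python sketch. Everything here is my own assumption: the names `window_positions`, `run_cascade` and `scan` are made up, and each stage is just a placeholder function that returns True or False for a window. The window size is fixed at 24, which is exactly the part I don't see handling a 10x10 or a 100x100 face.

```python
WINDOW = 24  # base subwindow size, as I understand it from the paper


def window_positions(img_w, img_h, step=1, size=WINDOW):
    """Yield the top-left corner of every size x size subwindow."""
    for y in range(0, img_h - size + 1, step):
        for x in range(0, img_w - size + 1, step):
            yield x, y


def run_cascade(stages, window):
    """Accept a window only if every stage accepts it. all() on a
    generator stops at the first rejecting stage, so later stages
    are never evaluated for windows that fail early."""
    return all(stage(window) for stage in stages)


def scan(img, stages, step=1):
    """Slide the fixed-size window over the image and collect the
    top-left corners of the windows that pass every stage."""
    h, w = len(img), len(img[0])
    hits = []
    for x, y in window_positions(w, h, step):
        window = [row[x:x + WINDOW] for row in img[y:y + WINDOW]]
        if run_cascade(stages, window):
            hits.append((x, y))
    return hits


# Placeholder stages: stand-ins for real boosted classifiers.
def accept_all(window):
    return True


def reject_all(window):
    return False


blank = [[0] * 26 for _ in range(25)]  # 26 wide, 25 high
all_hits = scan(blank, [accept_all])
no_hits = scan(blank, [accept_all, reject_all])
```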
I am also dying to know what the first two features (in the first layer of the cascade) are and what they look like, bearing in mind that these two features, according to Viola & Jones, almost never miss a face and eliminate 60% of the non-faces. How?
And how is it possible to construct these features so that those statistics hold for different face sizes in the image?
Am I missing something, or have I maybe misunderstood everything?
If I am not clear enough, I will try to explain my confusion better.