Try using hash tables. Another option is a MapReduce-style approach. I would also suggest an inverted index; Google uses the same technique. You can also create a stop-word file listing the words to ignore, for example: I, am, a, an, in, on, etc.
That is all I can think of. I also read somewhere that arrays can be used for searching.
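To make the inverted-index idea a bit more concrete, here is a minimal sketch in Python. It assumes the documents fit in memory and are given as a dict of id to text. The names `STOP_WORDS`, `tokenize`, `build_inverted_index`, and `search` are made up for illustration, and the stop-word set is hard-coded here rather than read from a file.

```python
from collections import defaultdict

# Hypothetical stop-word list; in practice this could be loaded from a file.
STOP_WORDS = {"i", "am", "a", "an", "in", "on"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    words = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return [w for w in words if w and w not in STOP_WORDS]


def build_inverted_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)  # a hash table: word -> set of doc ids
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every non-stop word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


docs = {
    1: "I am reading a book on hash tables",
    2: "An inverted index maps words to documents",
    3: "Hash tables and an inverted index in a search engine",
}
index = build_inverted_index(docs)
print(search(index, "hash tables"))
print(search(index, "inverted index"))
```

The index itself is just a hash table keyed by word, so a lookup does not need to scan every document. For data too large for one machine, the same word-to-documents mapping is the kind of thing a MapReduce job could build, but that is beyond this sketch.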