I have a set of search terms such as [+dog -"jack russels" +"fox terrier"] and [+cat +persian -tabby]. They can be quite long, with perhaps 30 sub-terms making up each term.
Now I have online news articles such as ["My Fox Terrier is the cutest dog in the world ..."] and ["Has anyone seen my lost Persian cat? Missing ..."]. They are not too long, possibly no more than 500 characters.
Traditional search engines expect a huge number of articles that are pre-processed into indexes, which speeds up lookups for a given search term by using set theory / Boolean logic to narrow the articles down to only those that match. In my situation, however, it is the number of search terms that is large, on the order of 10^5, and I would like to process one article at a time and find ALL the search terms that the article matches (i.e. every + sub-term appears in the text and none of the - sub-terms does).
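To make the matching rule concrete, here is a minimal sketch of what I mean by a match for a single term. The function name `matches` and the use of case-insensitive substring matching are assumptions made for illustration only; real matching would probably need word boundaries, since a plain substring test lets "cat" match inside "category".

```python
def matches(article: str, positives: list[str], negatives: list[str]) -> bool:
    """Return True if every positive sub-term occurs in the article
    and no negative sub-term does (case-insensitive substring test)."""
    text = article.lower()
    has_all_positives = all(p.lower() in text for p in positives)
    has_any_negative = any(n.lower() in text for n in negatives)
    return has_all_positives and not has_any_negative


article = "My Fox Terrier is the cutest dog in the world ..."
print(matches(article, ["dog", "fox terrier"], ["jack russels"]))
print(matches(article, ["cat", "persian"], ["tabby"]))
```

Running this check for each of the ~10^5 terms against every article is the brute-force baseline I would like to avoid.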
I have a possible solution using two maps (one for positive sub-phrases, one for negative sub-phrases), but I do not think it will be very efficient.
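For reference, this is roughly the two-map idea, sketched in Python. All names here (`build_maps`, `matching_terms`, the term ids) are hypothetical, and the substring test is an assumption; it is only meant to show the shape of the approach, not a tuned implementation.

```python
from collections import defaultdict


def build_maps(terms):
    """terms: {term_id: (positive_subterms, negative_subterms)}.
    Returns a map from each positive sub-term to the term ids that need it,
    the same for negative sub-terms, and the count of distinct positive
    sub-terms each term requires."""
    pos_map = defaultdict(set)
    neg_map = defaultdict(set)
    needed = {}
    for term_id, (positives, negatives) in terms.items():
        pos_set = {p.lower() for p in positives}
        needed[term_id] = len(pos_set)
        for p in pos_set:
            pos_map[p].add(term_id)
        for n in negatives:
            neg_map[n.lower()].add(term_id)
    return pos_map, neg_map, needed


def matching_terms(article, pos_map, neg_map, needed):
    """Return the ids of all terms the article matches.
    Terms with no positive sub-terms never match in this sketch."""
    text = article.lower()
    hits = defaultdict(int)
    for sub, term_ids in pos_map.items():
        if sub in text:  # assumption: plain substring test
            for term_id in term_ids:
                hits[term_id] += 1
    blocked = set()
    for sub, term_ids in neg_map.items():
        if sub in text:
            blocked |= term_ids
    return {t for t, count in hits.items()
            if count == needed[t] and t not in blocked}


terms = {
    "terrier": (["dog", "fox terrier"], ["jack russels"]),
    "persian": (["cat", "persian"], ["tabby"]),
}
pos_map, neg_map, needed = build_maps(terms)
print(matching_terms("Has anyone seen my lost Persian cat? Missing ...",
                     pos_map, neg_map, needed))
```

The part that worries me is that each article is still tested against every distinct sub-term in both maps, so the cost per article grows with the total number of sub-terms rather than with the length of the article.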
First prize would be a library that solves this problem; second prize would be a push in the right direction to solve it.
Yours faithfully,