For the reasons outlined in this question, I am building my own client-side search engine rather than using the ydn-full-text library, which is based on fullproof. It boils down to fullproof generating "too many entries": about 300,000 records, while (after stemming) there are only about 7,700 unique words. My "theory" is therefore that fullproof rests on traditional assumptions that apply only to the server side:
- Huge indices are fine
- Processing power is expensive
- (and the assumption of working with long records, which does not apply to my case, since my records contain only 24 words on average 1 )
While on the client side:
- Huge indices take ages to populate.
- Processing power is still limited, but relatively cheaper than on the server side.
Based on these assumptions, I started with an elementary inverted index (giving a total of 7,700 records, since IndexedDB is a document/NoSQL database). This inverted index was built using the Lancaster stemmer (the most aggressive of the two or three popular ones). During a search I would fetch the index entry for each word and assign a score based on the overlap of the different entries and on the similarity of the typed word to the original word (Jaro-Winkler distance).
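To make the approach above concrete, here is a minimal in-memory sketch of it. This is not my actual code: `stem` is a hypothetical stand-in for the Lancaster stemmer, a plain `Map` stands in for the IndexedDB object store, and the names `buildIndex` and `search` are made up for illustration. The score simply sums the Jaro-Winkler similarity of each typed word to the original word over every index entry a record appears in, so records matching more query words rank higher.

```javascript
// Hypothetical stand-in for a real stemmer (the post uses Lancaster).
const stem = (w) => w.toLowerCase().replace(/(ing|ed|es|s)$/, "");
const tokenize = (text) => text.toLowerCase().match(/[a-z0-9]+/g) ?? [];

// Jaro similarity: matching characters within a sliding range, minus transpositions.
function jaro(a, b) {
  if (a === b) return 1;
  if (!a.length || !b.length) return 0;
  const range = Math.max(0, Math.floor(Math.max(a.length, b.length) / 2) - 1);
  const aHit = new Array(a.length).fill(false);
  const bHit = new Array(b.length).fill(false);
  let matches = 0;
  for (let i = 0; i < a.length; i++) {
    const lo = Math.max(0, i - range);
    const hi = Math.min(b.length - 1, i + range);
    for (let j = lo; j <= hi; j++) {
      if (!bHit[j] && a[i] === b[j]) {
        aHit[i] = bHit[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;
  let k = 0;
  let outOfOrder = 0;
  for (let i = 0; i < a.length; i++) {
    if (!aHit[i]) continue;
    while (!bHit[k]) k++;
    if (a[i] !== b[k]) outOfOrder++;
    k++;
  }
  return (matches / a.length + matches / b.length +
          (matches - outOfOrder / 2) / matches) / 3;
}

// Jaro-Winkler: boost the Jaro score for a shared prefix of up to 4 characters.
function jaroWinkler(a, b, p = 0.1) {
  const j = jaro(a, b);
  let prefix = 0;
  while (prefix < 4 && prefix < a.length && prefix < b.length &&
         a[prefix] === b[prefix]) prefix++;
  return j + prefix * p * (1 - j);
}

// Inverted index: stem -> Map(recordId -> original word).
// If a record contains several words with the same stem, the last one wins.
function buildIndex(records) {
  const index = new Map();
  for (const [id, text] of Object.entries(records)) {
    for (const word of tokenize(text)) {
      const key = stem(word);
      if (!index.has(key)) index.set(key, new Map());
      index.get(key).set(id, word);
    }
  }
  return index;
}

// Look up each typed word's entry and accumulate a per-record score.
function search(index, query) {
  const scores = new Map();
  for (const typed of tokenize(query)) {
    const postings = index.get(stem(typed));
    if (!postings) continue;
    for (const [id, original] of postings) {
      scores.set(id, (scores.get(id) ?? 0) + jaroWinkler(typed, original));
    }
  }
  return [...scores.entries()]
    .sort((x, y) => y[1] - x[1])
    .map(([id, score]) => ({ id, score }));
}

const index = buildIndex({
  a: "client side search engine",
  b: "searching huge indexes",
  c: "processor power",
});
console.log(search(index, "searches index"));
```

Note that `search` walks every record in the entry of every query word, so the cost grows with the size of those entries.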
The problem with this approach:
- The combination "popular_word + popular_word" is extremely expensive.
, , : ? , , , , -, . ( , )
1 , , . , "" fullproof. , , , , .