I am looking for the most efficient way of memory to solve this problem.
I have a list of tuples representing partial string matches in a sentence:
[(0, 2), (1, 2), (0, 4), (2,6), (23, 2), (22, 6), (26, 2), (26, 2), (26, 2)]
The first value of each tuple is the starting position for the match, the second value is the length.
The idea is to collapse the list so that only the longest string match is reported. In this case, it will be:
[(0,4), (2,6), (22,6)]
I don’t need only the longest range, for example, in the algorithm for finding the longest non-overlapping sequences , but I want all the ranges to collapse to the longest.
In case of your surprise, I use the pure Python implementation for Aho-Corasick to match terms in a static dictionary with a given piece of text.
EDIT: Due to the nature of these tuple lists, overlapping but not autonomous ranges must be printed separately. For example, with the word betazand zetathe dictionary for a match betazetaequal [(0,5),(4,8)]. Since these ranges overlap, but none are contained in the other, the answer should be [(0,5),(4,8)]. I also modified the input dataset above to cover this case.
Thank!