Collapse list of range sets to overlapping ranges

Question

Collapse list of range sets to overlapping ranges

I am looking for the most efficient way of memory to solve this problem.

I have a list of tuples representing partial string matches in a sentence:

[(0, 2), (1, 2), (0, 4), (2,6), (23, 2), (22, 6), (26, 2), (26, 2), (26, 2)]

The first value of each tuple is the starting position for the match, the second value is the length.

The idea is to collapse the list so that only the longest string match is reported. In this case, it will be:

[(0,4), (2,6), (22,6)]

I don’t need only the longest range, for example, in the algorithm for finding the longest non-overlapping sequences , but I want all the ranges to collapse to the longest.

In case of your surprise, I use the pure Python implementation for Aho-Corasick to match terms in a static dictionary with a given piece of text.

EDIT: Due to the nature of these tuple lists, overlapping but not autonomous ranges must be printed separately. For example, with the word betazand zetathe dictionary for a match betazetaequal [(0,5),(4,8)]. Since these ranges overlap, but none are contained in the other, the answer should be [(0,5),(4,8)]. I also modified the input dataset above to cover this case.

Thank!

+5

python tuples

Francisco roque May 28 '12 at 21:06

source share

3 answers

a = [(0, 2), (1, 2), (0, 4), (23, 2), (22, 6), (26, 2), (26, 2), (26, 2)]
b = [set(xrange(i, i + j)) for i, j in a]
c = b.pop().union(*b)
collapsed = sorted(c)
print collapsed
#Maybe this is useful?:
[0, 1, 2, 3, 22, 23, 24, 25, 26, 27]

#But if you want the requested format, then do this:
d = []
start = collapsed[0]
length = 0
for val in collapsed:
    if start + length < val:
        d.append((start,length))
        start = val
        length = 0
    elif val == collapsed[-1]:
        d.append((start,length + 1))
    length += 1
print d
#Output:
[(0,4), (22,6)]

+1

fraxel May 28 '12 at 23:41

source share

, , - , , :

lst = [(0, 2), (1, 2), (0, 4), (23, 2), (22, 6), (26, 2), (26, 2), (26, 2)]
lst.sort()
start, length = lst.pop(0)
i = 0
while i < len(lst):
    x, l = lst[i]
    if start + length < x:
        lst[i] = (start, length)
        i += 1
        start, length = x, l
    else:
        length = max(length, x + l - start)
        lst.pop(i)
lst.append((start, length))

, , , ,

Perhaps there is a much faster algorithm if you do not want to change the list in place - sending items from the middle of the list can be slow, especially if the list is long.

One reasonable optimization would be to save the list of indexes that you are about to delete, and then go back and rebuild the list in the second pass, so you could rebuild the entire list in one go and avoid popoverhead. But it will use more memory!

-1

Julian May 28 '12 at 23:06

source share

Janne karila · Accepted Answer · 2012-05-29T08:54:56+0000

import operator
lst = [(0, 2), (1, 2), (0, 4), (2,6), (23, 2), (22, 6), (26, 2), (26, 2), (26, 2)]
lst.sort(key=operator.itemgetter(1))
for i in reversed(xrange(len(lst)-1)):
    start, length = lst[i]
    for j in xrange(i+1, len(lst)):
        lstart, llength = lst[j]
        if start >= lstart and start + length <= lstart + llength:
            del lst[i]
            break
print lst
#[(0, 4), (2, 6), (22, 6)]

Collapse list of range sets to overlapping ranges

More articles: