Check for strings within strings

I have a huge list containing many strings, for example:

['xxxx','xx','xy','yy','x',......]

Now I am looking for an efficient way to remove every string that is contained in another string. For example, 'xx' and 'x' are both contained in 'xxxx'.

Since the dataset is huge, I was wondering whether there is a more efficient method than checking every pair with

if a in b:

Full code, parts of which could perhaps be optimized:

for x in range(len(taxlistcomplete)):
    if delete == True:
        x = x - 1
        delete = False
    for y in range(len(taxlistcomplete)):
        if taxlistcomplete[x] in taxlistcomplete[y]:
            if x != y:
                print x,y
                print taxlistcomplete[x]
                del taxlistcomplete[x]
                delete = True
                break
    print x, len(taxlistcomplete)

Updated version of the code:

for x in enumerate(taxlistcomplete):
    if delete == True:
        #If element is removed, I need to step 1 back and continue looping.....
        delete = False
    for y in enumerate(taxlistcomplete):
        if x[1] in y[1]:
            if x[1] != y[1]:
                print x[1],y[1]
                print taxlistcomplete[x[0]]
                del taxlistcomplete[x[0]]
                delete = True
                break
    print x, len(taxlistcomplete)

This version uses enumerate, but I wonder whether it is actually more efficient, and how to implement the delete step so that there is less to search.

Just a short thought ...

Basically, what I would like to see: if an element is not contained in any other element of the list (for example, "xxxxx" checked against "xx", "xy", "wfirfj", ...), print/save it.

print 'comparison'

file = open('output.txt','a')

for x in enumerate(taxlistcomplete):
    delete = False
    for y in enumerate(taxlistcomplete):
        if x[1] in y[1]:
            if x[1] != y[1]:
                taxlistcomplete[x[0]] = ''
                delete = True
                break
    if delete == False:
        file.write(str(x))
+5
4

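x in <string> is fast, but checking every string against every other string in the list takes O(n^2) time. Rather than shaving a few cycles off each comparison, you can save far more by using a different data structure, so that each string is checked with a single lookup.

There is a data structure called a "prefix tree" (or trie) that lets you check very quickly whether a string is a prefix of some string you have already seen. Since you are also interested in strings that occur in the middle of another string x, index all the substrings of the form x, x[1:], x[2:], x[3:], etc. (So: only n substrings for a string of length n.) That is, you index the substrings that start at position 0, 1, 2, etc. and run to the end of the string. That way, you only need to check whether a new string is the initial part of something in the index.

You can then solve the problem in O(n) time like this:

  • Sort the strings in order of decreasing length. This ensures that no string can be a substring of something you have not seen yet. Since only the length matters, a bucket sort does this in O(n) time.

  • Start with an empty prefix tree and loop over the sorted list. For each string x, use the prefix tree to check whether it is a substring of a string seen earlier. If it is not, add its substrings x, x[1:], x[2:], etc. to the prefix tree.

Deleting from the middle of a long list is expensive, so you will get a further speedup by collecting the strings you want to keep in a new list (only the reference is copied, not the string itself). When you are done, discard the original list and the prefix tree.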
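The prefix-tree idea above could be sketched roughly as follows. This is only an illustration, not a tuned implementation: the function name remove_contained is made up, the trie is a plain dict of dicts, the strings are assumed to be non-empty, and exact duplicates are collapsed to one copy. An ordinary sort stands in for the bucket sort.

```python
def remove_contained(strings):
    """Return the strings that are not a substring of any other string.

    Hypothetical helper: assumes non-empty strings; duplicates are
    collapsed to a single copy.
    """
    trie = {}   # nested dicts, one level per character
    kept = []   # new list instead of deleting in place

    # dict.fromkeys drops duplicates but keeps first-seen order.
    # Longest first: a string can only be contained in one seen earlier.
    for s in sorted(dict.fromkeys(strings), key=len, reverse=True):
        # s is a substring of a kept string exactly when it is a
        # prefix of one of the suffixes already in the trie.
        node = trie
        for ch in s:
            node = node.get(ch)
            if node is None:
                break          # fell off the trie: s is new
        else:
            continue           # every character matched: s is contained

        kept.append(s)
        # Index the suffixes s, s[1:], s[2:], ...
        for start in range(len(s)):
            node = trie
            for ch in s[start:]:
                node = node.setdefault(ch, {})
    return kept


result = remove_contained(['xxxx', 'xx', 'xy', 'yy', 'x'])
```

Note that indexing every suffix costs memory proportional to the square of each kept string's length in the worst case, so this trade-off mostly pays off when the strings are short and the list is long.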

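If that is too complicated, at least don't compare everything with everything. Sort the strings by size (in decreasing order), and check each string only against those that come before it. This gives roughly a 50% speedup with very little effort. And do build a new list (or write to a file immediately) instead of deleting in place.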
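A minimal sketch of this simpler variant, assuming the result order does not matter (remove_contained_simple is a made-up name). It compares each string only against the strings already kept, which is enough because a string contained in a dropped string is also contained in whatever kept string contains that one.

```python
def remove_contained_simple(strings):
    """Keep only the strings that are not contained in a longer kept string."""
    kept = []
    # Longest first, so any possible container is already in kept.
    for s in sorted(strings, key=len, reverse=True):
        if not any(s in longer for longer in kept):
            kept.append(s)   # build a new list; nothing is deleted in place
    return kept


result = remove_contained_simple(['xxxx', 'xx', 'xy', 'yy', 'x'])
```

This is still quadratic in the worst case, but it avoids both the in-place deletions and roughly half of the comparisons.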

+9

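If there is a character that is guaranteed not to occur in any of the strings (say '$'), you could join the strings with it and search the joined text:

result = ''
# Longest first, so a string is always checked after any string that could contain it.
for substring in sorted(taxlistcomplete, key=len, reverse=True):
    if substring not in result:
        result += '$' + substring
# Drop the empty piece in front of the leading '$'.
taxlistcomplete = result.split('$')[1:]

Python's built-in substring search is well optimized, so this should be reasonably fast :)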

+2

A list comprehension with in is a concise way to filter the list by substring:

[element for element in arr if 'xx' in element]
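The comprehension above tests for one fixed substring. Applied to the question, the same pattern might look like the sketch below (the names strings and filtered are only for illustration). It keeps the original order and keeps exact duplicates, but it still compares every pair, so it is O(n^2).

```python
strings = ['xxxx', 'xx', 'xy', 'yy', 'x']

# Keep s only if no *different* string in the list contains it.
filtered = [s for s in strings
            if not any(s != other and s in other for other in strings)]
```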
0

Here is my suggestion. First, I sort the elements by length, since the shorter a string is, the more likely it is to be a substring of another string. Then two nested loops go through the list and remove every item el2 that contains el as a substring. Note that the outer loop visits each element only once.

Sorting the list first destroys the original order of its items. Therefore, if the order is important, you cannot use this solution.

Edit: I assume that the list contains no duplicates, so when el == el2 it is the same element.

a = ["xyy", "xx", "zy", "yy", "x"]
a.sort(key=len)

# Iterate over copies (a[:]) so that removing from a does not skip elements.
for el in a[:]:
    for el2 in a[:]:
        if el in el2 and el != el2:
            a.remove(el2)
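If the order does matter, one possible workaround is to sort a copy only to decide which strings survive, and then filter the original list. The sketch below follows the question's goal (dropping strings that are contained in another string); drop_contained_keep_order is a hypothetical name, and exact duplicates of a surviving string are all kept.

```python
def drop_contained_keep_order(items):
    """Drop strings contained in another string, preserving input order."""
    survivors = set()
    # sorted() returns a new list, so items itself is never reordered.
    for s in sorted(items, key=len, reverse=True):
        if not any(s in kept for kept in survivors):
            survivors.add(s)
    # Filter the original list so the surviving items keep their positions.
    return [s for s in items if s in survivors]


result = drop_contained_keep_order(["xyy", "xx", "zy", "yy", "x"])
```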
0