I have a huge list containing many lines, for example:
['xxxx','xx','xy','yy','x',......]
Now I am looking for an efficient way to delete every entry that is contained in (is a substring of) another entry. For example, 'xx' and 'x' are both contained in 'xxxx'.
Since the dataset is huge, I was wondering whether there is a more efficient method for doing this than
if a in b:
Full code (perhaps some parts can be optimized):
for x in range(len(taxlistcomplete)):
    if delete == True:
        x = x - 1
        delete = False
    for y in range(len(taxlistcomplete)):
        if taxlistcomplete[x] in taxlistcomplete[y]:
            if x != y:
                print x,y
                print taxlistcomplete[x]
                del taxlistcomplete[x]
                delete = True
                break
    print x, len(taxlistcomplete)
Updated version of the code:
for x in enumerate(taxlistcomplete):
    if delete == True:
        delete = False
    for y in enumerate(taxlistcomplete):
        if x[1] in y[1]:
            if x[1] != y[1]:
                print x[1],y[1]
                # x is an (index, value) tuple, so index the list with x[0]
                print taxlistcomplete[x[0]]
                del taxlistcomplete[x[0]]
                delete = True
                break
    print x, len(taxlistcomplete)
This is now implemented with enumerate, but I wonder whether it is actually more efficient, and how to implement the delete step so that there is less left to search.
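One possible way to handle the delete step, sketched under the assumption that holding a second list in memory is acceptable: rather than deleting from `taxlistcomplete` while iterating over it (which shifts the indices, so entries can be skipped), collect the entries to keep in a new list. The function name `remove_contained` is hypothetical, and this is still a quadratic scan, so it addresses correctness rather than speed. It avoids `print` so that it runs on both Python 2 and Python 3.

```python
def remove_contained(items):
    """Return the entries of items that are not contained in a different entry."""
    kept = []
    for i, a in enumerate(items):
        # a is dropped if it is a substring of some other, non-identical entry
        contained = any(a in b for j, b in enumerate(items) if i != j and a != b)
        if not contained:
            kept.append(a)
    return kept


taxlistcomplete = ['xxxx', 'xx', 'xy', 'yy', 'x']
# Rebind the name to the filtered list instead of deleting in place.
taxlistcomplete = remove_contained(taxlistcomplete)
```

As in the loops above, identical entries are not treated as containing each other, so exact duplicates are all kept.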
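Just a short thought ...
Basically, what I would like to see is this: if an element is not contained in any other element of the list, write it to a file.
So if "xxxxx" is not contained in "xx", "xy", "wfirfj", ..., print/save it.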
print 'comparison'
file = open('output.txt','a')
for x in enumerate(taxlistcomplete):
    delete = False
    for y in enumerate(taxlistcomplete):
        if x[1] in y[1]:
            if x[1] != y[1]:
                taxlistcomplete[x[0]] = ''
                delete = True
                break
    if delete == False:
        file.write(str(x))
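A minimal sketch of one way to cut down the number of comparisons, assuming that exact duplicates can be collapsed and that the original order does not matter: a string can only be contained in a string that is at least as long, so the entries are sorted longest first and each one is checked only against the strings already kept. The name `filter_substrings` is hypothetical. The worst case is still quadratic, but every dropped entry shrinks the set that later entries are compared against.

```python
def filter_substrings(items):
    """Keep only the strings that are not contained in another kept string."""
    kept = []
    # set() collapses exact duplicates (an assumption, see above);
    # sorting longest first means only longer-or-equal strings are in kept.
    for s in sorted(set(items), key=len, reverse=True):
        if not any(s in longer for longer in kept):
            kept.append(s)
    return kept


result = filter_substrings(['xxxx', 'xx', 'xy', 'yy', 'x'])
```

Checking only against `kept` is sufficient: if a string is contained in an entry that was itself dropped, it is also contained in whatever kept entry caused that drop. The surviving strings could then be written to `output.txt` one per line instead of as `str(x)` tuples.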