If I have a list of words, how can I check whether a string contains any of the words in the list, and is it efficient?

As the title says, I have a list of words, for example stopWords = ["the", "and", "with", etc...], and I receive text like "Kill the fox and the dog." I want to get a result like "Kill fox dog" as quickly and efficiently as possible. How can I do this? (I know I can iterate with a for loop, but that is not very efficient.)

+5
6 answers

The most important improvement is to make stopWords a set. Membership tests on a set take constant time on average, instead of scanning the whole list for every word.

stopWords = set(["the", "and", "with", etc...])
" ".join(word for word in msg.split() if word not in stopWords)

If you just want to find out whether any of the stop words occur in the text:

if any(word in stopWords for word in msg.split()):
    ...
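
A sketch of the same idea wrapped in a function, assuming that matching should ignore case and surrounding punctuation (the sample text in the question ends in "dog."). The name remove_stop_words and the normalization step are illustrative and not part of the answer above.

```python
import string

STOP_WORDS = frozenset({"the", "and", "with"})

def remove_stop_words(msg, stop_words=STOP_WORDS):
    """Return msg with stop words removed, comparing case-insensitively."""
    kept = []
    for word in msg.split():
        # Strip surrounding punctuation and lowercase only for the lookup;
        # the original token is what gets kept.
        key = word.strip(string.punctuation).lower()
        if key not in stop_words:
            kept.append(word)
    return " ".join(kept)

print(remove_stop_words("Kill the fox and the dog."))  # prints: Kill fox dog.
```

The text is still walked once, word by word; the gain comes from each lookup being a hash probe instead of a list scan.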
+8

For example:

stopWords = ["the", "and", "with"]
msg = "kill the fox and the dog"

' '.join([w for w in msg.split() if w not in stopWords])

Output:

'kill fox dog'
+1
0

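As others have suggested, set() is the way to go. If word order and duplicates do not matter, you can even use a set difference such as working = working - stopWords to strip the stopWords. For example: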

stopWords = set('the a an and'.split())
working   = set('this is a test of the one working set dude'.split())
# Set difference drops every stop word; if nothing was dropped, none were present.
if working == working - stopWords:
    print("The working set contains no stop words")
else:
    print("Actually, it does")
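
The same check can also be written with the built-in set methods. In this sketch, isdisjoint answers the yes/no question without building a new set, and the & operator lists which stop words were found. Sets discard word order and duplicates, so this suits the check but not rebuilding the sentence. The variable names are illustrative.

```python
stop_words = set("the a an and".split())
working = set("this is a test of the one working set dude".split())

# True when the two sets share no element.
has_none = working.isdisjoint(stop_words)

# Intersection: the stop words that actually occur in the text.
found = working & stop_words

if has_none:
    print("The working set contains no stop words")
else:
    print("Stop words found:", sorted(found))
```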

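You could also consider a trie, but a trie written in pure Python will probably be slower than the built-in set(), which is implemented in C. (A trie written in Cython might do better.)

For more on that, search SO for: python cython.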
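
For illustration, here is a minimal pure-Python trie built from nested dicts. The helper names (build_trie, trie_contains) and the end-of-word sentinel are hypothetical; this is a sketch to compare against set(), not a tuned implementation.

```python
_END = object()  # sentinel key marking the end of a stored word

def build_trie(words):
    """Build a nested-dict trie from an iterable of words."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[_END] = True
    return root

def trie_contains(trie, word):
    """Return True if word was stored in the trie as a whole word."""
    node = trie
    for ch in word:
        node = node.get(ch)
        if node is None:
            return False
    return _END in node

trie = build_trie(["the", "and", "with"])
msg = "kill the fox and the dog"
print(" ".join(w for w in msg.split() if not trie_contains(trie, w)))
```

Each lookup here costs one dict access per character, whereas a set lookup hashes the word once, which is one reason the pure-Python version is unlikely to win.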

Ultimately, of course, you should build a simple set-based version first, test and profile it, and only then try the trie and Cython trie options as possible improvements if necessary.
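
A sketch of that profiling step using the standard timeit module. The sample text, the repeat count, and the function name filter_with are arbitrary placeholders; absolute timings vary by machine, so none are shown here.

```python
import timeit

stop_list = ["the", "and", "with"]
stop_set = set(stop_list)
msg = "kill the fox and the dog " * 1000

def filter_with(container):
    """Drop every word of msg that is found in container."""
    return " ".join(w for w in msg.split() if w not in container)

# Both variants must produce the same text before timing means anything.
assert filter_with(stop_list) == filter_with(stop_set)

for name, container in (("list", stop_list), ("set", stop_set)):
    seconds = timeit.timeit(lambda: filter_with(container), number=100)
    print(f"{name}: {seconds:.4f} s for 100 runs")
```

With only three stop words the gap between list and set may be small; it should widen as the stop list grows.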

0

Alternatively, you can combine the list into a single regular expression and replace each stop word, along with its surrounding whitespace, with a single space.

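import re

stopWords = ["the", "and", "with"]
text = "Kill the fox and dog"  # renamed from `input` to avoid shadowing the built-in
# Build an alternation in which each stop word must be surrounded by whitespace.
pattern = "\\s{:s}\\s".format("\\s|\\s".join(stopWords))
print(pattern)
print(re.sub(pattern, " ", text))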

This prints:

\sthe\s|\sand\s|\swith\s
Kill fox dog
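
A variant sketch that uses word boundaries (\b) instead of surrounding whitespace, so it also catches stop words at the start or end of the text, next to punctuation, or directly after another stop word. re.escape guards against stop words containing regex metacharacters. The case-insensitive flag and the whitespace cleanup at the end are added assumptions about the desired output.

```python
import re

stop_words = ["the", "and", "with"]
text = "The fox and the dog."

# \b(?:the|and|with)\b matches whole words only, ignoring case.
pattern = re.compile(
    r"\b(?:{})\b".format("|".join(map(re.escape, stop_words))),
    re.IGNORECASE,
)

without_stops = pattern.sub("", text)
# Collapse the runs of whitespace left behind by the removed words.
result = " ".join(without_stops.split())
print(result)  # prints: fox dog.
```

Compiling the pattern once matters when the same stop list is applied to many texts.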
0