Python NLTK keyword extraction from a sentence

"First thing we do, let's kill all the lawyers." - William Shakespeare

Given the quote above, I would like to pull out "kill" and "lawyers" as the two keywords that describe the overall meaning of the sentence. I have extracted the following noun/verb POS tags:

[["First", "NNP"], ["thing", "NN"], ["do", "VBP"], ["lets", "NNS"], ["kill", "VB"], ["lawyers", "NNS"]]

The more general problem I'm trying to solve is to distill a sentence into its “most important” words/tags in order to summarize the overall “meaning” of the sentence.

* Note the scare quotes. I acknowledge that this is a very difficult problem and that, most likely, no perfect solution exists at this time. Nonetheless, I am interested in seeing attempts at solving both the specific problem (extracting "kill" and "lawyers") and the general problem (summarizing the overall meaning of the sentence in keywords/tags).

+5
3 answers

One simple approach would be to keep a stop-word list for each POS tag (NN, VB, etc.). These would be high-frequency words that usually do not add much semantic content to a sentence.

The snippet below keeps a separate list for each POS tag.

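# Stop words per POS tag: high-frequency words that carry little meaning.
stop_words = dict(
    NNP=['first', 'second'],
    NN=['thing'],
    VBP=['do', 'done'],
    VB=[],
    NNS=['lets', 'things'],
)


def filter_stop_words(pos_list):
    """Drop [token, tag] pairs whose token is a stop word for its tag."""
    # .get() avoids a KeyError for tags that have no stop-word list.
    return [[token, token_type]
            for token, token_type in pos_list
            if token.lower() not in stop_words.get(token_type, [])]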
+2

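Some heuristics may help here. Build an idf dictionary from your data, i.e. a mapping from every word to a number that reflects how rare that word is. The same can be done for larger n-grams.

By combining the idf values of the words in a sentence with their POS tags, you can answer questions such as "what is the rarest verb in this sentence?", "what is the rarest noun in this sentence?", etc. In a reasonable corpus "kill" should be rarer than "do", and "lawyers" rarer than "thing", so returning just the rarest noun and the rarest verb may be enough for most cases. If it is not, the algorithm can always be made a little more elaborate.

Ways to extend this include finding the rarest n-grams using n-gram idf, or building a full parse tree of the sentence (using, say, the stanford parser) and looking for patterns in the tree that show where the important words tend to sit, etc.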
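
A minimal sketch of the idf + POS idea, assuming the sentence is already tagged. The toy corpus, the function names `build_idf` and `rarest_with_tag`, and the choice to score unseen words as 0.0 are all illustrative assumptions, not part of any library; a real idf table would be built from a much larger corpus.

```python
import math
from collections import Counter


def build_idf(documents):
    """Map each word to log(N / df), where df counts the documents containing it."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        # Count each word once per document.
        doc_freq.update({word.lower() for word in doc})
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def rarest_with_tag(tagged, idf, tag_prefix, unseen_idf=0.0):
    """Return the highest-idf token whose POS tag starts with tag_prefix."""
    candidates = [token for token, tag in tagged if tag.startswith(tag_prefix)]
    if not candidates:
        return None
    # Assumption: words missing from the idf table get unseen_idf.
    return max(candidates, key=lambda token: idf.get(token.lower(), unseen_idf))


# Hypothetical toy corpus, one tokenized document per entry.
toy_corpus = [
    "first thing we do".split(),
    "lets do the first thing".split(),
    "lets do one thing".split(),
    "they kill time".split(),
    "lawyers do paperwork".split(),
]
idf = build_idf(toy_corpus)

# POS tags as given in the question.
tagged = [["First", "NNP"], ["thing", "NN"], ["do", "VBP"],
          ["lets", "NNS"], ["kill", "VB"], ["lawyers", "NNS"]]

keywords = [rarest_with_tag(tagged, idf, "VB"), rarest_with_tag(tagged, idf, "NN")]
print(keywords)  # ['kill', 'lawyers']
```

Matching on the tag prefix ("VB", "NN") groups all verb and noun variants together; with a toy corpus this small the result depends entirely on which sentences were chosen.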

+2

In your case, you can simply use the Rake package for Python (thanks to Fabian) to get what you need:

>>> import RAKE
>>> path = "<your path>"  # path to a stop-word list file
>>> r = RAKE.Rake(path)
>>> r.run("First thing we do, let kill all the lawyers")
[('lawyers', 1.0), ('kill', 1.0), ('thing', 1.0)]

The path should point to a stop-word list file.

In general, though, you are better off using the NLTK package for NLP work.

0