Finding the "subject" from an array of part-of-speech tags

I know this is more of a grammar question, but how do you determine the "subject" of a sentence if you have an array of Penn Treebank tags, such as:

[WP][VBZ][DT][NN]

Is there a Java library that can take such tags and determine which one is the subject? Or which ones?

+3
3 answers

I have successfully classified subjects for Portuguese using OpenNLP. I created a shallow parser by slightly modifying the OpenNLP Chunker component.

OpenNLP does the PoS tagging and chunking, and the modified chunker is then trained on the combined PoS + chunk tag.

The Chunker training data in CoNLL-2000 format looks like this:

He        PRP  B-NP
reckons   VBZ  B-VP
the       DT   B-NP
current   JJ   I-NP
account   NN   I-NP
deficit   NN   I-NP
will      MD   B-VP
narrow    VB   I-VP
...

After the modification, it looks like this:

He        PRP+B-NP  B-SUBJ
reckons   VBZ+B-VP  B-V  
the       DT+B-NP   O
current   JJ+I-NP   O
account   NN+I-NP   O
deficit   NN+I-NP   O
will      MD+B-VP   O
narrow    VB+I-VP   O
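The column merge shown above can be sketched in a few lines of Java. This is only an illustration, not the code the answer used: the class name `SubjectTrainingData` and the method `mergeLine` are hypothetical, and the subject/verb labels are assumed to be supplied by hand or by another tool, since they cannot be derived from the chunk tags alone.

```java
// Hypothetical helper: rewrites one CoNLL-2000 line ("word POS chunk")
// into the modified format ("word POS+chunk label").
class SubjectTrainingData {

    static String mergeLine(String conllLine, String label) {
        String[] cols = conllLine.trim().split("\\s+");
        if (cols.length != 3) {
            throw new IllegalArgumentException("Expected 3 columns: " + conllLine);
        }
        // Join the POS tag and the chunk tag into one feature column.
        return cols[0] + " " + cols[1] + "+" + cols[2] + " " + label;
    }

    public static void main(String[] args) {
        String[] sentence = {"He PRP B-NP", "reckons VBZ B-VP", "the DT B-NP"};
        // Labels are an assumption here; they must come from annotated data.
        String[] labels = {"B-SUBJ", "B-V", "O"};
        for (int i = 0; i < sentence.length; i++) {
            System.out.println(mergeLine(sentence[i], labels[i]));
        }
    }
}
```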

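If you have access to the Penn Treebank, you can derive such training data from it; a Perl script was used in the same way to generate the CoNLL-2000 data.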

The results were 87.07% precision, 75.48% recall, and 80.86% F1.
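As a quick sanity check, assuming the three figures are precision, recall, and F1 in that order, the F1 value is the harmonic mean of the other two. The class and method names below are hypothetical.

```java
import java.util.Locale;

// Hypothetical check: F1 is the harmonic mean of precision and recall.
class F1Check {

    static double f1(double precision, double recall) {
        return 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // Values taken from the answer, in percent.
        System.out.println(String.format(Locale.ROOT, "%.2f", f1(87.07, 75.48))); // prints 80.86
    }
}
```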

0

The subject is defined by the syntactic structure of the sentence, so you generally need a parse tree rather than a flat sequence of POS tags: http://en.wikipedia.org/wiki/Parse_tree .

Two options (both in Java):

The Berkeley Parser (http://code.google.com/p/berkeleyparser/) parses roughly 3-5 sentences per second.

BUBS (http://code.google.com/p/bubs-parser/) gives up some accuracy (about 1.5 F1 points) and in exchange parses roughly 50-80 sentences per second.


+1

The free, Java-based Stanford Dependency Parser (part of the Stanford Parser) makes this trivial. It creates a dependency parse with typed dependencies such as nsubj(makes-8, Bell-1), telling you that Bell is the subject of makes. All you have to do is look through the list of dependencies that the parser gives you for nsubj or nsubjpass entries; those are the subjects of the verbs.
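The filtering step can be sketched without the parser itself. The example below assumes the dependencies are already available as strings in the printed form shown above (relation(governor-index, dependent-index)); it does not call the Stanford Parser API, and `SubjectFinder` and `findSubjects` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: picks nsubj / nsubjpass entries out of printed dependencies.
class SubjectFinder {

    // Matches e.g. "nsubj(makes-8, Bell-1)":
    // group 1 = relation, group 2 = governor word, group 4 = dependent word.
    private static final Pattern DEP =
            Pattern.compile("(\\w+)\\((\\S+)-(\\d+), (\\S+)-(\\d+)\\)");

    static List<String> findSubjects(List<String> dependencies) {
        List<String> subjects = new ArrayList<>();
        for (String dep : dependencies) {
            Matcher m = DEP.matcher(dep.trim());
            if (!m.matches()) {
                continue; // skip lines that are not in the expected format
            }
            String relation = m.group(1);
            if (relation.equals("nsubj") || relation.equals("nsubjpass")) {
                subjects.add(m.group(4) + " is the subject of " + m.group(2));
            }
        }
        return subjects;
    }

    public static void main(String[] args) {
        List<String> deps = Arrays.asList(
                "nsubj(makes-8, Bell-1)",
                "det(products-16, the-14)");
        for (String s : findSubjects(deps)) {
            System.out.println(s); // prints "Bell is the subject of makes"
        }
    }
}
```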

+1