Extract plain text from a parse tree

I'm new to NLP and am trying to use the Stanford Parser to extract noun phrases (NP) from text. I want to get the parts of the text that are labeled (NP).

If a part is labeled (NP) and a smaller part inside it is also labeled (NP), I want to take the smaller part.

So far I have managed to do this with the following method:

private static ArrayList<Tree> extract(Tree t)
{
    ArrayList<Tree> wanted = new ArrayList<Tree>();
    if (t.label().value().equals("NP"))
    {
        // Tentatively keep this NP; it is dropped below if a nested NP is found.
        wanted.add(t);
        for (Tree child : t.children())
        {
            ArrayList<Tree> temp = extract(child);
            if (temp.size() > 0)
            {
                int o = wanted.indexOf(t);
                if (o != -1)
                    wanted.remove(o);
            }
            wanted.addAll(temp);
        }
    }
    else
    {
        for (Tree child : t.children())
            wanted.addAll(extract(child));
    }
    return wanted;
}

The return type of this method is a list of trees. When I do the following:

LexicalizedParser parser = LexicalizedParser.loadModel();
Tree x = parser.apply("Who owns club barcelona?");
ArrayList<Tree> outs = extract(x);
for (int i = 0; i < outs.size(); i++) {
    System.out.println("tree #" + i + ": " + outs.get(i));
}

I get this output:

tree #0: (NP (NN club) (NN barcelona))

I want the result to be "club barcelona" directly, without the tags. I tried .labels() and .label().value(), but they returned the tags instead.

1 answer

You can get the list of words under a subtree tr with:

tr.yield()

To convert that list to a String, use the Sentence helper:

Sentence.listToString(tr.yield())

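You can walk the tree yourself, as you are doing, but Tregex makes it easier to find particular nodes with a declarative pattern, such as an NP with no NP below it. One way to do what you want: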

Tree x = lp.apply("Christopher Manning owns club barcelona?");
TregexPattern NPpattern = TregexPattern.compile("@NP !<< @NP");
TregexMatcher matcher = NPpattern.matcher(x);
while (matcher.findNextMatchingNode()) {
  Tree match = matcher.getMatch();
  System.out.println(Sentence.listToString(match.yield()));
}
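
To make the meaning of the pattern concrete: "@NP !<< @NP" matches a node whose category is NP and which does not dominate any other NP, and yield() gives the words under that node. The sketch below reproduces that logic without the parser, on a hand-built tree. The Node class and the methods text, dominatesNp and innermostNpTexts are hypothetical stand-ins written for this illustration, not part of the Stanford API, and the tree shape in main is assumed rather than taken from real parser output. Unlike @NP, the sketch compares labels exactly, so a label such as NP-SBJ would not count as an NP here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Library-free illustration; Node is a hypothetical stand-in, NOT the Stanford Tree class.
class NpSketch {
    static class Node {
        final String label;        // category such as "NP", or the word itself for a leaf
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }

        boolean isLeaf() { return children.isEmpty(); }
    }

    // Collect the leaf words under n, left to right (conceptually what yield() returns).
    static void collectWords(Node n, List<String> out) {
        if (n.isLeaf()) { out.add(n.label); return; }
        for (Node c : n.children) collectWords(c, out);
    }

    // Words under n joined with single spaces.
    static String text(Node n) {
        List<String> words = new ArrayList<>();
        collectWords(n, words);
        return String.join(" ", words);
    }

    // True if any proper descendant of n is a non-leaf node labeled NP.
    static boolean dominatesNp(Node n) {
        for (Node c : n.children) {
            if ((!c.isLeaf() && c.label.equals("NP")) || dominatesNp(c)) return true;
        }
        return false;
    }

    // Texts of all NP nodes that contain no other NP, in left-to-right order.
    static List<String> innermostNpTexts(Node n) {
        List<String> result = new ArrayList<>();
        if (!n.isLeaf() && n.label.equals("NP") && !dominatesNp(n)) {
            result.add(text(n));
            return result; // no NP below this node, so nothing more to find here
        }
        for (Node c : n.children) result.addAll(innermostNpTexts(c));
        return result;
    }

    public static void main(String[] args) {
        // Hand-built tree for "Who owns club barcelona ?" (structure assumed for illustration).
        Node tree = new Node("ROOT",
            new Node("SBARQ",
                new Node("WHNP", new Node("WP", new Node("Who"))),
                new Node("SQ",
                    new Node("VP",
                        new Node("VBZ", new Node("owns")),
                        new Node("NP",
                            new Node("NN", new Node("club")),
                            new Node("NN", new Node("barcelona"))))),
                new Node(".", new Node("?"))));
        System.out.println(innermostNpTexts(tree)); // prints [club barcelona]
    }
}
```

With the real parser, the Tregex version above is shorter and also handles functional tags on labels, so the sketch is only meant to show what the pattern selects.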