Get specific nodes from a parse tree

I am working on an anaphora resolution project using the Hobbs algorithm. I have parsed my text with the Stanford parser, and now I would like to manipulate the nodes of the resulting tree in order to implement the algorithm.

At the moment, I do not understand how to:

  • Access a node based on its POS tag (for example, I need to start with a pronoun; how can I get all pronouns?).

  • Use visitors. I am a bit of a Java noob, but in C++ I would implement a Visitor functor and then work with its hooks. However, I could not find much about the tree structure of the Stanford parser. Is it jgrapht? If so, could you give me some pointers to code snippets?

2 answers

@dhg's answer works, but here are two other options that may also be useful to know about:

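  • The Tree class implements Iterable. You can iterate over all the nodes of a Tree (strictly speaking, over the subtrees headed by each node) and test each one, like this: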

    for (Tree subtree : t) { 
        if (subtree.label().value().equals("PRP")) {
            pronouns.add(subtree);
        }
    }
    
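  • You can also get just the nodes that satisfy some (potentially quite complex) pattern by using tregex, which behaves rather like java.util.regex but matches patterns over trees. You would write something like: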

    TregexPattern tgrepPattern = TregexPattern.compile("PRP");
    TregexMatcher m = tgrepPattern.matcher(t);
    while (m.find()) {
        Tree subtree = m.getMatch();
        pronouns.add(subtree);
    }
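
To make the traversal idea concrete without needing the parser on the classpath, here is a minimal, self-contained sketch. Everything in it is an assumption for illustration: `PronounSketch`, its nested `Node` class, and the hand-built tree are hypothetical stand-ins, not the Stanford `Tree` API. It matches on a tag prefix rather than an exact tag, so that possessive pronouns (tagged `PRP$` in the Penn Treebank tag set) are collected along with `PRP`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PronounSketch {

    // Hypothetical stand-in for a parse-tree node; NOT the Stanford Tree class.
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    // Pre-order walk: test the node itself, then descend into its children
    // from left to right. Leaves (the words) are never treated as tags.
    static void collect(Node node, String tagPrefix, List<Node> out) {
        if (!node.isLeaf() && node.label.startsWith(tagPrefix)) {
            out.add(node);
        }
        for (Node child : node.children) {
            collect(child, tagPrefix, out);
        }
    }

    static List<Node> findByTagPrefix(Node root, String tagPrefix) {
        List<Node> out = new ArrayList<>();
        collect(root, tagPrefix, out);
        return out;
    }

    // Hand-built tree for "he walks his dog" (shape chosen for illustration only).
    static Node sample() {
        return new Node("ROOT",
            new Node("S",
                new Node("NP", new Node("PRP", new Node("he"))),
                new Node("VP",
                    new Node("VBZ", new Node("walks")),
                    new Node("NP",
                        new Node("PRP$", new Node("his")),
                        new Node("NN", new Node("dog"))))));
    }

    public static void main(String[] args) {
        // Prints one line per match: the tag, then the word beneath it.
        for (Node n : findByTagPrefix(sample(), "PRP")) {
            System.out.println(n.label + " " + n.children.get(0).label);
        }
    }
}
```

Whether possessive pronouns should be treated as anaphors depends on how you set up the Hobbs algorithm; pass a different prefix, or compare labels with `equals`, if you want only exact `PRP` matches.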
    

Here is a simple example that parses a sentence and finds all of the pronouns.

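    import java.util.ArrayList;

    import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
    import edu.stanford.nlp.trees.Tree;

    // The class name is arbitrary; it only wraps the two methods so that they compile.
    public class FindPronouns {

        // Recursively collect every subtree whose label is the personal-pronoun tag PRP.
        private static ArrayList<Tree> findPro(Tree t) {
            ArrayList<Tree> pronouns = new ArrayList<Tree>();
            if (t.label().value().equals("PRP")) {
                pronouns.add(t);
            } else {
                for (Tree child : t.children()) {
                    pronouns.addAll(findPro(child));
                }
            }
            return pronouns;
        }

        public static void main(String[] args) {
            // Loads the parser's default model.
            LexicalizedParser parser = LexicalizedParser.loadModel();
            // Written against the API current at the time of this answer;
            // later releases may need parser.parse(String) here instead.
            Tree x = parser.apply("The dog walks and he barks .");
            System.out.println(x);
            ArrayList<Tree> pronouns = findPro(x);
            System.out.println("All Pronouns: " + pronouns);
        }
    }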

This prints:

    (ROOT (S (S (NP (DT The) (NN dog)) (VP (VBZ walks))) (CC and) (S (NP (PRP he)) (VP (VBZ barks))) (. .)))
    All Pronouns: [(PRP he)]