Baum-Welch algorithm for a POS tagger

Hi all. I am using the Baum-Welch algorithm to train a POS tagger in a fully unsupervised mode. Here is the problem: when I get the tagging result, I only get a sequence of numbers. I can't tell which numbers correspond to labels such as VV, NN, or DT. How can I solve this problem?

+3
1 answer

In general, there is no way to do this. Baum-Welch will find classes of word usages that have similar distributions, but there is no reason to believe that these classes will map in any simple way onto the categories defined by any particular linguistic theory. Unsupervised POS taggers are therefore mainly useful for applications where you are interested in equivalence classes of words or phrases, but not in the specific tags assigned.

If you really need human-readable labels (if only during development, to assess whether the results you are getting are even plausible), I would hand-tag a few dozen sentences. You can then run your Baum-Welch tagger on that labeled mini-corpus to induce a mapping between class numbers and POS labels.
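One way the mapping step could look is sketched below. This is only an illustration under assumptions not stated in the answer: the function names `induce_mapping` and `relabel`, the toy data, and the majority-vote rule (each class number takes the gold tag it co-occurs with most often) are all hypothetical choices. With a majority vote, several class numbers may end up with the same tag, and some tags may get no class at all.

```python
from collections import Counter, defaultdict


def induce_mapping(state_seqs, gold_seqs):
    """Map each numeric class to the gold tag it co-occurs with most often."""
    counts = defaultdict(Counter)
    for states, tags in zip(state_seqs, gold_seqs):
        if len(states) != len(tags):
            raise ValueError("state and tag sequences must align token by token")
        for state, tag in zip(states, tags):
            counts[state][tag] += 1
    # Majority vote per class number (many-to-one: tags may repeat).
    return {state: tag_counts.most_common(1)[0][0]
            for state, tag_counts in counts.items()}


def relabel(states, mapping, unknown="UNK"):
    """Replace class numbers with tags; unseen classes get a placeholder."""
    return [mapping.get(state, unknown) for state in states]


# Hypothetical tagger output on two hand-tagged sentences.
predicted = [[2, 0, 1], [2, 0, 1, 2, 0]]
gold = [["DT", "NN", "VV"], ["DT", "NN", "VV", "DT", "NN"]]

mapping = induce_mapping(predicted, gold)
print(relabel([2, 0, 1, 7], mapping))
```

Class numbers that never appear in the hand-tagged sentences stay unmapped (here they become `UNK`), which is a hint that the mini-corpus may be too small to cover every class.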

+4