Cross-entropy for language modeling

I'm currently working on a classification task using language modeling. The first part of the project involved using n-gram language models to classify documents with C5.0. The final part of the project requires me to use cross-entropy to model each class and classify test cases against these models.

Does anyone have experience using cross-entropy, or links to information on how to apply a cross-entropy model to sample data? Any information at all would be great! Thanks.

1 answer

You can find the theoretical background on using cross-entropy with language models in various textbooks, e.g. "Speech and Language Processing" by Jurafsky and Martin, pages 116-118 in the second edition. As for concrete usage, most language modeling tools do not report cross-entropy directly but rather the perplexity, which is the exponential of the cross-entropy. Perplexity, in turn, can be used to classify documents: score a test document against each class's model and pick the class whose model gives the lowest perplexity. See, for example, the documentation for the "evallm" command in SLM, the Carnegie Mellon University language modeling toolkit (http://www.speech.cs.cmu.edu/SLM/toolkit_documentation.html)
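To make the idea concrete, here is a rough Python sketch of classifying by cross-entropy. It is not how the SLM toolkit does it; it assumes a bigram model with add-one smoothing and pre-tokenized documents, and the function names (`train_bigram`, `cross_entropy`, `perplexity`, `classify`) are made up for illustration. Cross-entropy is measured in bits per token, so perplexity is 2 raised to that value.

```python
import math
from collections import Counter

def train_bigram(docs):
    """Build bigram counts for one class from a list of token lists."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for tokens in docs:
        padded = ["<s>"] + tokens + ["</s>"]
        vocab.update(padded)
        contexts.update(padded[:-1])                  # how often each word is a context
        bigrams.update(zip(padded[:-1], padded[1:]))  # (previous word, word) pairs
    return contexts, bigrams, vocab

def cross_entropy(model, tokens, vocab_size):
    """Average negative log2 probability per token under an add-one smoothed bigram model."""
    contexts, bigrams, _ = model
    padded = ["<s>"] + tokens + ["</s>"]
    log_prob = 0.0
    for prev, word in zip(padded[:-1], padded[1:]):
        # Add-one (Laplace) smoothing: an assumption of this sketch, chosen for brevity.
        p = (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)
        log_prob += math.log2(p)
    return -log_prob / (len(padded) - 1)

def perplexity(entropy_bits):
    """Perplexity corresponding to a cross-entropy given in bits."""
    return 2 ** entropy_bits

def classify(models, tokens):
    """Return the label whose model gives the lowest cross-entropy, plus all scores."""
    # Use one shared vocabulary size (+1 for unseen words) so scores are comparable.
    shared_vocab = set().union(*(m[2] for m in models.values()))
    vocab_size = len(shared_vocab) + 1
    scores = {label: cross_entropy(m, tokens, vocab_size) for label, m in models.items()}
    return min(scores, key=scores.get), scores
```

You would call `train_bigram` once per class on that class's training documents, then pass the resulting dictionary of models and a tokenized test document to `classify`. Because perplexity is a monotonic function of cross-entropy, ranking the classes by either one gives the same answer.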

good luck :)
