Is it possible to digitize a dictionary?

I found a public domain Latin ↔ Portuguese dictionary in PDF format that I would like to convert to plain text, parse, and use as the database for a program. However, after some tests, I'm a little skeptical. Take a look at the original file and the resulting gocr text. Is there any hope of achieving 99%+ accuracy with some method? I was thinking about the reCaptcha database, but I think that is Google's property, right?
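To make the 99% target concrete, one option is to transcribe a small sample of the dictionary by hand and compare it with the OCR output for the same lines. The sketch below is only an illustration under that assumption: it measures character-level accuracy as one minus the edit distance divided by the length of the hand-made reference. The function names and the sample strings are hypothetical and are not taken from the PDF or from gocr.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where each insertion, deletion, or substitution costs 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def char_accuracy(reference: str, ocr_output: str) -> float:
    """Fraction of reference characters recovered, clamped to the range [0, 1]."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    distance = levenshtein(reference, ocr_output)
    return max(0.0, 1.0 - distance / len(reference))


if __name__ == "__main__":
    # Hypothetical sample entry: hand-typed reference vs. made-up OCR output.
    reference = "aqua, ae f. água"
    ocr_output = "aqna, ae f. agua"
    print(f"character accuracy: {char_accuracy(reference, ocr_output):.3f}")
```

Running this on a few hand-checked pages would give a rough per-character figure to compare between OCR tools or settings; it says nothing about whether the errors fall on headwords or on definitions.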

Thanks!

+3
2 answers

Another way is to use one of the freely available dictionary files, for example http://www.brothersoft.com/downloads/dictionary-database.html

+2

Or WordNet.

EDIT: I just noticed that this is a Latin / Portuguese dictionary, so WordNet is clearly not suitable.

+2
