How to parse .pdf files in Perl?

How can I parse .pdf files in Perl? Is Perl efficient enough for this, or should I use another language?

+3
4 answers

I personally use CAM::PDF.

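my $doc = CAM::PDF->new($fileName) || die "$CAM::PDF::errStr\n";
my $pdfString = $doc->getPageText(1);   # text of page 1
CAM::PDF->asciify(\$pdfString);         # takes a reference and converts ligatures etc. to ASCII in place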

PDFs are intended for display and printing, not for parsing, so extraction is always a matter of trial and error, and it may well be impossible if the content consists entirely of graphics.
A good indicator is whether you can copy and paste the contents from the PDF into an editor. If that works, you are in business.
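
As a rough sketch of the approach above, the following loops over all pages with CAM::PDF and reports whether each page yields any usable text. It assumes CAM::PDF is installed from CPAN. The sub names and the "three letters in a row" heuristic in looks_extractable are illustrative assumptions, not part of CAM::PDF.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Return the extracted text of every page (index 0 holds page 1).
sub pdf_page_texts {
    my ($file) = @_;
    require CAM::PDF;        # loaded lazily; install it from CPAN
    no warnings 'once';      # $CAM::PDF::errStr is set by the module at runtime
    my $doc = CAM::PDF->new($file)
        or die "Cannot open $file: $CAM::PDF::errStr\n";
    my @texts;
    for my $page_num (1 .. $doc->numPages()) {
        my $text = $doc->getPageText($page_num);
        $text = '' unless defined $text;
        CAM::PDF->asciify(\$text);   # convert ligatures etc. to ASCII in place
        push @texts, $text;
    }
    return @texts;
}

# Hypothetical heuristic: treat a page as extractable if its text
# contains at least one run of three or more ASCII letters.
sub looks_extractable {
    my ($text) = @_;
    return (defined $text && $text =~ /[A-Za-z]{3,}/) ? 1 : 0;
}

# Usage: perl pdftext.pl some.pdf
if (!caller() && @ARGV) {
    my @texts = pdf_page_texts($ARGV[0]);
    for my $i (0 .. $#texts) {
        printf "page %d: %s\n", $i + 1,
            looks_extractable($texts[$i]) ? 'text found' : 'no usable text';
    }
}
```

Pages reported as "no usable text" are likely scanned images or vector graphics, which is the case where the copy-and-paste test also fails.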

+6

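Convert the PDF to XML using pdftohtml (part of Poppler) with the -xml option. Then process the XML with XML::Twig (or any other XML module, but not XML::Simple).

The XML is simple: there is one <page> element per page of the PDF, containing <fontspec> elements that describe the fonts, followed by <text> elements for the pieces of text. A <text> element can contain <b> and <i> for formatting (mixed content, which is why XML::Simple is a poor fit).

Each <text> element is positioned by its top and left attributes, so the elements may not appear in reading order in the file. The origin 0,0 is the top-left corner of the page. The units appear to be based on PostScript points (72 per inch).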
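
A minimal sketch of that workflow, assuming the XML was produced beforehand with something like `pdftohtml -xml input.pdf output` (which writes output.xml) and that XML::Twig is installed from CPAN. The sub name ordered_lines is illustrative. Sorting on exact top values is a simplification, since pieces of the same visual line can have slightly different top values in real output.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Take pdftohtml -xml output as a string and return the text pieces
# page by page, sorted by top and then by left within each page.
sub ordered_lines {
    my ($xml) = @_;
    require XML::Twig;       # loaded lazily; install it from CPAN
    my @lines;
    my $twig = XML::Twig->new(
        twig_handlers => {
            # Called once per <page>, after the whole page has been parsed.
            page => sub {
                my ($t, $page) = @_;
                my @texts = sort {
                       $a->att('top')  <=> $b->att('top')
                    || $a->att('left') <=> $b->att('left')
                } $page->children('text');
                # text() flattens nested <b>/<i> into plain text
                push @lines, map { $_->text } @texts;
                $t->purge;   # free memory for pages already handled
            },
        },
    );
    $twig->parse($xml);
    return @lines;
}

# Usage: perl pdfxml.pl output.xml
if (!caller() && @ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    my $xml = do { local $/; <$fh> };   # slurp the whole file
    close $fh;
    print "$_\n" for ordered_lines($xml);
}
```

Because the handler runs as each <page> closes, pages come out in file order, and only the <text> elements within a page are re-sorted.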

+9

Look on CPAN. If the text is stored only as images, you will need OCR. See PDF::OCR2.

+5

I do not know of any module that parses PDFs, if by parsing you mean extracting text from them. There are several modules that let you manipulate them. Try PDF::API2.

+4
