Perl PDF line-by-line parser?

I have a PDF that consists only of text, with no special characters, images, etc. Is there a Perl module (I looked on CPAN to no avail) that can help me parse each page line by line? (Converting the PDF to plain text gives poor results and unusable data.)

Thanks.


When I want to extract text from a PDF, I feed it to pdftohtml (part of Poppler) with the -xml option. That produces an XML file, which I parse with XML::Twig (or any other XML parser you like, except XML::Simple).
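A minimal sketch of the conversion step in Perl, assuming pdftohtml is on your PATH. The sub names pdftohtml_cmd and pdf_to_xml are hypothetical, and the assumption that an output base name of "out" yields "out.xml" reflects my understanding of pdftohtml's -xml mode; check it against your Poppler version.

```perl
use strict;
use warnings;

# Build the pdftohtml command line (hypothetical helper).
# -xml asks pdftohtml for XML output; $base is the output base name,
# assumed here to produce "$base.xml".
sub pdftohtml_cmd {
    my ($pdf, $base) = @_;
    return ('pdftohtml', '-xml', $pdf, $base);
}

# Run the conversion and return the path of the generated XML file.
sub pdf_to_xml {
    my ($pdf, $base) = @_;
    my @cmd = pdftohtml_cmd($pdf, $base);
    # List-form system() runs the program directly, without a shell.
    system(@cmd) == 0
        or die "pdftohtml failed (status $?) for $pdf\n";
    return "$base.xml";
}
```

Because the list form of system() bypasses the shell, file names containing spaces need no extra quoting.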

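The XML format is straightforward. You get a <page> element for each page of the PDF, containing <fontspec> elements that describe the fonts used and a <text> element for each line of text. The <text> elements may contain <b> and <i> tags for bold and italic text (which is why XML::Simple can't handle it properly).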
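A sketch of the parsing step with XML::Twig (installed from CPAN), assuming pdftohtml's element and attribute names: <page> and <text> elements, with top, left and font attributes on each <text>. read_pages is a hypothetical helper name. It handles one <page> at a time and purges it afterwards, so a large document does not have to sit in memory all at once.

```perl
use strict;
use warnings;
use XML::Twig;

# Collect the <text> elements of each <page> from pdftohtml -xml output.
# Returns an array ref with one entry per page; each entry is an
# array ref of hashes { top, left, font, text }.
sub read_pages {
    my ($xml) = @_;
    my @pages;
    my $twig = XML::Twig->new(
        twig_handlers => {
            page => sub {
                my ($t, $page) = @_;
                my @frags;
                for my $text ($page->children('text')) {
                    push @frags, {
                        top  => $text->att('top'),
                        left => $text->att('left'),
                        font => $text->att('font'),   # id of a <fontspec>
                        # ->text concatenates the content of any <b>/<i>
                        # children, so inline markup is flattened here.
                        text => $text->text,
                    };
                }
                push @pages, \@frags;
                $t->purge;    # discard the page once it has been copied
            },
        },
    );
    $twig->parse($xml);
    return \@pages;
}
```

To read the file written by pdftohtml directly, swap $twig->parse($xml) for $twig->parsefile($path).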

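You need to use the top and left attributes of the <text> elements to put them in the right order, because they are not necessarily emitted from top to bottom. The coordinate system has 0,0 in the upper-left corner of the page, with positive values going down and to the right. Dimensions are in PostScript points (72 per inch).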
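The ordering step can be sketched in plain Perl. It takes a list of hash refs with top, left and text keys (for example, built from each <text> element's attributes and content). page_lines and its $tolerance parameter are hypothetical: fragments whose top values differ by at most $tolerance are treated as one line, which absorbs small baseline differences between fonts. The default of 2 is arbitrary and may need tuning for your PDFs.

```perl
use strict;
use warnings;

# Sort one page's text fragments into reading order and join fragments
# that sit on the same line. Since 0,0 is the upper-left corner, a
# larger top value means further down the page.
sub page_lines {
    my ($frags, $tolerance) = @_;
    $tolerance //= 2;    # hypothetical fudge factor, in page units

    my @sorted = sort {
        $a->{top} <=> $b->{top} or $a->{left} <=> $b->{left}
    } @$frags;

    # Start a new line whenever a fragment is clearly below the
    # first fragment of the current line.
    my (@lines, $line_top);
    for my $f (@sorted) {
        if (!@lines || $f->{top} - $line_top > $tolerance) {
            push @lines, [];
            $line_top = $f->{top};
        }
        push @{ $lines[-1] }, $f;
    }

    # Within each line, order fragments left to right and join them.
    my @out;
    for my $line (@lines) {
        my @parts = sort { $a->{left} <=> $b->{left} } @$line;
        push @out, join ' ', map { $_->{text} } @parts;
    }
    return @out;
}
```

Re-sorting by left inside each line matters because two fragments within the tolerance can have slightly different top values, so the initial top-then-left sort alone would not guarantee left-to-right order.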
