When I want to extract text from a PDF, I pass it to pdftohtml (part of Poppler) using the -xml output option. This creates an XML file that I parse using XML::Twig (or any other XML parser you like, except XML::Simple).
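A minimal sketch of the first step, written in Python rather than Perl purely for illustration. It assumes pdftohtml is on the PATH and that, given an output base name, it writes `<out_base>.xml`. The `-i` flag (skip images) is an optional extra not mentioned above. The helper names `pdftohtml_xml_command` and `pdf_to_xml` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def pdftohtml_xml_command(pdf_path, out_base):
    """Build the argument list for running pdftohtml in XML mode."""
    # -xml selects the XML output; -i skips images so only text is kept.
    return ["pdftohtml", "-xml", "-i", str(pdf_path), str(out_base)]


def pdf_to_xml(pdf_path, out_base):
    """Run pdftohtml and return the path of the XML file it should create.

    Assumes pdftohtml (from Poppler) is installed and names its output
    <out_base>.xml.
    """
    if shutil.which("pdftohtml") is None:
        raise FileNotFoundError("pdftohtml not found on PATH")
    subprocess.run(pdftohtml_xml_command(pdf_path, out_base), check=True)
    return Path(str(out_base) + ".xml")
```

Keeping the command builder separate from the call to `subprocess.run` makes it easy to check or log the exact command line without needing Poppler installed.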
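The XML format is fairly simple: there is one <page> element per page of the PDF, which contains <fontspec> elements describing the fonts used and one <text> element per line of text. The <text> elements can contain <b> and <i> tags for bold and italic (which is why XML::Simple can't handle it properly).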
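Here is a rough Python equivalent of the parsing step, using the standard library's `xml.etree.ElementTree` in place of XML::Twig. Unlike XML::Simple, it keeps mixed content intact, so `itertext()` can flatten the <b>/<i> children of a <text> element. The function name `read_pdf2xml` and the shape of the returned dicts are my own assumptions. Fonts are collected from the whole document on the assumption that a font may be declared only on the page where it first appears.

```python
import xml.etree.ElementTree as ET


def read_pdf2xml(xml_text):
    """Parse pdftohtml -xml output into (fonts, pages)."""
    root = ET.fromstring(xml_text)
    # Collect every <fontspec> in the document, keyed by its id.
    fonts = {f.get("id"): dict(f.attrib) for f in root.iter("fontspec")}
    pages = []
    for page in root.iter("page"):
        lines = []
        for t in page.iter("text"):
            lines.append({
                "text": "".join(t.itertext()),  # joins text inside <b>/<i>
                "bold": t.find(".//b") is not None,
                "italic": t.find(".//i") is not None,
                "font": t.get("font"),
                "top": float(t.get("top")),
                "left": float(t.get("left")),
            })
        pages.append({"number": page.get("number"), "lines": lines})
    return fonts, pages
```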
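Use the top and left attributes of the <text> elements to put them in order, since they are not necessarily emitted in reading order. The origin 0,0 is the top left corner of the page, so top grows downward and left grows to the right. Units are PostScript points (72 per inch).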
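A small sketch of the ordering step, again in Python and with hypothetical names. It assumes each line is a dict with numeric `top` and `left` values in points measured from the top left corner, as described above, and sorts top to bottom, then left to right. It also converts points to inches using the 72-points-per-inch rule.

```python
POINTS_PER_INCH = 72  # PostScript points per inch


def reading_order(lines):
    """Sort text lines top-to-bottom, then left-to-right.

    Assumes each line is a dict with numeric 'top' and 'left' values,
    in points, measured from the page's top left corner.
    """
    return sorted(lines, key=lambda ln: (ln["top"], ln["left"]))


def points_to_inches(points):
    """Convert a PostScript point measurement to inches."""
    return points / POINTS_PER_INCH
```

This is a strict sort. Pieces of the same visual line whose top values differ by a point or two (superscripts, mixed font sizes) will not be grouped together, so in practice you may want to bucket top values with a small tolerance before sorting.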