Script to search for text from PDF

Question

Problem

On a Mac OS X platform, I would like to write a script, either in Python or in Tcl, to search for text in a PDF file and extract the relevant parts. I appreciate any help.

Background

I write scripts that look at a PDF to determine whether it is a bill, which company issued it, and what period it covers. Based on this information, I rename the PDF and move it to the appropriate directory. For example, a file such as Statement_03948293929384.pdf can become 2012-07-15 Water Bill.pdf and be moved to my Utilities folder.
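To make the workflow concrete, here is a rough sketch of the renaming step, assuming the text has already been extracted from the PDF. The keyword table, the MM/DD/YYYY date format, and the name plan_rename are all placeholders I made up for illustration; real statements will need their own rules.

```python
import re
from datetime import datetime
from pathlib import Path

# Hypothetical keyword table: a phrase expected in the bill text mapped to
# (bill label, destination folder). These entries are illustrative only.
BILL_RULES = {
    "water": ("Water Bill", "Utilities"),
    "electric": ("Electric Bill", "Utilities"),
}

# Assumed date format on the statement, e.g. "07/15/2012".
DATE_PATTERN = re.compile(r"\b(\d{2}/\d{2}/\d{4})\b")


def plan_rename(text):
    """Return the relative destination path for a bill, or None if unrecognized."""
    match = DATE_PATTERN.search(text)
    if match is None:
        return None
    try:
        date = datetime.strptime(match.group(1), "%m/%d/%Y")
    except ValueError:
        # Looked like a date but was not a valid one (e.g. 13/45/2012).
        return None
    lowered = text.lower()
    for keyword, (label, folder) in BILL_RULES.items():
        if keyword in lowered:
            return Path(folder) / f"{date:%Y-%m-%d} {label}.pdf"
    return None
```

The function only computes the new path; the actual move could then be done with shutil.move once the result has been checked.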

What have I done so far?

I searched for PDF-to-plain-text tools but didn't find anything
I looked at the Tcl wiki and found an example, but could not get it to work (I searched for text that is in the PDF, but the script could not find it).
I looked at pdf-parser.py from Didier Stevens.
I heard about a Python package called pyPdf and will look at it further.

Update

I found a command-line tool called pdftotext, written by Glyph and Cog, LLC, and built and packaged by Carsten Bluem. This tool is straightforward and it solves my problem. I am still looking for tools that can search a PDF directly, without having to convert it to a text file.
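For anyone with the same problem, this is roughly how pdftotext can be driven from Python without writing an intermediate text file: passing "-" as the output name makes pdftotext print the text to standard output. This is only a sketch, assuming Python 3.7 or later and pdftotext on the PATH; the function names pdf_to_text and find_lines are my own.

```python
import subprocess


def pdf_to_text(pdf_path):
    """Run pdftotext and return the extracted text (needs pdftotext on PATH)."""
    result = subprocess.run(
        ["pdftotext", str(pdf_path), "-"],  # "-" sends the text to stdout
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout


def find_lines(text, phrase):
    """Return the stripped lines that contain phrase, ignoring case."""
    needle = phrase.lower()
    return [line.strip() for line in text.splitlines() if needle in line.lower()]
```

A call such as find_lines(pdf_to_text("Statement_03948293929384.pdf"), "water") would then return the matching lines for further inspection.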

python parsing pdf tcl macos

Hai vu Jul 19 '12 at 10:51

1 answer

TrojanName · Answer 1 · 2012-07-19T23:19:33+0000

PyODConverter / PDF ( Java). , PDF , . , iText , .
