I use PDFBox to read and replace PDF text in the standard documented way, i.e. via the COSString operands of the Tj and TJ operators. This seemed to work fine until I tested it against the following PDF file:
http://www.ocs.fas.harvard.edu/students/materials/resumes_and_cover_letters.pdf
It works fine up to page 7, but the text read from later pages comes out garbled. Below are a few lines of the output:
S˛˚ R˚˘˚RESUMES AND COVER LETTERSPeter J. Lee : L Q W K U R S 0 D L O & H Q W H U ± & D P E U L G J H 0 D V V D F K X V H W W V ± S M O H H
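One pattern is visible in the sample itself: if the separator spaces are dropped and 29 is added to each character code, the letter runs become readable words. The snippet below is only a sketch of that observation. The class name, the `shift` helper, and the offset of 29 are all inferred from this one output sample and are not part of PDFBox. It may indicate that the string bytes on those pages are font-specific glyph codes rather than plain character codes, but that has not been verified against the file.

```java
public class GlyphShiftDemo {

    // Hypothetical helper: skip the separator spaces seen in the output,
    // then add a fixed offset to every remaining character code.
    static String shift(String garbled, int offset) {
        StringBuilder sb = new StringBuilder();
        for (char c : garbled.toCharArray()) {
            if (c == ' ') {
                continue; // separators in the printed output, not part of the word
            }
            sb.append((char) (c + offset));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Fragments copied from the garbled output above; 29 is an
        // assumption derived from this sample only.
        System.out.println(shift(": L Q W K U R S", 29));   // prints "Winthrop"
        System.out.println(shift("& D P E U L G J H", 29)); // prints "Cambridge"
    }
}
```

A constant shift like this would not be a general fix, since another font or another file could use a different mapping.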
What could be the reason for this?
Thanks,
Usman