PDFBox - Encountered Weird Text in COSString

I use PDFBox to read / replace PDF text in the standard documented way, i.e. via COSString (the operands of the Tj and TJ operators). It seemed to work fine until I tested it against the following PDF file:

http://www.ocs.fas.harvard.edu/students/materials/resumes_and_cover_letters.pdf

It works fine up to page 7, but the text read from later pages comes out in a weird form. Below are a few lines of output:

S˛˚ R˚˘˚RESUMES AND COVER LETTERSPeter J. Lee      : L Q W K U R S  0 D L O  & H Q W H U  ±  & D P E U L G J H   0 D V V D F K X V H W W V                     ±  S M O H H # I D V  K D U Y D U G  H G X  

What could be the reason for this?

Thanks, Usman

1 answer

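read / replace PDF text using a standard documented way, i.e. via COSString (Tj and TJ operators)

There is a sample "documenting" this approach, but it is fairly naive and only works in benign cases:

  • It assumes that the strings passed to Tj and TJ use some "standard" encoding. They need not. How the bytes of a string map to characters depends entirely on the font that is current at that point.

  • It assumes that the text you are looking for sits whole in a single string. It can just as well be split across several strings, positioned piece by piece, or drawn in a different order than it is read.

PDF is a format for finished documents, not for editing. The choice of fonts, encodings, string splitting, positioning, and drawing order is left to whatever program produced the file.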

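PS: The font used for the affected text has the encoding Identity-H and is based on TimesNewRoman. With Identity-H, every character is written as a two-byte code that selects a glyph directly, so the bytes are not ASCII or any other text encoding.

The font does have a ToUnicode map; with it, the two-byte codes can be translated to Unicode, so the text can be read.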
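To make this concrete, here is a minimal sketch in plain Java (no PDFBox calls) of what reading such a string involves: take the bytes two at a time and look each code up in a ToUnicode-style table. The class name, the `decode` helper, and the four table entries are hypothetical; the entries are inferred from the sample output in the question, on the assumption that the gaps in "S M O H H" are the zero high bytes and that the line is the e-mail address starting with "pjlee". A real program would take the mapping from the font's ToUnicode CMap.

```java
import java.util.HashMap;
import java.util.Map;

public class IdentityHDecodeSketch {

    // Decodes a string written with a two-byte encoding such as Identity-H:
    // each pair of bytes forms one code, which is looked up in the table.
    static String decode(byte[] raw, Map<Integer, String> toUnicode) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i + 1 < raw.length; i += 2) {
            int code = ((raw[i] & 0xFF) << 8) | (raw[i + 1] & 0xFF);
            // Codes missing from the table become U+FFFD rather than a guess.
            out.append(toUnicode.getOrDefault(code, "\uFFFD"));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical excerpt of a ToUnicode table, inferred from the sample only.
        Map<Integer, String> toUnicode = new HashMap<>();
        toUnicode.put(0x0053, "p");
        toUnicode.put(0x004D, "j");
        toUnicode.put(0x004F, "l");
        toUnicode.put(0x0048, "e");

        // These bytes show up as " S M O H H" when read one byte per character.
        byte[] raw = {0x00, 0x53, 0x00, 0x4D, 0x00, 0x4F, 0x00, 0x48, 0x00, 0x48};

        System.out.println(decode(raw, toUnicode)); // prints "pjlee"
    }
}
```

Reading the same bytes one at a time is what produces the letters separated by gaps in the question's output.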

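Replacing is harder, though; the codes that show up as "I" and "J" in your output stand for "f" and "g", so new text has to be written in the font's own codes, and only for characters the embedded font actually provides. Otherwise, the replacement shows the wrong glyphs or none at all.

In short, unless you control how the files are produced, replacing text in a PDF through the raw string operands is fragile, and it will break on documents like this one.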
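The replacement side can be sketched the same way. Embedded fonts often contain only the glyphs a document uses, so before writing new text into a string operand one would have to check that every character has a code in that font. The following plain-Java sketch (again without PDFBox calls) illustrates that check; the class name, the `encode` helper, and the table contents are hypothetical and use the same codes inferred from the question's sample.

```java
import java.util.HashMap;
import java.util.Map;

public class SubsetEncodeSketch {

    // Encodes text as two-byte codes and fails if the font has no code
    // for one of the characters.
    static byte[] encode(String text, Map<Character, Integer> codeForChar) {
        byte[] out = new byte[text.length() * 2];
        for (int i = 0; i < text.length(); i++) {
            Integer code = codeForChar.get(text.charAt(i));
            if (code == null) {
                throw new IllegalArgumentException(
                        "No glyph in the font for '" + text.charAt(i) + "'");
            }
            out[2 * i] = (byte) (code >> 8);       // high byte
            out[2 * i + 1] = (byte) (code & 0xFF); // low byte
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical reverse table: character -> two-byte code.
        Map<Character, Integer> codeForChar = new HashMap<>();
        codeForChar.put('p', 0x0053);
        codeForChar.put('j', 0x004D);
        codeForChar.put('l', 0x004F);
        codeForChar.put('e', 0x0048);

        // "peel" uses only characters the table knows about.
        System.out.println(encode("peel", codeForChar).length); // prints 8

        try {
            encode("peek", codeForChar); // 'k' has no code in this table
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing loudly on a missing character is a deliberate choice here: silently writing some other code would place an unrelated glyph on the page.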
