PDF Box - Encountered Weird Text in COSString

Question


I use PDFBox to read and replace PDF text in a standard documented way, i.e. via COSString (the operands of the Tj and TJ operators). It seemed to work fine until I tested it against the following PDF file:

http://www.ocs.fas.harvard.edu/students/materials/resumes_and_cover_letters.pdf

It works fine up to page 7, but from there on the text I read comes out in a weird form. Below are a few lines of output:

S˛˚ R˚˘˚RESUMES AND COVER LETTERSPeter J. Lee      : L Q W K U R S  0 D L O  & H Q W H U  ±  & D P E U L G J H   0 D V V D F K X V H W W V                     ±  S M O H H # I D V  K D U Y D U G  H G X

What could be the reason for this?

Thanks Usman


java pdfbox

Usman naeem Feb 21 '14 at 2:46


Answer

mkl · Accepted Answer · 2014-02-21T05:19:06+0000

read / replace PDF text using a standard documented method, i.e. via COSString (Tj and TJ operators)

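This is not really a "standard documented way"; it is a shortcut that only works in simple cases, for several reasons: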

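First, the operands of the Tj and TJ operators are not text but sequences of character codes, and what those codes mean depends on the encoding of the current font. Only for simple fonts with an ASCII-compatible encoding (e.g. WinAnsiEncoding) do the bytes look like "normal" text.
Second, even then a word or line may be split across several Tj operations or several strings of a TJ array, so looking at one COSString at a time can miss it. Third, if the font is embedded as a subset, it may not contain the glyphs your replacement text needs.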

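Your PDF uses such non-trivial font encodings from page 7 onwards, so what you read there are character codes, not letters.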

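PS: In particular, the later pages use TimesNewRoman embedded as a composite font with the Identity-H encoding, i.e. each character code is a two-byte CID, which for such embedded TrueType fonts usually is simply the glyph ID.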
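With Identity-H, each character code in a string operand is two bytes, big-endian. The sketch below (plain Java, no PDFBox) splits raw string bytes into those codes and shows what a naive one-byte reading produces: a NUL character before every real code. The single spaces between the letters in the question's output are consistent with such NULs being displayed as spaces. The byte values used here are a hypothetical reconstruction from the first characters of that output, not bytes taken from the actual file.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch: how Identity-H string bytes look when misread as one-byte text.
class IdentityHDemo {

    // Splits raw string bytes into big-endian 2-byte codes (Identity-H: code == CID).
    static int[] toCids(byte[] raw) {
        if (raw.length % 2 != 0) {
            throw new IllegalArgumentException("Identity-H strings have an even byte count");
        }
        int[] cids = new int[raw.length / 2];
        for (int i = 0; i < cids.length; i++) {
            cids[i] = ((raw[2 * i] & 0xFF) << 8) | (raw[2 * i + 1] & 0xFF);
        }
        return cids;
    }

    // What a naive single-byte reading yields: each high byte 0x00 becomes a NUL char.
    static String naiveLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Hypothetical bytes for the first three codes of the garbled output line
        byte[] raw = {0x00, 0x3A, 0x00, 0x4C, 0x00, 0x51};
        System.out.println(Arrays.toString(toCids(raw)));             // [58, 76, 81]
        // Printing NULs as spaces mimics what the question shows
        System.out.println(naiveLatin1(raw).replace('\0', ' '));      // " : L Q"
    }
}
```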

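To map such codes back to Unicode, you need the font's ToUnicode map (PDFBox's PDFTextStripper uses it when it is present); without one, the codes cannot reliably be interpreted as text.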
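A ToUnicode map is essentially a per-code lookup table. The sketch below uses a hypothetical `Map<Integer, String>` as a stand-in for a parsed map (it is not a PDFBox API) and falls back to a heuristic for unmapped codes. The heuristic comes from the question's output: every letter there is exactly 29 below the expected one (':' for 'W', 'L' for 'i', '0' for 'M'), which matches the standard Macintosh glyph order many TrueType fonts use, where glyph 3 is the space and printable ASCII follows in sequence. That offset is an observation about this one file, not a general PDF rule.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: turning CIDs into text with a hypothetical in-memory ToUnicode map.
class CidToTextDemo {

    // Offset observed in the question's output: intended char = code + 29.
    // Heuristic for this file only (standard Macintosh glyph order), not a PDF rule.
    static final int MAC_GLYPH_ORDER_OFFSET = 29;

    // Uses the ToUnicode entry if present, otherwise falls back to the offset guess.
    static String decode(int[] cids, Map<Integer, String> toUnicode) {
        StringBuilder sb = new StringBuilder();
        for (int cid : cids) {
            String mapped = toUnicode.get(cid);
            if (mapped != null) {
                sb.append(mapped);
            } else {
                sb.appendCodePoint(cid + MAC_GLYPH_ORDER_OFFSET);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Codes as they appear in the question's output ": L Q W K U R S"
        int[] cids = {0x3A, 0x4C, 0x51, 0x57, 0x4B, 0x55, 0x52, 0x53};
        System.out.println(decode(cids, new HashMap<>())); // Winthrop
    }
}
```

Applied to the rest of that output line, the same offset yields readable words such as "Cambridge" and "harvard", which supports the diagnosis that the strings contain glyph codes rather than text.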

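Even with a ToUnicode map, replacing text this way is risky: to write a "J" where there was an "I", you have to encode the "J" with the codes of that same font, which requires the reverse mapping and requires that the embedded font actually contains a "J" glyph. With subset fonts that is often not the case.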

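So, unless you control how the PDFs you process are created, COSString-level replacement will fail on many documents.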

