I have experimented with pyPdf and pdfMiner for extracting text from PDF files. Some of my PDFs are unfriendly, and only pdfMiner can extract them. I use the code here to extract the text of an entire file. However, I would really like to extract text page by page, the way getPage(i).extractText() does in pyPdf. Does anyone know how to extract the text of a single page using pdfMiner?
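To make the request concrete, here is a rough sketch of the kind of per-page helper I have in mind. It assumes the maintained pdfminer.six fork rather than the original pdfMiner, and relies on its high-level `extract_text` function with the zero-based `page_numbers` argument. The name `get_page_text` is my own and hypothetical, not part of either library.

```python
def get_page_text(pdf_path, i):
    """Return the text of page i (zero-based) of the PDF at pdf_path.

    Hypothetical helper: assumes pdfminer.six is installed
    (pip install pdfminer.six), not the original pdfMiner package.
    """
    if i < 0:
        raise ValueError("page index must be non-negative")

    # Imported here so the helper can be defined even where
    # pdfminer.six is not installed.
    from pdfminer.high_level import extract_text

    # page_numbers takes zero-based page indices; passing a single
    # index restricts extraction to that one page.
    return extract_text(pdf_path, page_numbers=[i])
```

With something like this, `get_page_text("some.pdf", 0)` would play the role of `getPage(0).extractText()` in pyPdf. Note that it reopens and reparses the file on every call, so it may be slow if called for many pages of a large document.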