You should use PDFBox and FontBox.
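import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public String readPDFInURL() throws EmptyFileException, IOException {
    WebDriver driver = new FirefoxDriver();
    // Page with the example PDF document
    driver.get("file:///C:/Users/admin/Downloads/theleader.pdf");
    URL url = new URL(driver.getCurrentUrl());
    InputStream is = url.openStream();
    BufferedInputStream fileToParse = new BufferedInputStream(is);
    PDDocument document = null;
    // Declared outside the try block so it is still in scope at the return
    String output;
    try {
        document = PDDocument.load(fileToParse);
        // Extract the text of the whole document
        output = new PDFTextStripper().getText(document);
    } finally {
        if (document != null) {
            document.close();
        }
        fileToParse.close();
        is.close();
        // Close the browser window opened above
        driver.quit();
    }
    return output;
}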
Some of the functions from older versions of PDFBox have been deprecated, so FontBox is needed alongside PDFBox. I used PDFBox 2.0.3 and FontBox 2.0.3, and it works fine. Note that this only extracts text; it won't read images.
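
The manual close calls in the finally block of readPDFInURL can also be written with try-with-resources. Below is a minimal sketch of just the stream-handling part, using only the standard library. The class name `UrlStreamSketch` and the helper `readAll` are hypothetical names chosen for illustration, not part of PDFBox or Selenium.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

public class UrlStreamSketch {

    // Hypothetical helper: reads every byte behind a URL (file:, http:, ...).
    public static byte[] readAll(URL url) throws IOException {
        // Resources declared in the try header are closed automatically,
        // in reverse order, even if an exception is thrown inside the block.
        try (InputStream is = url.openStream();
             BufferedInputStream in = new BufferedInputStream(is);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[8192];
            int n;
            // read() returns -1 once the end of the stream is reached
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

For example, `UrlStreamSketch.readAll(new URL("file:///C:/Users/admin/Downloads/theleader.pdf"))` should return the raw bytes of the PDF. In PDFBox 2.0.x, `PDDocument` is `Closeable` as well, so it could presumably be declared in the same try header instead of being closed by hand.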