How to use pdftotext - common examples

To help you get started, we’ve selected a few pdftotext examples that reflect how the library is commonly used in public projects.

GitHub: jalan / pdftotext / tests / test_pdf.py
def test_read_portrait(self):
    pdf = pdftotext.PDF(get_file("portrait.pdf"))
    result = pdf[0]
    self.assertIn("a", result)
    self.assertIn("b", result)
    self.assertIn("c", result)
    self.assertIn("d", result)
GitHub: jalan / pdftotext / tests / test_sanity.py
def test_pdf_read_all_zero_args(self):
    pdf = pdftotext.PDF(self.pdf_file)
    result = pdf.read_all()
    self.assertIn("", result)
GitHub: jalan / pdftotext / tests / test_pdf.py
def test_pdf_read_invalid_page_number(self):
    pdf = pdftotext.PDF(get_file("blank.pdf"))
    with self.assertRaises(IndexError):
        pdf[100]
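
The test above shows that indexing past the last page raises `IndexError`. When a caller would rather get a fallback value than handle the exception, a small guard can check the index first. The sketch below is only an illustration: `page_text` and its `default` parameter are hypothetical names, not part of pdftotext. It assumes the object supports `len()` and integer indexing, which the pdftotext README describes for `pdftotext.PDF`, so it works equally well on a plain list of strings.

```python
def page_text(pdf, index, default=None):
    """Return the text of page `index` (0-based), or `default` if out of range.

    `pdf` can be any object that supports len() and integer indexing,
    for example a pdftotext.PDF instance or a list of page strings.
    """
    # Check the bounds up front instead of catching IndexError afterwards.
    if 0 <= index < len(pdf):
        return pdf[index]
    return default
```

With a two-page document, `page_text(pdf, 1)` would return the second page, while `page_text(pdf, 2, default="")` would return an empty string rather than raising.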
GitHub: jalan / pdftotext / tests / test_pdf.py
def test_read_corrupt_page(self):
    with self.assertRaises((pdftotext.Error, IndexError)):
        pdf = pdftotext.PDF(get_file("corrupt_page.pdf"))
        pdf[0]
GitHub: jalan / pdftotext / tests / test_sanity.py
def test_pdf_page_count(self):
    pdf = pdftotext.PDF(self.pdf_file)
    self.assertEqual(type(pdf.page_count), int)
GitHub: jalan / pdftotext / tests / test_pdf.py
def test_raw_invalid_type(self):
    with self.assertRaises(TypeError):
        pdftotext.PDF(get_file("blank.pdf"), raw="")
GitHub: jalan / pdftotext / tests / test_pdf.py
def test_list_invalid_element(self):
    pdf = pdftotext.PDF(get_file("two_page.pdf"))
    with self.assertRaises(IndexError):
        pdf[2]
GitHub: jalan / pdftotext / tests / test_pdf.py
def test_locked_with_both_passwords(self):
    with self.assertRaises(pdftotext.Error):
        pdftotext.PDF(get_file("both_passwords.pdf"))
GitHub: the-paperless-project / paperless / src / paperless_tesseract / parsers.py
def get_text_from_pdf(pdf_file):

    with open(pdf_file, "rb") as f:
        try:
            pdf = pdftotext.PDF(f)
        except pdftotext.Error:
            return ""

    return "\n".join(pdf)
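
The paperless helper above joins every page into a single string. Where the caller needs the pages kept separate, or needs to open a password-protected file, a variation along the following lines might be used. This is a sketch under stated assumptions: `extract_pages`, `pdf_path`, and `password` are hypothetical names chosen for illustration, and the only library calls are `pdftotext.PDF(file, password)` and `pdftotext.Error`, as used in the pdftotext README and the examples above. The import sits inside the function only so that the module can be loaded on systems where pdftotext is not installed.

```python
def extract_pages(pdf_path, password=""):
    """Return a list with one string per page, or [] if the PDF cannot be read.

    `extract_pages` is an illustrative helper, not part of pdftotext.
    """
    # Imported lazily (an assumption made for this sketch) so that importing
    # this module does not fail where pdftotext or poppler is missing.
    import pdftotext

    with open(pdf_path, "rb") as f:
        try:
            # The second positional argument is the password, if any.
            pdf = pdftotext.PDF(f, password)
        except pdftotext.Error:
            # Unreadable or locked document: report "no pages".
            return []
        # Copy the page texts into a plain list before the file is closed.
        return list(pdf)
```

A caller could then loop over `extract_pages("example.pdf")` with `enumerate` to process each page in turn, or rebuild the single-string form with `"\n".join(...)`.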

pdftotext

Simple PDF text extraction

MIT