April 01, 2020

How to Convert PDF to Image using Python?

We can use the pdf2image library in Python 3 to convert a PDF to images. This library wraps the pdftoppm and pdftocairo tools and returns each page of the PDF as a PIL image object.

Steps:
1). Install pdf2image: We need to install it from the command line (on Windows we may need to open the command prompt in administrator mode). The command is 'pip install pdf2image'.
2). Install Poppler: pdf2image needs the Poppler library in order to convert a PDF to an image. On Windows, we can download the latest build from http://blog.alivate.com.au/poppler-windows/. After downloading Poppler, we need to extract it to a convenient location, generally under the C drive. In my case the Poppler bin directory is 'C:\poppler-0.68.0\bin'. Now we need to add this bin directory (C:\poppler-0.68.0\bin) to the Path environment variable.
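Before converting anything, it can help to confirm that both steps above worked. The following is a minimal sketch using only the standard library; the function name check_setup is hypothetical, and it assumes the Poppler tools are reachable under the names pdftoppm and pdftocairo once the bin directory is on Path.

```python
import importlib.util
import shutil


def check_setup(module_name="pdf2image", tools=("pdftoppm", "pdftocairo")):
    """Return a dict mapping each requirement to True (found) or False."""
    # find_spec returns None when a top-level module is not installed.
    status = {module_name: importlib.util.find_spec(module_name) is not None}
    for tool in tools:
        # shutil.which searches the directories listed in the Path variable.
        status[tool] = shutil.which(tool) is not None
    return status


if __name__ == "__main__":
    for name, found in check_setup().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If a Poppler tool is reported as missing, the bin directory is probably not on Path yet, or the command prompt was opened before Path was changed.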

Convert PDF to Image
First we need to import the pdf2image library and its associated exceptions, so that we get a proper error message if anything goes wrong.
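As a rough sketch of what that import and error handling might look like: the helper name convert_or_report is hypothetical, and the imports are placed inside the function only so that a missing pdf2image installation is reported in the same way as the other failures. In a normal script they would usually sit at the top of the file.

```python
def convert_or_report(pdf_path):
    """Return the pages of pdf_path as a list of images, or [] on failure."""
    try:
        from pdf2image import convert_from_path
        from pdf2image.exceptions import (
            PDFInfoNotInstalledError,
            PDFPageCountError,
            PDFSyntaxError,
        )
    except ImportError:
        print("pdf2image is not installed; run 'pip install pdf2image'.")
        return []

    try:
        return convert_from_path(pdf_path)
    except PDFInfoNotInstalledError:
        # Raised when the Poppler tools cannot be started at all.
        print("Poppler was not found; check that its bin directory is on Path.")
    except PDFPageCountError:
        # Raised when Poppler runs but cannot report a page count.
        print(f"Could not read the page count of {pdf_path!r}; is it a valid PDF?")
    except PDFSyntaxError:
        # Only raised when convert_from_path is called with strict=True.
        print(f"{pdf_path!r} contains syntax errors.")
    return []
```

Calling convert_or_report('secret.pdf') then either returns the list of page images or prints which part of the setup needs attention.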

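To save the output as PNG:

from pdf2image import convert_from_path

# Every page is saved under the same name, so this suits a single-page PDF
images = convert_from_path('secret.pdf')
for image in images:
    image.save('secret.png', 'PNG')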

To save the output in JPG format:

images = convert_from_path('secret.pdf')
for image in images:
    image.save('secret.jpg', 'JPEG')

If we know that our PDF file has only one page, then we can use the code below:

from pdf2image import convert_from_bytes
with open('secret.pdf', 'rb') as f:
    images = convert_from_bytes(f.read())
images[0].save('secret-byte.jpg', 'JPEG')

When the PDF file has multiple pages, we need to save them one by one, appending a counter value to the file name to avoid overwriting the same output file.

images = convert_from_path('output1.pdf')
for i, image in enumerate(images, start=1):
    image.save('output' + str(i) + '.jpg', 'JPEG')
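One optional refinement of the counter idea is to zero-pad the number so that the output files sort in page order (output002.jpg before output010.jpg). This is only a sketch; page_filename is a hypothetical helper, not part of pdf2image.

```python
def page_filename(prefix, page_number, total_pages, extension="jpg"):
    """Build a file name whose counter is padded to the width of total_pages."""
    width = len(str(total_pages))
    return f"{prefix}{page_number:0{width}d}.{extension}"


if __name__ == "__main__":
    # Names for the first three pages of a hypothetical 120-page PDF.
    for page_number in range(1, 4):
        print(page_filename("output", page_number, 120))
```

The returned name can be passed to image.save() in place of the concatenated string.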

GIT URL: How to Convert PDF to Image using Python


If the PDF is very large, you might get a 'MemoryError'. To resolve this, we need to convert the PDF in blocks of 10 pages at a time (1-10, 11-20, and so on). Here is an example:

GIT URL: Convert PDF to Image using Python in blocks
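For readers without access to the linked code, the block-wise idea can be sketched roughly as follows. This is not necessarily the code from the link: the function names page_blocks and convert_in_blocks are hypothetical, and the sketch assumes pdf2image's pdfinfo_from_path helper together with the first_page and last_page parameters of convert_from_path.

```python
def page_blocks(total_pages, block_size=10):
    """Yield (first_page, last_page) pairs, 1-based and inclusive."""
    for first in range(1, total_pages + 1, block_size):
        yield first, min(first + block_size - 1, total_pages)


def convert_in_blocks(pdf_path, output_prefix="output", block_size=10):
    """Convert pdf_path to JPEG files, one block of pages at a time."""
    # Imported here so page_blocks() stays usable without pdf2image installed.
    from pdf2image import convert_from_path, pdfinfo_from_path

    total_pages = int(pdfinfo_from_path(pdf_path)["Pages"])
    for first, last in page_blocks(total_pages, block_size):
        # Only the pages of the current block are held in memory.
        images = convert_from_path(pdf_path, first_page=first, last_page=last)
        for page_number, image in enumerate(images, start=first):
            image.save(f"{output_prefix}{page_number}.jpg", "JPEG")


if __name__ == "__main__":
    # For a 25-page PDF the blocks are 1-10, 11-20 and 21-25.
    print(list(page_blocks(25)))
```

Calling convert_in_blocks('output1.pdf') would then write output1.jpg, output2.jpg and so on, while keeping at most 10 page images in memory at once.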

-K Himaanshu Shuklaa
