to_image function is slow #899
-
As you can see, at resolution=300, rendering a page takes ~700 ms on an Intel 13700K. A normal PDF reader (qpdfview) is much faster.
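A per-page figure like the ~700 ms above is easier to reproduce and compare when it comes from a small timing helper rather than a single run. The following is a minimal sketch that uses only the standard library (`time.perf_counter`, `statistics.median`). The name `time_per_item` is hypothetical, not part of pdfplumber, and the callable it times stands for whatever per-page call is being measured.

```python
import statistics
import time
from typing import Callable, Sequence


def time_per_item(fn: Callable[[int], object], items: Sequence[int], repeat: int = 3) -> float:
    """Return the median wall-clock time per item, in milliseconds.

    `fn` is called once per item (e.g. a page index). The whole pass over
    `items` is repeated `repeat` times, and the median pass duration is
    divided by the number of items.
    """
    if not items:
        raise ValueError("items must not be empty")
    passes = []
    for _ in range(repeat):
        start = time.perf_counter()
        for item in items:
            fn(item)
        passes.append(time.perf_counter() - start)
    return statistics.median(passes) / len(items) * 1000.0
```

For example, `time_per_item(lambda i: pdf.pages[i].to_image(resolution=300), [0, 1, 2])` would report the median milliseconds per page for the call discussed in this thread, assuming `pdf` is an open pdfplumber document. Taking the median over several passes keeps one slow first run (cold caches, font loading) from dominating the result.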
-
It looks like rendering is done with Wand here: https://github.com/jsvine/pdfplumber/blob/stable/pdfplumber/display.py#L56-L83

I have been using https://pypdfium2.readthedocs.io/en/stable/readme.html#support-model. It seems to be around 3-4 times faster at rendering a single page, and it can also render multiple pages concurrently.

Basic comparison of the first 3 pages from a "complex" PDF:

```python
import codetiming
import pdfplumber

pdf = pdfplumber.open("complex.pdf")  # hypothetical path

with codetiming.Timer():
    pdf.pages[0].to_image(resolution=300)
    pdf.pages[1].to_image(resolution=300)
    pdf.pages[2].to_image(resolution=300)
# Elapsed time: 24.9851 seconds
```

```python
import codetiming
import pypdfium2 as pdfium

pdf = pdfium.PdfDocument("complex.pdf")  # hypothetical path

with codetiming.Timer():
    images = list(pdf.render(
        pdfium.PdfBitmap.to_pil,
        page_indices=[0, 1, 2],
        scale=300/72,  # 300 dpi resolution (PDF units are 1/72 inch)
    ))
# Elapsed time: 4.1711 seconds
```

This appears to be the only place Wand is used in pdfplumber, so it seems like it could be swapped out easily.
-
Hello, thanks for switching to pypdfium2 (I'm the maintainer). I found this repo through GitHub's dependents network and took a quick look at commit b049373:
Kind regards
Thanks again, @cmdlineluser. Your comment motivated me to fully swap out Wand for pypdfium2 in v0.10.0, now available. As noted in b049373, it seems like an improvement on several fronts (quality, speed, installation).