Table Extraction Adds 0 to cell #1203

flickp · 2024-09-12T15:12:01Z

I have a PDF (it can be found here) where extracting the table on page 16 adds an extra 0 to a cell in the last row of the table. There is no 0 at this location in the PDF.

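import pdfplumber

pdf_file = "PDFs\\WorkersCompAnnualReport2011.pdf"
pdf = pdfplumber.open(pdf_file)
# pdf.pages is zero-indexed, so index 16 is the 17th page of the file
page = pdf.pages[16]
# Render the page and overlay the lines and cells the table finder detects
im = page.to_image()
im.debug_tablefinder()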

When I extract the table, the last row has an extra column containing a 0 that is not present in the PDF's table:

# Extract the table from the current page
# snap_tolerance needed to combine the newline-separated years in the header
table = page.extract_table(table_settings={'snap_tolerance': 6})
table[32]

Any advice on what may be happening?
Thanks

jsvine · 2024-10-03T01:27:15Z

Maintainer

Interesting! Doing this:

im = page.to_image(resolution=150)
im.reset().draw_rects([ c for c in page.chars if c["text"] == "0" ])

... produces this:

Looking toward the bottom of the page, there does seem to be an extra 0 on the page that isn't otherwise visible. Characters can appear non-visible for various reasons, such as being the same color as the background (not the issue in this case), being overdrawn by another graphical element, or masking.
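Once the stray character has been located, one possible workaround is to drop it before extracting the table. The following is only a minimal sketch: `make_char_excluder` and the bounding box passed to it are hypothetical names introduced for illustration, and the box coordinates would have to be read off the stray character's own `x0`, `top`, `x1` and `bottom` values. It relies on pdfplumber's object dictionaries carrying `object_type`, `x0`, `x1`, `top` and `bottom` keys.

```python
def make_char_excluder(bbox):
    """Build a predicate that rejects chars lying inside bbox.

    bbox follows pdfplumber's (x0, top, x1, bottom) convention.
    Non-char objects (lines, rects, ...) are always kept, so the
    table's ruling lines are left untouched.
    """
    x0, top, x1, bottom = bbox

    def keep(obj):
        if obj.get("object_type") != "char":
            return True
        inside = (
            obj["x0"] >= x0
            and obj["x1"] <= x1
            and obj["top"] >= top
            and obj["bottom"] <= bottom
        )
        return not inside

    return keep
```

The predicate can then be passed to `page.filter(...)`, which returns a copy of the page containing only the objects for which it returns `True`, for example `page.filter(make_char_excluder(stray_bbox)).extract_table(table_settings={"snap_tolerance": 6})`, where `stray_bbox` is a placeholder for the box around the hidden 0. This should keep the hidden 0 out of the extracted text; whether the extra column itself disappears depends on how the table finder derives the cells, so that part is untested here.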

flickp · 2024-10-03T03:54:19Z

Author

Ah, did not see that!
