Error regarding ‘evenodd’ value of LTCurve object #1024

Open
KaboChow opened this issue Jul 19, 2024 · 4 comments


@KaboChow

Hello everyone! I found a problem regarding the 'evenodd' value of the LTCurve object.
[image]
When I try to get the data of this porous shape, the 'evenodd' values obtained are all true.
[image]
This is the PDF I used for testing:
Spin-City-Letters-6fae9bb1b9a6b3dd0f5811b066e9ed8e (1).pdf
When I convert letters or numbers to shapes, the data is recognized correctly and the value of 'evenodd' is false.
But when I use a custom shape, the recognized 'evenodd' values are all true.
Can anyone solve this problem? Thanks!

@dhdaines
Contributor

dhdaines commented Jul 31, 2024

What is the expected behaviour here? If the shape was painted with the even-odd rule in the PDF, then evenodd will be set on all of its subpaths. This seems reasonable, no? If you want to know what regions are filled then you have to apply the rule.

(It looks like you are actually using pdfplumber, not pdfminer.six directly, but the evenodd attribute is coming directly from pdfminer.six.)
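To illustrate what "applying the rule" could look like, here is a minimal sketch in plain Python. It assumes the subpaths of one painted path are available as lists of (x, y) vertices with curves already flattened to line segments. The names `crossings` and `is_filled_even_odd` are hypothetical helpers, not part of pdfminer.six or pdfplumber.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def crossings(point: Point, polygon: Sequence[Point]) -> int:
    """Count the polygon edges crossed by a ray cast to the right of point."""
    x, y = point
    count = 0
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        # The edge straddles the horizontal line through the point.
        if (y1 > y) != (y2 > y):
            x_at_y = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_at_y > x:
                count += 1
    return count


def is_filled_even_odd(point: Point, subpaths: Sequence[Sequence[Point]]) -> bool:
    """Even-odd rule: filled if the ray crosses an odd number of edges,
    counted over ALL subpaths of the path, not one subpath at a time."""
    total = sum(crossings(point, subpath) for subpath in subpaths)
    return total % 2 == 1


# Hypothetical example: a square with a square hole in the middle.
outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
print(is_filled_even_odd((1, 1), [outer, hole]))
print(is_filled_even_odd((5, 5), [outer, hole]))
```

The point is that the rule needs every subpath of the path at once; evaluating a single subpath in isolation cannot say whether it bounds a hole.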

@dhdaines
Contributor

dhdaines commented Jul 31, 2024

On further investigation it appears that this is related to jsvine/pdfplumber#1057, which is related to #861 and #963. I'm still not quite sure what the expected behaviour should be, though.

I think the issue is that you have one path (the porous shape above) with a lot of subpaths, which has been drawn with the f*, b* or B* operator, and that pdfminer.six has split this path into a bunch of separate LTCurve shapes, which makes it impossible for you to know which ones are filled and which ones are not?

The problem here wouldn't be evenodd as that attribute only refers to whether the even-odd rule is applied to fill the shape. I think you want to know which of the LTCurve shapes are filled and which ones aren't? In this case the expected behaviour would be for pdfminer.six to set the fill attribute on those shapes.

Is this correct?

@KaboChow
Author

KaboChow commented Aug 1, 2024

Hello @dhdaines, you are right. I had misunderstood: the "evenodd" property only distinguishes between the even-odd and non-zero winding rules, and in fact cannot tell whether an LTCurve shape is a hole or not.
The porous shape in the example is actually a full path, but it's split into multiple LTCurve shapes for rectangular detection, which I guess is what caused the problem.
As a workaround, I removed the rule that splits LTCurve shapes. It doesn't seem like a good idea, but the lack of rectangle detection doesn't affect me much, and the problem that bothered me is solved.
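For readers unfamiliar with the two fill rules, the difference can be sketched as follows. This is an illustrative example only, assuming subpaths given as lists of (x, y) vertices with curves flattened; `winding_number` and `is_filled` are hypothetical names and not pdfminer.six API.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def winding_number(point: Point, polygon: Sequence[Point]) -> int:
    """Signed number of times the closed polygon winds around point."""
    x, y = point
    wn = 0
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        # cross > 0 means the point lies to the left of the directed edge.
        cross = (x2 - x1) * (y - y1) - (x - x1) * (y2 - y1)
        if y1 <= y < y2 and cross > 0:
            wn += 1  # upward edge passing to the right of the point
        elif y2 <= y < y1 and cross < 0:
            wn -= 1  # downward edge passing to the right of the point
    return wn


def is_filled(point: Point, subpaths: Sequence[Sequence[Point]], evenodd: bool) -> bool:
    """Apply either fill rule using the winding total over all subpaths."""
    total = sum(winding_number(point, subpath) for subpath in subpaths)
    if evenodd:
        return total % 2 != 0  # even-odd rule: only parity matters
    return total != 0  # non-zero winding rule: direction matters


# Hypothetical example: inner square drawn in the SAME direction as the outer one.
outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
inner_same_direction = [(4, 4), (6, 4), (6, 6), (4, 6)]
centre = (5, 5)
print(is_filled(centre, [outer, inner_same_direction], evenodd=True))
print(is_filled(centre, [outer, inner_same_direction], evenodd=False))
```

With the inner square drawn in the same direction as the outer one, the centre is a hole under the even-odd rule but filled under the non-zero winding rule. The flag selects a rule; it says nothing about any individual subpath.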

@dhdaines
Contributor

dhdaines commented Aug 1, 2024

The porous shape in the example is actually a full path, but it's split into multiple LTCurve shapes for rectangular detection, which I guess is what caused the problem.

Thanks! That's kind of what I thought. Your misunderstanding of evenodd is perfectly understandable; in fact the attribute isn't useful at all when the shapes are split, since there's no way to apply the fill rule. So this should still be considered a bug. My thinking on this would be either:

  1. Don't split complex paths into multiple LTCurve objects, and keep the evenodd attribute, letting the user apply the rule (non-zero winding or even-odd) to determine the filled areas.
  2. Continue splitting complex paths, and apply the rule in pdfminer.six, setting the fill attribute on the filled subpaths. Possibly remove the evenodd attribute since it is meaningless without knowing all the subpaths.
  3. Extend the pdfminer.layout API to include the concept of complex paths, or somehow expose the fact that an LTCurve is part of a larger path. Again, the user will then have to apply the fill rule.
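As a rough sketch of what option (2) might involve for the even-odd case, the subpaths of one path could be classified by nesting depth. This assumes the subpaths are simple, non-intersecting polygons (curves flattened to vertices) that are still known to belong to the same path. `contains` and `classify_subpaths` are hypothetical names, not existing pdfminer.six functions, and the non-zero winding rule would additionally need the drawing direction of each subpath.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def contains(polygon: Sequence[Point], point: Point) -> bool:
    """Ray-casting test: True if point lies inside the closed polygon."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x1 + (y - y1) * (x2 - x1) / (y2 - y1) > x:
                inside = not inside
    return inside


def classify_subpaths(subpaths: Sequence[Sequence[Point]]) -> List[bool]:
    """For each subpath, return True if it bounds a filled region and
    False if it bounds a hole, under the even-odd rule.

    A subpath nested inside an even number of other subpaths encloses
    a filled region; an odd nesting depth means it encloses a hole."""
    result = []
    for i, subpath in enumerate(subpaths):
        probe = subpath[0]  # any vertex works when subpaths do not intersect
        depth = sum(
            1
            for j, other in enumerate(subpaths)
            if j != i and contains(other, probe)
        )
        result.append(depth % 2 == 0)
    return result


# Hypothetical "porous" shape: outline, two holes, and an island inside one hole.
outline = [(0, 0), (20, 0), (20, 20), (0, 20)]
hole_a = [(2, 2), (10, 2), (10, 10), (2, 10)]
island = [(4, 4), (6, 4), (6, 6), (4, 6)]
hole_b = [(12, 12), (18, 12), (18, 18), (12, 18)]
print(classify_subpaths([outline, hole_a, island, hole_b]))
```

Whichever option is chosen, the classification only works while the subpaths of a path are still grouped together, which is the information lost by the current splitting.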

I think @jsvine might need to weigh in on this since I think he contributed the code in question?

Probably the simplest to implement would be (1) or (3).
