TypeError: unhashable type: 'list' when processing a pdf file #1039

Open
jerryphe88 opened this issue Sep 9, 2024 · 2 comments

@jerryphe88

I get TypeError: unhashable type: 'list' when processing one particular PDF file.

Sorry, I cannot provide the PDF file here because it is an internal document.

I stepped through the code in a debugger. The call flow is below (the other objids seem fine):

line: 384 in pdfminer/pdfinterp.py
self.fontmap[fontid] = self.rsrcmgr.get_font(objid, spec)
stack value:
k = 'Font'
fontid = 'F220'
objid = 24
resources = {'Font': {'F151': PDFObjRef:192, 'F158': PDFObjRef:22, 'F165': PDFObjRef:23, 'F220': PDFObjRef:24, 'F222': PDFObjRef:25, 'F225': PDFObjRef:19, 'F229': PDFObjRef:26, 'F274': PDFObjRef:17, 'F296': PDFObjRef:27, 'F298': PDFObjRef:28, 'F318': PDFObjRef:15, 'F321': PDFObjRef:29, 'F363': PDFObjRef:30, 'F366': PDFObjRef:31, 'F373': PDFObjRef:32, 'F377': PDFObjRef:33, 'F378': PDFObjRef:34, 'F381': PDFObjRef:35, 'F97': PDFObjRef:14}, 'ProcSet': [/'PDF', /'ImageB', /'ImageC', /'Text'], 'Type': /'Resources', 'XObject': {'I100': PDFObjRef:56, 'I104': PDFObjRef:58, 'I108': PDFObjRef:60, 'I112': PDFObjRef:62, 'I116': PDFObjRef:64, 'I12': PDFObjRef:66, 'I120': PDFObjRef:68, 'I124': PDFObjRef:70, 'I128': PDFObjRef:72, 'I132': PDFObjRef:73, 'I136': PDFObjRef:75, 'I140': PDFObjRef:77, 'I144': PDFObjRef:79, 'I148': PDFObjRef:81, 'I152': PDFObjRef:83, 'I156': PDFObjRef:85, 'I16': PDFObjRef:87, 'I160': PDFObjRef:89, 'I164': PDFObjRef:91, ...}}
spec = {'BaseFont': /'3_of_9_Barcode', 'Encoding': [/'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', ...], 'FirstChar': 30, 'FontDescriptor': PDFObjRef:39, 'LastChar': 255, 'Subtype': /'TrueType', 'Type': /'Font', 'Widths': [750, 750, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, ...]}

==>
line: 219 in pdfminer/pdfinterp.py
font = PDFTrueTypeFont(self, spec)

==>
line: 992: pdfminer/pdffont.py
__init__(rsrcmgr, spec)

==>
line: 956: pdfminer/pdffont.py
PDFSimpleFont.__init__(self,
descriptor: Mapping[str, Any],
widths: FontWidthDict,
spec: Mapping[str, Any])
stack value:
descriptor = {'Ascent': 750, 'CapHeight': 0, 'Descent': -12, 'Flags': 42, 'FontBBox': [0, -7, 2197, 750], 'FontFile2': PDFObjRef:38, 'FontName': /'3_of_9_Barcode', 'ItalicAngle': 0, 'StemV': 0, 'Type': /'FontDescriptor'}
widths = {30: 750, 31: 750, 32: 580, 33: 580, 34: 580, 35: 580, 36: 580, 37: 580, 38: 580, 39: 580, 40: 580, 41: 580, 42: 580, 43: 580, 44: 580, 45: 580, 46: 580, 47: 580, 48: 580, 49: 580, 50: 580, 51: 580, 52: 580, 53: 580, 54: 580, 55: 580, 56: 580, 57: 580, 58: 580, 59: 580, 60: 580, 61: 580, 62: 580, 63: 580, 64: 580, 65: 580, 66: 580, 67: 580, 68: 580, 69: 580, 70: 580, 71: 580, 72: 580, 73: 580, 74: 580, 75: 580, 76: 580, 77: 580, 78: 580, 79: 580, 80: 580, 81: 580, 82: 580, 83: 580, 84: 580, 85: 580, 86: 580, 87: 580, 88: 580, ...}
spec = {'BaseFont': /'3_of_9_Barcode', 'Encoding': [/'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', ...], 'FirstChar': 30, 'FontDescriptor': PDFObjRef:39, 'LastChar': 255, 'Subtype': /'TrueType', 'Type': /'Font', 'Widths': [750, 750, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, ...]}

==>
line: 965: pdfminer/pdffont.py
stack value:
encoding = [/'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'space', /'exclam', /'universal', /'numbersign', /'existential', /'percent', /'ampersand', /'suchthat', /'parenleft', /'parenright', /'asteriskmath', /'plus', /'comma', /'minus', /'period', /'slash', /'zero', /'one', /'two', /'three', /'four', /'five', /'six', /'seven', /'eight', /'nine', /'colon', ...]
The code failed on:
self.cid2unicode = EncodingDB.get_encoding(literal_name(encoding))

The stack trace is:
File "/opt/anaconda3/envs/lc-work/lib/python3.9/site-packages/pdfminer/high_level.py", line 211, in extract_pages
interpreter.process_page(page)
File "/opt/anaconda3/envs/lc-work/lib/python3.9/site-packages/pdfminer/pdfinterp.py", line 997, in process_page
self.render_contents(page.resources, page.contents, ctm=ctm)
File "/opt/anaconda3/envs/lc-work/lib/python3.9/site-packages/pdfminer/pdfinterp.py", line 1014, in render_contents
self.init_resources(resources)
File "/opt/anaconda3/envs/lc-work/lib/python3.9/site-packages/pdfminer/pdfinterp.py", line 384, in init_resources
self.fontmap[fontid] = self.rsrcmgr.get_font(objid, spec)
File "/opt/anaconda3/envs/lc-work/lib/python3.9/site-packages/pdfminer/pdfinterp.py", line 219, in get_font
font = PDFTrueTypeFont(self, spec)
File "/opt/anaconda3/envs/lc-work/lib/python3.9/site-packages/pdfminer/pdffont.py", line 1010, in __init__
data = self.fontfile.get_data()[:length1]
File "/opt/anaconda3/envs/lc-work/lib/python3.9/site-packages/pdfminer/pdffont.py", line 969, in __init__
self.unicode_map = FileUnicodeMap()
File "/opt/anaconda3/envs/lc-work/lib/python3.9/site-packages/pdfminer/encodingdb.py", line 113, in get_encoding
if diff:
TypeError: unhashable type: 'list'
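
The error class itself can be reproduced without pdfminer. The sketch below assumes that the encoding lookup ends up using the value of 'Encoding' as a dictionary key, which the traceback suggests but does not show directly. KNOWN_ENCODINGS, lookup and try_lookup are hypothetical stand-ins, not pdfminer names.

```python
# Hypothetical stand-in for a name -> code-point table; not pdfminer's real data.
KNOWN_ENCODINGS = {
    "StandardEncoding": {65: "A"},
    "WinAnsiEncoding": {65: "A"},
}


def lookup(name):
    # dict.get() hashes its key first, so an unhashable key raises
    # TypeError before the default value is ever considered.
    return KNOWN_ENCODINGS.get(name, KNOWN_ENCODINGS["StandardEncoding"])


def try_lookup(name):
    """Return 'ok' if the lookup works, otherwise the TypeError message."""
    try:
        lookup(name)
        return "ok"
    except TypeError as exc:
        return str(exc)


if __name__ == "__main__":
    # A name works; a list shaped like the 'Encoding' value in the dump does not.
    print(try_lookup("WinAnsiEncoding"))
    print(try_lookup([".notdef", ".notdef", "space", "exclam"]))
```

An unknown string would simply fall back to the default table, so only the list-valued 'Encoding' triggers the exception.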

@dhdaines
Contributor

Hmm. According to the PDF spec:

A Type 1 font’s built-in encoding shall be defined by an Encoding array that is part of the font program, not to be confused with the Encoding entry in the PDF font dictionary.

Either pdfminer has gotten the PDF font dictionary and the font program confused, or whatever piece of software created the PDF did that, because an Encoding entry in the font dictionary can only be a name or a dictionary, whereas a Type 1 font's Encoding array looks exactly like what you've got in the log (it's full of ".notdef"). Since the log you've provided is just reporting what's in the file itself, I'm inclined to think that it's the PDF software's fault (especially since it claims that this is a TrueType font!).

But of course pdfminer should be robust to these sorts of shenanigans. What software created the PDF?
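
One way such robustness could look is sketched below. This is not pdfminer's code: normalize_encoding and DEFAULT_ENCODING are hypothetical, plain strings stand in for PDF name objects, and treating a bare list as a table indexed by character code from 0 is an assumption based on the dump above (32 '.notdef' entries followed by 'space'). The output uses the shape of a PDF Differences array, which is a start code followed by glyph names.

```python
from typing import Any, List, Optional, Tuple

# Assumed default when the base encoding is missing or unusable.
DEFAULT_ENCODING = "StandardEncoding"


def normalize_encoding(value: Any) -> Tuple[str, Optional[List[Any]]]:
    """Map an 'Encoding' entry to (base encoding name, differences or None)."""
    if isinstance(value, str):
        # Valid case 1: the entry is an encoding name.
        return value, None
    if isinstance(value, dict):
        # Valid case 2: an encoding dictionary with optional keys.
        base = value.get("BaseEncoding", DEFAULT_ENCODING)
        if not isinstance(base, str):
            base = DEFAULT_ENCODING
        return base, value.get("Differences")
    if isinstance(value, (list, tuple)):
        # Malformed case from this issue: a bare array of glyph names.
        # Assumption: entry i names the glyph for character code i, so
        # the array is rewritten as differences starting at code 0.
        return DEFAULT_ENCODING, [0, *value]
    # Anything else: ignore the entry and use the default.
    return DEFAULT_ENCODING, None


if __name__ == "__main__":
    print(normalize_encoding("WinAnsiEncoding"))
    print(normalize_encoding([".notdef", "space", "exclam"]))
```

The point of the design is that the lookup only ever sees a hashable name, while the glyph names from the malformed array are kept rather than discarded.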

@Aegdesil

Aegdesil commented Oct 7, 2024

I am having the same issue with a similar-looking file (I also cannot provide it, for data sensitivity reasons).
The problem does seem to be linked to the way the file was generated. I don't know which tool was used. The only thing I can say is that other PDF viewing applications render it fine, so it should be possible to add a fallback.
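
Until a fallback exists in the library, a caller can at least keep one bad file from aborting a batch. The following is a minimal sketch, not a fix for the underlying bug: extract_all is a hypothetical helper, and the extractor argument is any callable that takes a path and returns text.

```python
from typing import Callable, Dict, Iterable, Optional


def extract_all(
    paths: Iterable[str],
    extractor: Callable[[str], str],
) -> Dict[str, Optional[str]]:
    """Run extractor on each path; store None for files that raise TypeError."""
    results: Dict[str, Optional[str]] = {}
    for path in paths:
        try:
            results[path] = extractor(path)
        except TypeError:
            # The malformed 'Encoding' entry surfaces as a TypeError,
            # so only that exception is caught; others still propagate.
            results[path] = None
    return results
```

With pdfminer.six installed, pdfminer.high_level.extract_text could be passed as the extractor. Files mapped to None can then be retried with another tool.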
