TypeError: unhashable type: 'list' when processing a pdf file #1039

jerryphe88 · 2024-09-09T20:27:50Z

I get TypeError: unhashable type: 'list' when processing one particular PDF file.

Sorry, I cannot provide the PDF file here because it is an internal document.

I debugged it live, and the call flow is shown below (the other objids seem fine):

line: 384 in pdfminer/pdfinterp.py
self.fontmap[fontid] = self.rsrcmgr.get_font(objid, spec)
stack value:
k = 'Font'
fontid = 'F220'
objid = 24
resources = {'Font': {'F151': PDFObjRef:192, 'F158': PDFObjRef:22, 'F165': PDFObjRef:23, 'F220': PDFObjRef:24, 'F222': PDFObjRef:25, 'F225': PDFObjRef:19, 'F229': PDFObjRef:26, 'F274': PDFObjRef:17, 'F296': PDFObjRef:27, 'F298': PDFObjRef:28, 'F318': PDFObjRef:15, 'F321': PDFObjRef:29, 'F363': PDFObjRef:30, 'F366': PDFObjRef:31, 'F373': PDFObjRef:32, 'F377': PDFObjRef:33, 'F378': PDFObjRef:34, 'F381': PDFObjRef:35, 'F97': PDFObjRef:14}, 'ProcSet': [/'PDF', /'ImageB', /'ImageC', /'Text'], 'Type': /'Resources', 'XObject': {'I100': PDFObjRef:56, 'I104': PDFObjRef:58, 'I108': PDFObjRef:60, 'I112': PDFObjRef:62, 'I116': PDFObjRef:64, 'I12': PDFObjRef:66, 'I120': PDFObjRef:68, 'I124': PDFObjRef:70, 'I128': PDFObjRef:72, 'I132': PDFObjRef:73, 'I136': PDFObjRef:75, 'I140': PDFObjRef:77, 'I144': PDFObjRef:79, 'I148': PDFObjRef:81, 'I152': PDFObjRef:83, 'I156': PDFObjRef:85, 'I16': PDFObjRef:87, 'I160': PDFObjRef:89, 'I164': PDFObjRef:91, ...}}
spec = {'BaseFont': /'3_of_9_Barcode', 'Encoding': [/'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', ...], 'FirstChar': 30, 'FontDescriptor': PDFObjRef:39, 'LastChar': 255, 'Subtype': /'TrueType', 'Type': /'Font', 'Widths': [750, 750, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, ...]}

==>
line: 219 in pdfminer/pdfinterp.py
font = PDFTrueTypeFont(self, spec)

==>
line: 992: pdfminer/pdffont.py
__init__(rsrcmgr, spec)

==>
line: 956: pdfminer/pdffont.py
PDFSimpleFont.__init__(self,
descriptor: Mapping[str, Any],
widths: FontWidthDict,
spec: Mapping[str, Any])
stack value:
descriptor = {'Ascent': 750, 'CapHeight': 0, 'Descent': -12, 'Flags': 42, 'FontBBox': [0, -7, 2197, 750], 'FontFile2': PDFObjRef:38, 'FontName': /'3_of_9_Barcode', 'ItalicAngle': 0, 'StemV': 0, 'Type': /'FontDescriptor'}
widths = {30: 750, 31: 750, 32: 580, 33: 580, 34: 580, 35: 580, 36: 580, 37: 580, 38: 580, 39: 580, 40: 580, 41: 580, 42: 580, 43: 580, 44: 580, 45: 580, 46: 580, 47: 580, 48: 580, 49: 580, 50: 580, 51: 580, 52: 580, 53: 580, 54: 580, 55: 580, 56: 580, 57: 580, 58: 580, 59: 580, 60: 580, 61: 580, 62: 580, 63: 580, 64: 580, 65: 580, 66: 580, 67: 580, 68: 580, 69: 580, 70: 580, 71: 580, 72: 580, 73: 580, 74: 580, 75: 580, 76: 580, 77: 580, 78: 580, 79: 580, 80: 580, 81: 580, 82: 580, 83: 580, 84: 580, 85: 580, 86: 580, 87: 580, 88: 580, ...}
spec = {'BaseFont': /'3_of_9_Barcode', 'Encoding': [/'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', ...], 'FirstChar': 30, 'FontDescriptor': PDFObjRef:39, 'LastChar': 255, 'Subtype': /'TrueType', 'Type': /'Font', 'Widths': [750, 750, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, ...]}

==>
line: 965: pdfminer/pdffont.py
stack value:
encoding = [/'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'space', /'exclam', /'universal', /'numbersign', /'existential', /'percent', /'ampersand', /'suchthat', /'parenleft', /'parenright', /'asteriskmath', /'plus', /'comma', /'minus', /'period', /'slash', /'zero', /'one', /'two', /'three', /'four', /'five', /'six', /'seven', /'eight', /'nine', /'colon', ...]
The code fails on this line:
self.cid2unicode = EncodingDB.get_encoding(literal_name(encoding))

The stack trace is:
File "/opt/anaconda3/envs/lc-work/lib/python3.9/site-packages/pdfminer/high_level.py", line 211, in extract_pages
interpreter.process_page(page)
File "/opt/anaconda3/envs/lc-work/lib/python3.9/site-packages/pdfminer/pdfinterp.py", line 997, in process_page
self.render_contents(page.resources, page.contents, ctm=ctm)
File "/opt/anaconda3/envs/lc-work/lib/python3.9/site-packages/pdfminer/pdfinterp.py", line 1014, in render_contents
self.init_resources(resources)
File "/opt/anaconda3/envs/lc-work/lib/python3.9/site-packages/pdfminer/pdfinterp.py", line 384, in init_resources
self.fontmap[fontid] = self.rsrcmgr.get_font(objid, spec)
File "/opt/anaconda3/envs/lc-work/lib/python3.9/site-packages/pdfminer/pdfinterp.py", line 219, in get_font
font = PDFTrueTypeFont(self, spec)
File "/opt/anaconda3/envs/lc-work/lib/python3.9/site-packages/pdfminer/pdffont.py", line 1010, in __init__
data = self.fontfile.get_data()[:length1]
File "/opt/anaconda3/envs/lc-work/lib/python3.9/site-packages/pdfminer/pdffont.py", line 969, in __init__
self.unicode_map = FileUnicodeMap()
File "/opt/anaconda3/envs/lc-work/lib/python3.9/site-packages/pdfminer/encodingdb.py", line 113, in get_encoding
if diff:
TypeError: unhashable type: 'list'
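
For anyone reading along, here is a minimal sketch of why a list in this position produces that message. It does not use pdfminer at all: `ENCODINGS` and `lookup` are hypothetical stand-ins, and the sketch assumes (it is not confirmed in this thread) that the failing code looks the encoding up in a dict keyed by name. Plain strings stand in for the PDF name objects.

```python
# Hypothetical stand-in for a table of encodings keyed by name.
# This is NOT pdfminer's real table; it only illustrates the failure mode.
ENCODINGS = {"StandardEncoding": {}, "WinAnsiEncoding": {}}


def lookup(name):
    """Look up an encoding by name, falling back to the standard one."""
    # dict.get() has to hash its key, so a list key raises TypeError
    # before the fallback value is ever considered.
    return ENCODINGS.get(name, ENCODINGS["StandardEncoding"])


# Shape of the value seen in the debugger: a list of glyph names,
# not a single name.
encoding = [".notdef"] * 32 + ["space", "exclam"]

try:
    lookup(encoding)
except TypeError as exc:
    message = str(exc)
    print(message)  # the text mentions: unhashable type: 'list'
```

A lookup with a proper name such as "WinAnsiEncoding" goes through without error, which would be consistent with the other fonts in the file loading fine.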

dhdaines · 2024-09-19T15:46:47Z

Hmm. According to the PDF spec:

A Type 1 font’s built-in encoding shall be defined by an Encoding array that is part of the font program, not to be confused with the Encoding entry in the PDF font dictionary.

Either pdfminer has gotten the PDF font dictionary and the font program confused, or whatever piece of software created the PDF did that, because an Encoding entry in the font dictionary can only be a name or a dictionary, whereas a Type 1 font's Encoding array looks exactly like what you've got in the log (it's full of ".notdef"). Since the log you've provided is just reporting what's in the file itself, I'm inclined to think that it's the PDF software's fault (especially since it claims that this is a TrueType font!).

But of course pdfminer should be robust to these sorts of shenanigans. What software created the PDF?

Aegdesil · 2024-10-07T20:15:00Z

I am having the same issue with a similar-looking file (which I also cannot share, for data sensitivity reasons).
The problem does seem to be linked to the way the file was generated. I don't know which tool was used; all I can say is that other PDF viewing applications render it fine, so it should be possible to add a fallback.
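
As a rough sketch of what such a fallback might look like (this is not pdfminer's code, and nobody in this thread has confirmed how other viewers handle it): normalize the Encoding entry before doing any name lookup. `normalize_encoding`, `parse_differences` and `DEFAULT_ENCODING` are hypothetical names, plain strings stand in for PDF name objects, and treating a bare list as position-indexed glyph names is an assumption based on what the list in the log looks like.

```python
from typing import Any, Dict, Optional, Tuple

# Assumed default when no usable base encoding name is present.
DEFAULT_ENCODING = "StandardEncoding"


def parse_differences(diff: list) -> Dict[int, str]:
    """Expand a Differences-style array: [code, name, name, ..., code, name, ...]."""
    overrides: Dict[int, str] = {}
    code = 0
    for item in diff:
        if isinstance(item, int):
            code = item  # an integer resets the current character code
        else:
            overrides[code] = item  # a name applies to the current code
            code += 1
    return overrides


def normalize_encoding(encoding: Any) -> Tuple[str, Optional[Dict[int, str]]]:
    """Return (base encoding name, code -> glyph name overrides or None)."""
    if isinstance(encoding, str):
        # Well-formed case 1: a plain encoding name.
        return encoding, None
    if isinstance(encoding, dict):
        # Well-formed case 2: an encoding dictionary.
        base = encoding.get("BaseEncoding", DEFAULT_ENCODING)
        return base, parse_differences(encoding.get("Differences", []))
    if isinstance(encoding, (list, tuple)):
        # Malformed case from this issue: a bare array of glyph names.
        # Assumption: index i names the glyph for character code i.
        overrides = {
            code: name for code, name in enumerate(encoding) if name != ".notdef"
        }
        return DEFAULT_ENCODING, overrides
    # Anything else: fall back to the default rather than raising.
    return DEFAULT_ENCODING, None


bad = [".notdef"] * 32 + ["space", "exclam"]
print(normalize_encoding(bad))
```

With this shape, the name that reaches the lookup is always a string, so the unhashable-list error cannot occur, and the glyph names from the malformed array are still available as overrides.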
