The stack trace is:
File "/opt/anaconda3/envs/lc-work/lib/python3.9/site-packages/pdfminer/high_level.py", line 211, in extract_pages
interpreter.process_page(page)
File "/opt/anaconda3/envs/lc-work/lib/python3.9/site-packages/pdfminer/pdfinterp.py", line 997, in process_page
self.render_contents(page.resources, page.contents, ctm=ctm)
File "/opt/anaconda3/envs/lc-work/lib/python3.9/site-packages/pdfminer/pdfinterp.py", line 1014, in render_contents
self.init_resources(resources)
File "/opt/anaconda3/envs/lc-work/lib/python3.9/site-packages/pdfminer/pdfinterp.py", line 384, in init_resources
self.fontmap[fontid] = self.rsrcmgr.get_font(objid, spec)
File "/opt/anaconda3/envs/lc-work/lib/python3.9/site-packages/pdfminer/pdfinterp.py", line 219, in get_font
font = PDFTrueTypeFont(self, spec)
File "/opt/anaconda3/envs/lc-work/lib/python3.9/site-packages/pdfminer/pdffont.py", line 1010, in __init__
data = self.fontfile.get_data()[:length1]
File "/opt/anaconda3/envs/lc-work/lib/python3.9/site-packages/pdfminer/pdffont.py", line 969, in __init__
self.unicode_map = FileUnicodeMap()
File "/opt/anaconda3/envs/lc-work/lib/python3.9/site-packages/pdfminer/encodingdb.py", line 113, in get_encoding
if diff:
TypeError: unhashable type: 'list'
A Type 1 font’s built-in encoding shall be defined by an Encoding array that is part of the font program, not to be confused with the Encoding entry in the PDF font dictionary.
Either pdfminer has gotten the PDF font dictionary and the font program confused, or whatever piece of software created the PDF did that, because an Encoding entry in the font dictionary can only be a name or a dictionary, whereas a Type 1 font's Encoding array looks exactly like what you've got in the log (it's full of ".notdef"). Since the log you've provided is just reporting what's in the file itself, I'm inclined to think that it's the PDF software's fault (especially since it claims that this is a TrueType font!).
But of course pdfminer should be robust to these sorts of shenanigans. What software created the PDF?
I am having the same issue with a similar-looking file (which I also cannot provide, for data sensitivity reasons).
The problem does seem to be linked to the way the file was generated. I don't know which tool was used. All I can say is that other PDF viewing applications render it fine, so it should be possible to add a fallback.
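One possible shape for such a fallback is sketched below. This is not pdfminer's code: normalize_encoding and FALLBACK_BASE are hypothetical names, and PDF names are modelled as plain strings. It assumes that a bare array found where a name or dictionary is expected can be treated as a Differences list starting at character code 0, which matches the dump in this issue (32 '.notdef' entries before 'space').

```python
from typing import Any, List, Tuple

# Hypothetical default; the base encoding used when none is usable.
FALLBACK_BASE = "StandardEncoding"


def normalize_encoding(encoding: Any) -> Tuple[str, List[Any]]:
    """Return (base_encoding_name, differences) for a font's Encoding entry.

    Per the PDF spec the entry is a name or a dictionary. A bare array is
    malformed, but is tolerated here instead of raising.
    """
    if isinstance(encoding, str):
        # Well-formed: a predefined encoding name.
        return encoding, []
    if isinstance(encoding, dict):
        # Well-formed: an encoding dictionary with optional entries.
        base = encoding.get("BaseEncoding", FALLBACK_BASE)
        return base, list(encoding.get("Differences", []))
    if isinstance(encoding, (list, tuple)):
        # Malformed (assumption): glyph names indexed by character code,
        # rewritten as a Differences array that starts at code 0.
        return FALLBACK_BASE, [0, *encoding]
    # Missing or unrecognised entry: fall back to the default.
    return FALLBACK_BASE, []


base, differences = normalize_encoding([".notdef", "space", "exclam"])
print(base, differences)
```

Whether the array should be honoured or simply ignored in favour of the font program's own mapping is a design choice for the maintainers; the sketch only shows that the malformed case can be detected before any lookup happens.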
TypeError: unhashable type: 'list' when processing a particular PDF file:
Sorry, I cannot provide the PDF file here, as it is an internal document.
I stepped through it in a debugger; the call flow is below (the other objids seem fine):
line: 384 in pdfminer/pdfinterp.py
self.fontmap[fontid] = self.rsrcmgr.get_font(objid, spec)
stack value:
k = 'Font'
fontid = 'F220'
objid = 24
resources = {'Font': {'F151': PDFObjRef:192, 'F158': PDFObjRef:22, 'F165': PDFObjRef:23, 'F220': PDFObjRef:24, 'F222': PDFObjRef:25, 'F225': PDFObjRef:19, 'F229': PDFObjRef:26, 'F274': PDFObjRef:17, 'F296': PDFObjRef:27, 'F298': PDFObjRef:28, 'F318': PDFObjRef:15, 'F321': PDFObjRef:29, 'F363': PDFObjRef:30, 'F366': PDFObjRef:31, 'F373': PDFObjRef:32, 'F377': PDFObjRef:33, 'F378': PDFObjRef:34, 'F381': PDFObjRef:35, 'F97': PDFObjRef:14}, 'ProcSet': [/'PDF', /'ImageB', /'ImageC', /'Text'], 'Type': /'Resources', 'XObject': {'I100': PDFObjRef:56, 'I104': PDFObjRef:58, 'I108': PDFObjRef:60, 'I112': PDFObjRef:62, 'I116': PDFObjRef:64, 'I12': PDFObjRef:66, 'I120': PDFObjRef:68, 'I124': PDFObjRef:70, 'I128': PDFObjRef:72, 'I132': PDFObjRef:73, 'I136': PDFObjRef:75, 'I140': PDFObjRef:77, 'I144': PDFObjRef:79, 'I148': PDFObjRef:81, 'I152': PDFObjRef:83, 'I156': PDFObjRef:85, 'I16': PDFObjRef:87, 'I160': PDFObjRef:89, 'I164': PDFObjRef:91, ...}}
spec = {'BaseFont': /'3_of_9_Barcode', 'Encoding': [/'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', ...], 'FirstChar': 30, 'FontDescriptor': PDFObjRef:39, 'LastChar': 255, 'Subtype': /'TrueType', 'Type': /'Font', 'Widths': [750, 750, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, ...]}
==>
line: 219 in pdfminer/pdfinterp.py
font = PDFTrueTypeFont(self, spec)
==>
line: 992 in pdfminer/pdffont.py
__init__(rsrcmgr, spec)
==>
line: 956 in pdfminer/pdffont.py
PDFSimpleFont.__init__(self,
descriptor: Mapping[str, Any],
widths: FontWidthDict,
spec: Mapping[str, Any])
stack value:
descriptor = {'Ascent': 750, 'CapHeight': 0, 'Descent': -12, 'Flags': 42, 'FontBBox': [0, -7, 2197, 750], 'FontFile2': PDFObjRef:38, 'FontName': /'3_of_9_Barcode', 'ItalicAngle': 0, 'StemV': 0, 'Type': /'FontDescriptor'}
widths = {30: 750, 31: 750, 32: 580, 33: 580, 34: 580, 35: 580, 36: 580, 37: 580, 38: 580, 39: 580, 40: 580, 41: 580, 42: 580, 43: 580, 44: 580, 45: 580, 46: 580, 47: 580, 48: 580, 49: 580, 50: 580, 51: 580, 52: 580, 53: 580, 54: 580, 55: 580, 56: 580, 57: 580, 58: 580, 59: 580, 60: 580, 61: 580, 62: 580, 63: 580, 64: 580, 65: 580, 66: 580, 67: 580, 68: 580, 69: 580, 70: 580, 71: 580, 72: 580, 73: 580, 74: 580, 75: 580, 76: 580, 77: 580, 78: 580, 79: 580, 80: 580, 81: 580, 82: 580, 83: 580, 84: 580, 85: 580, 86: 580, 87: 580, 88: 580, ...}
spec = {'BaseFont': /'3_of_9_Barcode', 'Encoding': [/'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', ...], 'FirstChar': 30, 'FontDescriptor': PDFObjRef:39, 'LastChar': 255, 'Subtype': /'TrueType', 'Type': /'Font', 'Widths': [750, 750, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, 580, ...]}
==>
line: 965 in pdfminer/pdffont.py
stack value:
encoding = [/'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'.notdef', /'space', /'exclam', /'universal', /'numbersign', /'existential', /'percent', /'ampersand', /'suchthat', /'parenleft', /'parenright', /'asteriskmath', /'plus', /'comma', /'minus', /'period', /'slash', /'zero', /'one', /'two', /'three', /'four', /'five', /'six', /'seven', /'eight', /'nine', /'colon', ...]
The code failed on:
self.cid2unicode = EncodingDB.get_encoding(literal_name(encoding))
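The error message itself can be reproduced without pdfminer. The sketch below assumes that the encoding lookup is a dictionary access keyed by the encoding name, which is what "unhashable type" suggests; ENCODINGS, lookup_encoding and try_lookup are stand-in names, not pdfminer's API.

```python
# Stand-in table of named encodings (contents are irrelevant here).
ENCODINGS = {"StandardEncoding": {}, "WinAnsiEncoding": {}}


def lookup_encoding(name):
    """Look up an encoding by name, defaulting to StandardEncoding."""
    return ENCODINGS.get(name, ENCODINGS["StandardEncoding"])


def try_lookup(name):
    """Return 'ok' on success, or the TypeError text on failure."""
    try:
        lookup_encoding(name)
        return "ok"
    except TypeError as exc:
        return f"TypeError: {exc}"


# A name works, and an unknown name falls back to the default.
print(try_lookup("WinAnsiEncoding"))
print(try_lookup("NoSuchEncoding"))

# A list, like the Encoding array in the spec dump above, cannot be
# hashed, so the dictionary lookup raises before any default applies.
print(try_lookup([".notdef", "space", "exclam"]))
```

The point is that the default value never gets a chance to apply: hashing the key fails first, so a type check has to happen before the lookup.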