A better scan

@pronoiac pronoiac released this 19 Apr 05:12
· 66 commits to main since this release
49911fe

About this copy

This is a scanned copy of the 4th printing, 1998. It's shared for reading, and for improving the Markdown copy in our GitHub repo.

How it was made

@pronoiac had the spine / binding removed and fed the pages through a scanner. Steps and software used:

  • the scanner produced 600 dpi grayscale PNG files, 3.6 gigabytes in total
  • Scantailor Advanced (in Docker): deskew the pages and render them as 300 dpi black and white (1-bit) TIFFs, 30 megabytes in total
  • tiff2pdf and pdfunite: turn those many TIFFs into one PDF
  • OCRmyPDF: OCR with Tesseract, add title and author to the PDF, and apply lossless JBIG2 compression, giving a 24 megabyte file
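
The last two steps above could be scripted roughly as follows. This is a minimal sketch, not the commands actually used for this release: the function names, file names, and the assumption that the TIFF file names sort in page order are all hypothetical. It only uses the basic invocations `tiff2pdf -o OUT IN`, `pdfunite IN... OUT`, and `ocrmypdf --title ... --author ... IN OUT`. The Scantailor Advanced step is left out.

```python
import subprocess
from pathlib import Path


def build_pipeline(tiffs, merged_pdf, final_pdf, title, author):
    """Return the commands (as argument lists) for the TIFF -> PDF -> OCR steps."""
    commands = []
    page_pdfs = []
    # Assumption: file names sort in page order (e.g. page-0001.tif, page-0002.tif).
    for tiff in sorted(str(t) for t in tiffs):
        page_pdf = str(Path(tiff).with_suffix(".pdf"))
        # tiff2pdf: wrap one TIFF page in a single-page PDF.
        commands.append(["tiff2pdf", "-o", page_pdf, tiff])
        page_pdfs.append(page_pdf)
    # pdfunite: concatenate the single-page PDFs; the last argument is the output.
    commands.append(["pdfunite", *page_pdfs, merged_pdf])
    # ocrmypdf: add a text layer and set the PDF title and author metadata.
    # No explicit compression flag is passed here; whether JBIG2 is applied
    # depends on how OCRmyPDF and its optional encoder are installed.
    commands.append(
        ["ocrmypdf", "--title", title, "--author", author, merged_pdf, final_pdf]
    )
    return commands


def run_pipeline(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Building the argument lists separately from running them makes it easy to print and review the commands before touching any files; `run_pipeline(build_pipeline(...))` would then execute them, provided the three tools are installed.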

Other notes

  • It’s higher resolution than the previous scan, though it comes from an older printing (4th printing, 1998, versus 6th printing, 2001).
  • OCR is better than the previous scan - searching for keywords or phrases usually works
  • why not the grayscale PNGs: space constraints on GitHub releases, and dubious value for the space they would take
  • ebooks from the Markdown version are getting closer
  • see #137 for some of the thoughts behind this release