pauamma (pauamma) wrote in accessibility_fail,
@ 2013-02-18 08:23 pm UTC
So earlier today, I emailed the following to one of the bitsavers.org maintainers:
Subject: Home for OCRed and proofread manuals?
So I snarfed http://bitsavers.trailing-edge.com/pdf/ibm/704/24-6661-2_704_Manual_1955.pdf and I'm now OCRing it and proofreading the result. I read the http://bitsavers.trailing-edge.com/ bit that says: "Documents here are kept in a minimal subset of PDF format, just using it as a container for lossless Group 4 fax compression (ITU-T recommendation T.6) images. Contributions are normally post-processed by tools to put them in exactly this format, so that all of the documents here are the same and can be burst at some point in the future when OCR technology is mature enough to do a good job of recognition." which seems to imply you're not interested in providing a subset of those documents as OCR'd images+text searchable PDFs. But since I'm going to do it anyway, I'd like to share it with others. If you can't or won't host it on your own servers, do you know of another organization that could?
Within 10 minutes, I got this answer:
I need to update that. I have been OCRing documents for several years now.
(I answered thanking him, and asking about adding an alt= attribute to the harvesting-blocker img used for the email address. More when I know more.)