Feb. 18th, 2013

pauamma
So earlier today, I emailed the following to one of the bitsavers.org maintainers:
Subject: Home for OCRed and proofread manuals?

So I snarfed http://bitsavers.trailing-edge.com/pdf/ibm/704/24-6661-2_704_Manual_1955.pdf and I'm now OCRing it and proofreading the result. I read the bit on http://bitsavers.trailing-edge.com/ that says: "Documents here are kept in a minimal subset of PDF format, just using it as a container for lossless Group 4 fax compression (ITU-T recommendation T.6) images. Contributions are normally post-processed by tools to put them in exactly this format, so that all of the documents here are the same and can be burst at some point in the future when OCR technology is mature enough to do a good job of recognition." That seems to imply you're not interested in providing some of those documents as searchable PDFs that combine the scanned images with OCRed text. But since I'm going to do it anyway, I'd like to share it with others. If you can't or won't host it on your own servers, do you know of another organization that could?
Within 10 minutes, I got this answer:
I need to update that. I have been OCRing documents for several years now.
(I replied to thank him and to ask about adding alt text (an alt= attribute) to the anti-harvesting image that displays the email address. More when I know more.)

You Fail At Accessibility

May 2023

S M T W T F S
 123456
78910111213
14151617181920
21222324252627
28293031   

Most Popular Tags

Style Credit

Expand Cut Tags

No cut tags
Page generated Aug. 12th, 2025 03:51 am
Powered by Dreamwidth Studios