-
Re: FREE OCR - ALL COLLECTIONS - TYPED AND HANDWRITTEN TEXTS
V. Van Hyning (formerly of By the People LOC) Jun 17, 2020 10:13 AM (in response to Rodrigo Porto Bozzetti)
Thank you, Rodrigo, our Library of Congress colleague, who kindly agreed to share his OCR method here on History Hub for volunteers to use if they like. This process is helpful for getting text from printed documents rather than handwritten ones. It can really save time when you're dealing with newsprint, though, as Rodrigo says and demonstrates in his examples, there are times when OCR doesn't work well. Always read through the automatically generated text to check that it accurately represents the original, including any punctuation, accents, or spelling, regardless of whether these are "correct" by modern standards. You can also get an image of each document by clicking the "view on loc.gov" button in the transcription interface.
Please preserve the original line breaks, just as you would when transcribing by hand. If a word is broken over two lines, such as
kit-
ten
transcribe it as a whole word on the first line: "kitten", in this example.
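As a rough illustration of the line-break rule above, here is a minimal Python sketch for tidying OCR output before pasting it into the transcription box. The function name `join_split_words` is hypothetical and is not part of By the People, HRReader, or Rodrigo's OCR method. It assumes that any line ending in a letter followed by a hyphen is a word split across a line break; the whole word goes on the first line, and the rest of the second line keeps its own line.

```python
import re

# Assumption: a line ending in a letter followed by "-" is a word split
# across a line break (e.g. "kit-" / "ten"). [^\W\d_] matches any Unicode
# letter, so accented words such as "café-" are covered too.
SPLIT_WORD = re.compile(r"[^\W\d_]-$")


def join_split_words(text: str) -> str:
    """Rejoin words hyphenated across line breaks in OCR output.

    The whole word is placed on the first line; whatever followed the
    fragment on the next line stays on its own line, so the remaining
    line breaks are preserved. Trailing spaces on each line are trimmed
    so that "kit- " is still detected.
    """
    lines = text.split("\n")
    out = []
    i = 0
    while i < len(lines):
        line = lines[i].rstrip()
        i += 1
        # Keep pulling fragments up while the current line ends in letter + "-".
        while SPLIT_WORD.search(line) and i < len(lines):
            nxt = lines[i].strip()
            if not nxt:
                break  # a blank line follows: leave the hyphen alone
            parts = nxt.split(None, 1)
            line = line[:-1] + parts[0]  # drop the hyphen, attach the fragment
            if len(parts) > 1:
                lines[i] = parts[1]  # the remainder keeps its own line
                break
            i += 1  # the next line held only the fragment, so consume it
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    ocr_text = "a small kit-\nten was sleeping\non the mat"
    print(join_split_words(ocr_text))
```

Running it prints "a small kitten", "was sleeping" and "on the mat" on three separate lines. The helper cannot tell a split word from a genuine hyphenated compound that happens to break at its hyphen ("well-" / "known" would become "wellknown"), so the result still needs a read-through against the original image.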
-
Veronica Beaudry Oct 15, 2020 1:26 PM (in response to Rodrigo Porto Bozzetti)
I have started using this OCR method and it is awesome. Thanks so much for sharing.
-
George Carpenito Oct 25, 2020 5:01 PM (in response to Rodrigo Porto Bozzetti)
The OCR reader I have been using with good success on the iPad is called HRReader (free in the Apple App Store). You need to search for “Tahira Ghani ocr”; it doesn’t come up under HRReader. It claims to read handwriting too, but I haven’t had much luck with that. It does work well with printed text, though. As was stated above, you still need to proofread, mostly for misspelled words, before passing the document on for review.
I have come across many documents waiting for review that had not been proofread, and I had to fix minor misspellings (“on” instead of “of”, etc.).
BTW, when OCR’ing a multi-column page, I do it one column at a time.