2 Replies Latest reply on Oct 15, 2020 1:26 PM by Veronica Beaudry

    FREE OCR - ALL COLLECTIONS -TYPED AND HANDWRITTEN TEXTS

    Rodrigo Porto Bozzetti Newbie

      I have had good results using OCR on all collections and would like to share the method, because it can save a lot of time.

       

      Google Drive is an easy, free tool for running OCR on typed/printed texts and handwriting.

       

      Please follow the steps below.

       

      1 Downloading the image you want to transcribe

       

      1.1 Open the By the People page that you want to transcribe

       

      1.2 Click the three-dot menu button in the upper right corner of the browser window

       

       

      1.3 Click "More tools"

       

      1.4 Click "Save page as"

       

      1.5 Save the page as "Webpage, Complete"

       

      2 Applying Google OCR

       

      2.1 Open Gmail

       

      2.2 Open Google Drive

       

      2.3 Click "New"

       

      2.4 Click "File upload"

       

      2.5 Select the JPG file (it is the first one) inside the folder created by "Save page as" in step 1.5

       

       

      2.6 Wait for Google Drive to finish the upload

       

      2.7 Select the JPG file you have just uploaded

       

      2.8 Right-click the file and select "Open with"

       

      2.9 Select "Google Docs"

       

      After that, a new file with the same name and a different extension will be created. This file will contain the image and the transcribed text.

       

      REVIEW

       

      1 ALL TEXTS SHOULD BE REVIEWED

       

      The computer can make a few mistakes, especially if the image is of poor quality or if handwritten words cover the printed text.

       

      2 THE LINES ARE NOT THE SAME

       

      The transcribed text will have different line breaks, so you should break the lines as they appear in the original text.

       

      3 SEVERAL PAGES OR COLUMNS

       

      If the image shows 2 pages, you need to split the image and place the first page above the second. To split them, you can use Microsoft Paint.

      The same approach works for newspaper pages, many of which have several columns. If a page has 5 columns, split the image into 5 parts, creating 5 files, then upload the 5 files and run the OCR on each one separately.
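      For anyone who prefers a script to an image editor, the column split can be sketched in Python. This is only an illustration, not part of the method above: the function name `column_boxes` is made up for this example, and it assumes the columns are of equal width, which real newspaper pages often are not. Check the crops by eye before running the OCR.

```python
def column_boxes(width, height, columns):
    """Return one (left, upper, right, lower) crop box per column.

    Assumes equal-width columns spanning the full image height.
    """
    if columns < 1:
        raise ValueError("columns must be at least 1")
    boxes = []
    for i in range(columns):
        # Integer arithmetic keeps the boxes adjacent with no gaps or overlap.
        left = i * width // columns
        right = (i + 1) * width // columns
        boxes.append((left, 0, right, height))
    return boxes


# A 1000 x 800 pixel page split into 5 columns.
for box in column_boxes(1000, 800, 5):
    print(box)

# Optional: with the Pillow library installed, the boxes can be passed to
# Image.crop. "page.jpg" is a hypothetical file name.
# from PIL import Image
# image = Image.open("page.jpg")
# for n, box in enumerate(column_boxes(image.width, image.height, 5), start=1):
#     image.crop(box).save(f"column_{n}.jpg")
```

      Each saved column file can then be uploaded to Google Drive and opened with Google Docs as in steps 2.3 to 2.9.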

       

      4 HANDWRITTEN TEXTS

       

      If the handwriting is clear, the OCR will transcribe all or most of it.

       

      Here we have a good example

       

      https://crowd.loc.gov/campaigns/blackwells-extraordinary-family/henry-browne-blackwell-family-correspondence/mss12880013…

       

      Here we have a bad example

       

      https://crowd.loc.gov/campaigns/alan-lomax/british-isles-1950-1958/afc2004004.ms230215/afc2004004ms230215-14/

        • Re: FREE OCR - ALL COLLECTIONS -TYPED AND HANDWRITTEN TEXTS
          Scout

          Thank you Rodrigo, our Library of Congress colleague who kindly agreed to share his OCR method here on History Hub for volunteers to use if they like. This process will be helpful for getting text for printed documents, rather than handwritten ones. It can really save time when you're dealing with newsprint, though as Rodrigo says and demonstrates in his examples, there are times when OCR doesn't work so well--always read through the automatically generated text to check that it accurately represents the original (including any punctuation, accents or spelling, regardless of whether this is "correct" by modern standards). You can also get images of each document by clicking on the "view on loc.gov" button in the transcription interface.

           

          Please preserve the original line breaks as you would if you were transcribing. Make sure that words broken over two lines such as kit-

          ten

           

          are transcribed as a whole word on the first line, so "kitten", in this example.
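          The word-break rule above can be sketched in Python for anyone tidying OCR output in bulk. This is a rough illustration rather than an official By the People tool, and the function name `join_broken_words` is made up for this example. It assumes the text has already been split into lines that match the original document. It cannot tell a word broken at the end of a line from a genuinely hyphenated compound (for example "well-" followed by "known"), so the result still needs to be read against the image.

```python
def join_broken_words(lines):
    """Move the second half of a hyphen-broken word up to the first line.

    Line breaks are otherwise preserved. A line counts as broken when it
    ends with a letter followed by "-" and the next line starts with a letter.
    """
    lines = list(lines)
    for i in range(len(lines) - 1):
        current = lines[i].rstrip()
        following = lines[i + 1].lstrip()
        ends_broken = (
            len(current) >= 2
            and current.endswith("-")
            and current[-2].isalpha()
        )
        if ends_broken and following[:1].isalpha():
            # Take the first word of the next line and attach it here,
            # dropping the line-end hyphen.
            first_word, _, rest = following.partition(" ")
            lines[i] = current[:-1] + first_word
            lines[i + 1] = rest
    return lines


for line in join_broken_words(["She held the kit-", "ten in her arms."]):
    print(line)
```

          If the second half of the word was the only text on its line, that line is left empty rather than removed, so the number of lines stays the same as in the original.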

          • Re: FREE OCR - ALL COLLECTIONS -TYPED AND HANDWRITTEN TEXTS
            Veronica Beaudry Wayfarer

            I have started using this OCR method and it is awesome. Thanks so much for sharing.