1 Reply Latest reply on Aug 20, 2019 4:56 PM by Lauren Algee

    This issue of Lippincott's Monthly Magazine was already scanned with OCR by Google (using already existing scans with OCR)

    David Friedman Wayfarer

      I just worked on a piece in the March 1891 issue of Lippincott's Monthly Magazine called "Cosmopolitanism and Culture":

       

      By the People: March 1891 Lippincott's Monthly Magazine Table of Contents

      By the People: "Cosmopolitanism and Culture"

       

      I was curious about its reference to "free speech of the soul" by Goethe, so I did a Google search for the phrase and found that all the articles in that March 1891 issue have already been scanned by Google:

       

      https://www.google.com/search?q=%22free+speech+of+the+soul%22+Goethe&ie=utf-8&oe=utf-8&client=firefox-b-1-e

       

      One can get: "The Sound of a Voice", "Some Familiar Letters by Horace Greeley",  "A Mysterious Case", etc.

       

      Of the three listings below, one has the advertisements in the back, while the other two do not.

       

      1. The Sound of a Voice; Or, The Song of the Débardeur -- This one has the advertisements in the back (going with the "With the Wits" piece)

      2. Lippincott's Monthly Magazine VOL. XLVII January to June 1891  -- (from University of California)

      3. Lippincott's Monthly Magazine VOL. XLVII January to June 1891  -- (from University of Chicago)

       

      So from any of these three listings it's possible to get plain text (one can go to the gear icon and pick plain text).

       

      The plain text will likely have errors in it, but this could be faster and easier than typing everything from scratch. One could just copy and paste the plain text and then go through and check it for errors.
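      As a rough illustration of the "paste, then check" step, here is a minimal Python sketch. It is only an assumption about how one might tidy pasted OCR text before proofreading, not something By the People or Google provides. The function names `tidy_ocr_text` and `suspicious_tokens` are made up for this example, and the hyphen rule can wrongly join genuinely hyphenated words, so the result still needs to be read against the page image.

```python
import re


def tidy_ocr_text(raw: str) -> str:
    """Rejoin words split across line breaks and unwrap lines within paragraphs."""
    # Join "cul-\nture" into "culture". Caveat: this also joins real
    # hyphenated compounds that happen to break at the end of a line.
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", raw)
    # Treat blank lines as paragraph breaks; collapse other whitespace.
    paragraphs = re.split(r"\n\s*\n", text)
    return "\n\n".join(" ".join(p.split()) for p in paragraphs if p.strip())


def suspicious_tokens(text: str) -> list[str]:
    """Return tokens that mix letters and digits, a common OCR misread."""
    return [
        token
        for token in text.split()
        if re.search(r"[A-Za-z]", token) and re.search(r"\d", token)
    ]


if __name__ == "__main__":
    pasted = "free speech of\nthe s0ul is cul-\nture.\n\nNext paragraph"
    cleaned = tidy_ocr_text(pasted)
    print(cleaned)
    print(suspicious_tokens(cleaned))
```

      The second function only flags words worth a closer look (such as a zero standing in for the letter "o"); it does not correct anything by itself.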

       

      For any particular piece of printed material one could do a Google search to see if that text has already been scanned with OCR.
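      For anyone who wants to repeat that check for several phrases, a small Python sketch like the following could build the same kind of exact-phrase search link as the one above. The function name `exact_phrase_search_url` is hypothetical, and the sketch only constructs the URL to open in a browser; it does not fetch or parse any results.

```python
from urllib.parse import urlencode


def exact_phrase_search_url(phrase: str, extra_terms: str = "") -> str:
    """Build a Google search URL with the phrase in double quotes."""
    # Quoting the phrase asks for an exact match; extra terms (for
    # example an author's name) are appended unquoted.
    query = f'"{phrase}" {extra_terms}'.strip()
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(exact_phrase_search_url("free speech of the soul", "Goethe"))
```

      Picking a distinctive phrase of five or six words from the page usually narrows the results better than searching on the title alone.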

       

      If it isn't in Google Books there's a possibility it might still be on Internet Archive.

       

      I think this could be another technique to use alongside the other methods mentioned in this thread:

       

      Re: Dictation using smartphone

       

      That thread mentions that OCR is generally used, though for some materials it isn't ("We get a lot of questions about why we don't just use Optical Character Recognition on everything typed...").

       

      However, I'm not sure whether there might be legal issues with copying the plain text that Google created with its own processes and technology (even if the underlying work itself is no longer under copyright).

       

      I would suspect there probably wouldn't be any actionable legal issue, but I'm not totally sure on that.

        • Re: This issue of Lippincott's Monthly Magazine was already scanned with OCR by Google (using already existing scans with OCR)
          Lauren Algee Scout

          Thanks for this finding and suggestion, David!  I agree that this is a good strategy for the printed publications you'll find on By the People. And I really appreciate your thoroughly researched and reasoned evaluation!

           

          As I noted regarding the dictation post, printed materials appear throughout the collections and many may be amenable to OCR. However, the decision whether or not to OCR materials is made in bulk rather than at the item or page level, which is why OCR has generally not been applied to printed materials within largely handwritten collections. We're working with Library of Congress digital specialist colleagues to determine if an alternative workflow could be created in the future, but either way, transcribing the printed documents allows us to make these pages searchable now! Strategies for doing that more efficiently are always welcome!