2 Replies Latest reply on May 14, 2021 9:55 AM by Art Chimes

    What OCR software do you recommend?

    stephen nickerson Newbie

      I download many files that are photos of typed documents. I try to pull the text from these images so I can create a Microsoft Word document. I have used Google Lens and Google Drive to extract the text, with only fair results because of the poor quality of the documents. What OCR software do you recommend for getting the best results?

        • Re: What OCR software do you recommend?
          National Archives Catalog Scout

          I'm sorry, the National Archives can't recommend a commercial product, but our community members may have some suggestions.

           

          Suzanne

          Community Manager, National Archives Catalog

          • Re: What OCR software do you recommend?
            Art Chimes Wayfarer

            Stephen,

             

            I'm a fellow volunteer transcriber with a lot of experience using OCR software to extract text from sometimes imperfect images. Obviously, these are my own opinions and suggestions only. I'm assuming you're on Windows, since users of other operating systems usually mention it.

             

            I usually use ABBYY FineReader. My old (2013) ver. 12 still works fine for me. Newer editions (15 is the 2020 release) have more features but probably nothing that will help you. The full price of the latest version for PC is $199. You can probably find a discount, and eBay or Craigslist may have earlier versions for much less. (There is a pared-down version called Sprint that ships with some scanners. It's not bad for what it is, but avoid it if you can.)

             

            A free alternative is Tesseract, but it has no built-in graphical user interface. You have to give it your instructions on the command line. It's powerful and the results are as good (or maybe nearly as good) as ABBYY. If you are unfamiliar with the command line, there is lots of online help, but there will be a learning curve. (There will be a learning curve for any new software.) Like ABBYY, Tesseract OCR supports a very large number of languages.
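
For anyone who would rather script the command line than type it each time, here is a minimal Python sketch. It assumes Tesseract is already installed and on the PATH, and it uses only the basic form of the command, `tesseract <image> <output base> -l <language>`, which writes the text to `<output base>.txt`. The function names and the file name `scan01.png` are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for a basic Tesseract run.

    Tesseract appends ".txt" to the output base itself, so the
    output base is given without an extension.
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Run Tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on the PATH")
    image_path = Path(image_path)
    output_base = image_path.with_suffix("")  # scan01.png -> scan01
    command = build_tesseract_command(image_path, output_base, lang)
    subprocess.run(command, check=True)  # raises if Tesseract fails
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")


# Hypothetical usage:
#     text = ocr_image("scan01.png")
```

The text file can then be opened or pasted into Word. Looping `ocr_image` over a folder of downloads is the obvious next step once a single file works.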

             

            Before you run any OCR program, it's important that the image be as good as possible. If the picture is out of focus, you probably can't do too much with it. But many problem images have poor contrast, are too dark, or have other defects that can be improved. You don't need Photoshop, though its cheaper and more user-friendly sibling, Photoshop Elements, is a good choice. The "levels" controls (Ctrl-L) and maybe sharpness are probably where you should start. Again, though, you may find the free program IrfanView is all you need. Shift-G brings up a window with slider controls for brightness, contrast, and the all-important gamma. There's no magic formula (that I've found, anyway).
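
To give a rough idea of what a gamma slider does to a dark scan, here is a small Python sketch. It assumes 8-bit grayscale values (0 to 255) and the common convention that a gamma above 1 brightens the midtones; whether IrfanView or Photoshop Elements use exactly this formula is an assumption, and the function names are made up for illustration.

```python
def gamma_table(gamma):
    """Build a 256-entry lookup table for gamma correction.

    Each input level v (0-255) maps to 255 * (v / 255) ** (1 / gamma).
    Black and white stay put; only the tones in between move.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return [round(255 * (v / 255) ** (1.0 / gamma)) for v in range(256)]


def adjust_gamma(pixels, gamma):
    """Apply gamma correction to a sequence of 8-bit gray values."""
    table = gamma_table(gamma)
    return [table[p] for p in pixels]


# Hypothetical usage: lift the midtones of a dark strip of pixels.
#     brighter = adjust_gamma([0, 40, 64, 200, 255], 2.0)
```

Because the endpoints do not move, gamma can bring out faint typewriter text without washing out the paper the way a plain brightness increase can. A lookup table like this can also be handed to an imaging library (for example, Pillow's `Image.point` on a grayscale image) instead of being applied pixel by pixel.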

             

            Good luck!
