What OCR software do you recommend?

I download many files that are photos of typed documents. I try to pull the text from these images so I can create a Microsoft Word document. I have used Google Lens and Google Drive to extract the text, with only fair results because of the poor quality of the documents. What OCR software do you recommend for getting the best results?

  • I'm sorry, the National Archives can't recommend a commercial product, but our community members may have some suggestions.

    Suzanne

    Community Manager, National Archives Catalog

  • Stephen,

    I'm a fellow volunteer transcriber who has a lot of experience with using OCR software to extract text from sometimes imperfect images. Obviously, these are my own opinions and suggestions only. I'm assuming you're on Windows, since usually users of other operating systems mention it.

    I usually use ABBYY FineReader. My old (2013) ver. 12 still works fine for me. Newer editions (15 is the 2020 release) have more features but probably nothing that will help you. The full price of the latest version for PC is $199. You can probably find a discount, and eBay or Craigslist may have earlier versions for much less. (There is a pared-down version called Sprint that ships with some scanners. It's not bad for what it is, but avoid it if you can.)

    A free alternative is Tesseract, but it has no graphical user interface. You have to type your instructions on the command line. It's powerful and the results are as good (or maybe nearly as good) as ABBYY's. If you are unfamiliar with the command line, there is lots of online help, but there will be a learning curve. (There will be a learning curve for any new software.) Like ABBYY, Tesseract OCR supports a very large number of languages.
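
For anyone new to the command line, a basic Tesseract run can be sketched as below. This is only an illustration, not the poster's own workflow: it assumes Tesseract is already installed and on the PATH, and the helper names (`build_tesseract_command`, `ocr_to_text`) and the file name `page1.jpg` are made up for the example. The command it builds has the usual form `tesseract <image> <output base> -l <language>`, and Tesseract writes the text to `<output base>.txt`.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng", psm=None):
    """Return the argument list for a basic Tesseract run.

    Tesseract writes the recognized text to <output_base>.txt.
    (Hypothetical helper, named for this example only.)
    """
    cmd = ["tesseract", str(image_path), str(output_base), "-l", lang]
    if psm is not None:
        # --psm sets the page segmentation mode; for example, 6 treats
        # the page as a single uniform block of text.
        cmd += ["--psm", str(psm)]
    return cmd


def ocr_to_text(image_path, lang="eng", psm=None):
    """Run Tesseract on one image and return the extracted text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("Tesseract is not installed or not on the PATH")
    image_path = Path(image_path)
    # page1.jpg -> page1 (Tesseract adds the .txt extension itself)
    output_base = image_path.with_suffix("")
    cmd = build_tesseract_command(image_path, output_base, lang, psm)
    subprocess.run(cmd, check=True)
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")
```

Calling `ocr_to_text("page1.jpg")` is equivalent to typing `tesseract page1.jpg page1 -l eng` at the command prompt and then opening `page1.txt`, which can be pasted into Word.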

    Before you run whatever OCR program you choose, it's important that the image be as good as possible. If the picture is out of focus, you probably can't do too much with it. But many problem images have poor contrast, are too dark, or have other defects that can be improved. You don't need Photoshop, though its cheaper and more user-friendly sibling, Photoshop Elements, is a good choice. The "levels" controls (Ctrl-L) and maybe sharpness are probably where you should start. Again, though, you may find the free program IrfanView is all you need. Shift-G brings up a window with slider controls for brightness, contrast, and the all-important gamma. There's no magic formula (that I've found, anyway).
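
To give a rough idea of what "levels" and gamma sliders do to a grayscale image, here is a minimal sketch. It is an assumption about the usual textbook formula, not a description of how Photoshop Elements or IrfanView work internally, and `levels_lut` is a made-up name. Values at or below the black point become 0, values at or above the white point become 255, and gamma bends the midtones (above 1 brightens them, below 1 darkens them).

```python
def levels_lut(black=0, white=255, gamma=1.0):
    """Build a 256-entry lookup table mapping old gray values to new ones.

    black, white: input levels that should become pure black and pure white.
    gamma: midtone adjustment; 1.0 leaves midtones unchanged.
    """
    if not 0 <= black < white <= 255:
        raise ValueError("need 0 <= black < white <= 255")
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    lut = []
    for value in range(256):
        # Stretch the black..white range to 0..1 and clip anything outside it.
        t = (value - black) / (white - black)
        t = min(1.0, max(0.0, t))
        # Apply gamma to the normalized value, then scale back to 0..255.
        lut.append(round(255 * t ** (1.0 / gamma)))
    return lut


def apply_lut(pixels, lut):
    """Map a sequence of gray values (0..255) through the lookup table."""
    return [lut[p] for p in pixels]
```

For a dim, low-contrast scan whose gray values only span roughly 50 to 200, `apply_lut(row, levels_lut(50, 200, 1.5))` would spread them over the full range and lift the midtones. An image library such as Pillow can apply a table like this to a grayscale image with `Image.point`. The right numbers still have to be found by eye for each image.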

    Good luck!

  • I have used Adobe Pro and it is easy to use.  I highly recommend it.

  • A few tips have also been shared in a related conversation in the CROWD community, which might be of interest:

    FREE OCR - ALL COLLECTIONS -TYPED AND HANDWRITTEN TEXTS

    Hope this is helpful!

  • I have used ABBYY Screenshot Reader since 2015, when I started using it for my genealogy research to copy obituaries & other newspaper articles. The last 2 times I updated it to the latest version, the price was about $10 and it was good for 2 years. At the end of the two years, I had to upgrade again; otherwise, it would stop working. I just now looked at their website & it no longer says that so I don't know what their policy is now. The price could be $10 per year--still very worthwhile.

    ABBYY Screenshot Reader is easy to use and rarely needs editing if the image you copy is very clear. If it isn't, you might have to do a little--or a lot--of editing. I don't mind that if it doesn't require a lot of editing. My eyesight is not very good now because of cataracts that were removed almost 2 years ago but are growing back (a rare condition, according to my optometrist).

    Here's the link for information about ABBYY Screenshot Reader v. 11, the version I'm using. You can try it for free for 15 days. I highly recommend it if you can't afford the more expensive OCR software.

    https://pdf.abbyy.com/screenshot-reader/

  • If you're using Windows, Microsoft OneNote has advanced OCR functionality, which works on both pictures and handwritten notes.

    OneNote supports Optical Character Recognition (OCR), a tool that lets you copy text from a picture or file printout and paste it in your notes so you can make changes to the words. It’s a great way to do things like copy info from a business card you’ve scanned into OneNote. After you extract the text, you can paste it somewhere else in OneNote or in another program, like Outlook or Word.