Extracted text and long PDF documents

I've mainly been transcribing long PDF files from the Ford and Eisenhower Libraries. I generally copy and paste the extracted text into a document, correct the AI transcription errors, and paste the result into the transcription window.

Having the extracted text has simplified transcription for the most part, but there is a significant flaw with very long documents. PDF files over 30 pages or so are not completely transcribed. An extreme example is NAID 258225379, which has 446 page images, but the extracted text covers only the first 29 pages.

I'm not bringing this up as a complaint - I do love having the AI text to simplify the work - but as a bug in the text extraction process that you should be aware of.
