2 Replies Latest reply on May 15, 2021 3:43 PM by Ivayla Roleva-Peneva

    Using OCR on typed Bulgarian

    Art Chimes Wayfarer

      Here's an important Bulgarian legal document. It is in the Bulgarian language, of course, which is written in the Cyrillic alphabet.




      Now, I don't know Bulgarian, but my Abbyy FineReader OCR software does recognize Bulgarian and is pretty forgiving of the sub-optimal image (scanned microform?) presented.


      I did some spot-checking, and I found few errors, though there were some cases in which the image was hard to decipher.


      My question: Is it useful to enter the OCR results as a "transcription" and, if so, how exactly should I do it?


      It clearly needs further review, and ideally I think it should get proofread twice, but maybe that's asking too much.


      What I have done, as a placeholder, is preface the OCR results with a disclaimer:


      [[ NOTE: This preliminary transcript is the result of an OCR pass

      through Abbyy FineReader. I do not know Bulgarian, so this should

      be carefully proofed by someone who does. However, I did

      some spot-checking and the OCR seems pretty accurate, although some

      characters are poorly reproduced.]]


      Should I "save" the page and move on? Should I "submit for review"? Maybe I should just look for something in English?



        • Re: Bulgarian
          Lauren Algee Tracker

          Hi Art,


          This is a good but tricky set of questions!


          I'll start with the easiest response -- It's a great idea to use the OCR's Bulgarian functionality since you have access to it! 


          We usually advise strongly against leaving notes in text. However, in this specific case, as part of a transcription that will go through another pass of review, I think it is probably more helpful than harmful. I would ask that you add to it a request for the reviewer to remove the note before approving! This would hopefully be obvious to most reviewers.


          Whether you save or submit depends on your confidence level. You suggest that it should be proofread twice. In that case, I recommend saving and leaving it "In Progress". The transcriber who ultimately submits it as complete will provide one pass of review, and the volunteer who formally reviews it will provide another.


          I'm interested to hear if any other volunteers have thoughts on this process, your note, or levels of review!



            • Re: Using OCR on typed Bulgarian
              Ivayla Roleva-Peneva Newbie

              Hello Art and Lauren,


              I would like to help transcribe the Bulgarian document. I am Bulgarian and can fix the errors that the OCR software did not catch.


              I am new here and I might need some guidelines on how I could contribute to the project. Do you mind pointing me in the right direction?


              Thank you,


              Ivayla Roleva-Peneva