Potential use of AI for transcription

Based on all I have read recently, transcription seems like an ideal application for AI, with CAs proofreading and correcting AI-generated drafts. There is a significant volume of existing transcription available to train an AI bot, such as ChatGPT, and this could significantly expand our efforts.

Obviously, the many caveats on AI would have to be factored in.

Thoughts, CAs?

NA staff: are you already exploring this? Is there a pilot in the works? I would love to volunteer to participate!

  • Thank you for your question. Yes, we're always looking at new technology and how we can incorporate it into the National Archives Catalog. Be sure to subscribe to the National Archives Catalog Newsletter (https://www.archives.gov/research/catalog/newsletter), which is where we will announce any new projects or features.

    Community Manager, National Archives Catalog

  • I am sure there are several technical developers and researchers out there who have solutions for these types of issues, or who, like myself, have created or implemented OCR and AI-augmented transcription programs and platforms. However, even the best of these solutions may only ever get you 95% of the way to a perfect transcription. I have millions of transcriptions produced this way for records, and I would like to add them via my API access, but I am uncomfortable doing so without the ability to mark the transcriptions as a draft or as "needs review" so that human eyes get on them. I honestly do not know whether that feature already exists, but if it does not, is there a way to submit feature requests for your backlog?
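
To illustrate the draft / "needs review" idea raised in the reply above, here is a minimal sketch of how machine-generated transcriptions might be held back until a person has checked them. It is not the Catalog API and does not describe any existing feature: ReviewStatus, Transcription, queue_for_review, approve, and publishable are all hypothetical names made up for this example, and everything runs locally.

```python
from dataclasses import dataclass
from enum import Enum


class ReviewStatus(Enum):
    """Hypothetical review states; not an existing Catalog field."""
    DRAFT = "draft"                # machine output, not yet seen by a person
    NEEDS_REVIEW = "needs_review"  # queued for a human proofreader
    APPROVED = "approved"          # a human has checked and accepted it


@dataclass
class Transcription:
    record_id: str   # hypothetical identifier for the record being transcribed
    text: str
    source: str      # e.g. "ocr" or "human"
    status: ReviewStatus = ReviewStatus.DRAFT


def queue_for_review(items):
    """Move every machine-generated draft into the human review queue."""
    for item in items:
        if item.source != "human" and item.status is ReviewStatus.DRAFT:
            item.status = ReviewStatus.NEEDS_REVIEW
    return items


def approve(item, corrected_text=None):
    """Record a human sign-off, optionally replacing the text with a correction."""
    if corrected_text is not None:
        item.text = corrected_text
    item.status = ReviewStatus.APPROVED
    return item


def publishable(items):
    """Only human-approved transcriptions are eligible for upload."""
    return [item for item in items if item.status is ReviewStatus.APPROVED]
```

In a workflow like this, a bulk OCR batch would pass through queue_for_review first, and only the items returned by publishable would ever be sent on, so nothing machine-generated is added without a person having looked at it.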
