6 Replies Latest reply on Sep 18, 2019 11:12 PM by David Friedman

    Handling Accented Characters in English Documents

    David Friedman Wayfarer

      The help documentation has:

       

      "It is also our goal to preserve the original spelling, grammar, and punctuation of the documents, in order to honor the original creators' historical reality and stylistic choices."

       

      And also for the case of text other than English:

       

      "If you can transcribe the original language of a document, please do so. You can change your language input settings in your browser, and may need to use a foreign language keyboard or shortcuts for non-English characters."

       

      Accented characters can and do occur in English documents, as in the case of this page, which has:

       

      "The bench frowns on coups de théâtre;"

       

      So I would think we ought to transcribe that as coups de théâtre and not in the unaccented form, coups de theatre.

       

      But in looking at transcriptions I've been seeing the unaccented form instead.

       

      Currently this page has chateau instead of château:

       

      And I've seen that in other cases.

       

      Search systems can be made to deal with that to a certain extent, so that if somebody puts in either the accented form or the unaccented form it will search for and return results for both.

       

      But the transcription still wouldn't be as accurate a rendering of what was actually in the document, and there could still be search issues.
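      As a rough illustration of how a search system might treat the accented and unaccented forms as equivalent, here is a minimal Python sketch. It assumes the matching is done by Unicode decomposition and removal of combining marks; the function names fold_accents and matches are made up for this example and are not taken from any actual search system used by the project.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Return a lowercase, accent-free form of text for comparison only."""
    # NFD splits a character such as "é" into "e" plus a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keeping the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def matches(query: str, transcription: str) -> bool:
    """Hypothetical helper: accent-insensitive substring match."""
    return fold_accents(query) in fold_accents(transcription)


print(matches("chateau", "We arrived at the château at dusk."))
print(matches("théâtre", "The bench frowns on coups de theatre;"))
```

      Note that the folding is applied only to a search key. The stored transcription keeps whatever the transcriber typed, which is why it still matters that the accents are captured in the first place.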

       

      One idea would be to talk about that in the help documentation at:

       

      How to Transcribe

       

      Another idea would be to make a slight modification to the Quick Tips:

       

      to change:

       

      "Transcribe original spelling, punctuation, word order, and any page numbers or catalog marks."

       

      to:

       

      "Transcribe original spelling (including accented characters such as é and â), punctuation, word order, and any page numbers or catalog marks."

       

      The issue then could be that people are not sure how to type characters such as é and â.

       

      But the help documentation can address that. Maybe the text "accented characters" in "(including accented characters such as é and â)" could link to further documentation.

       

      I also just did a search for "accented" and "accented characters" on https://historyhub.history.gov/community/crowd-loc in Search Existing Posts and found these two threads on that topic:

       

      Keyboard shortcuts for French accents

       

      What are best practices for transcribing non-Standard English characters?

        • Re: Handling Accented Characters in English Documents
          Tammi Bunting Newbie

          This is sort of addressed under the "How to Transcribe" page under:

          Translation and text other than English:

          If you can transcribe the original language of a document, please do so. You can change your language input settings in your browser, and may need to use a foreign language keyboard or shortcuts for non-English characters. However, do not translate documents. The Library is not currently able to support translation. If you would like to translate a document and discuss it with other volunteers, please visit History Hub and join in one of the many conversations on this topic. Please note that the project ran a short trial period of translation, but we now ask that all translations be kept out of the transcription space.

          I interpreted this as using the original characters and have looked up the alt keys for said characters when needed.

          I have come across transcriptions where the person didn't follow the rules on the "How to Transcribe" page in other respects besides the characters, e.g. not spelling words as they appear in the original or not using [?] when they can't tell what the text says.

            • Re: Handling Accented Characters in English Documents
              David Friedman Wayfarer

              Yes, I think it would be preferable for the transcriber to use the accented characters.

               

              I could add here a page that I've used to get these:

               

              https://practicaltypography.com/common-accented-characters.html

               

              Aside from the Windows or Mac keys that are listed in that URL or also on this thread:

               

              Keyboard shortcuts for French accents

               

              One could also just use copy and paste.

               

              I'd say in general for any crowdsourcing project there can be various different decisions that can involve different tradeoffs.

               

              For example, a crowdsourcing project might consider setting up a sort of training or orientation process that people would go through before they began volunteering.

               

              And in that orientation process things like dealing with accented characters could be addressed, or using the original spelling, or using the [?] for illegible portions of text.

               

              The tradeoff is that even if that process took just say 15 or 20 minutes it might still lead to a sizable reduction in the number of volunteers.

               

              So I suppose part of the idea was to set it up like Wikipedia so that it would be very easy for people to quickly volunteer and begin transcribing.

               

              And then even if there were issues with work done there would be ways that those could be addressed.

               

              So, anyway, one way or another, one could consider it an experiment, and over time various policies might change based on results and priorities.

            • Re: Handling Accented Characters in English Documents
              Lauren Algee Adventurer

              Yes, we would like volunteers to capture accents when they are used in the original text. As David Friedman and Tammi Bunting both point out, this is consistent with our instructions, though not explicitly stated in them. Thanks to you both for pointing this out!

               

              I will chat with the other By the People community managers to evaluate if the instructions should be updated to explicitly include information on special characters and accents.

               

              I appreciate you both taking the time to think through and document this no-longer-gray area!

              • Re: Handling Accented Characters in English Documents
                David Friedman Wayfarer

                I could add to this thread that, in the course of Wikipedia editing, I've seen the term "diacritic" used:

                 

                "A diacritic – also diacritical mark, diacritical point, diacritical sign, or accent – is a glyph added to a letter or basic glyph... Some diacritical marks, such as the acute ( ´ ) and grave ( ` ), are often called accents. Diacritical marks may appear above or below a letter, or in some other position such as within the letter or between two letters."

                 

                (from the Wikipedia article titled "Diacritic")

                 

                So for example in Wikipedia if one tries to go to "Lech Walesa" it'll redirect to "Lech Wałęsa".

                 

                If one then clicks on the link "Lech Walesa" at the top, one can see a message about how the form without the diacritics redirects to the one with the diacritics:

                 

                https://en.wikipedia.org/w/index.php?title=Lech_Walesa&redirect=no

                 

                So, for this project, a few people might try searching first for "diacritic" before they search for "accent" in the posted messages on History Hub.
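                The Wałęsa case also shows why an explicit redirect can be needed rather than just stripping marks automatically. In the small Python sketch below, the ogonek on "ę" is a combining mark and gets removed, but "ł" is a separate letter with no decomposition, so it survives. The REDIRECTS table and the resolve_title function are hypothetical names for this illustration, not how Wikipedia actually implements redirects.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical redirect table: plain spelling -> spelling with diacritics.
REDIRECTS = {"Lech Walesa": "Lech Wałęsa"}


def resolve_title(title: str) -> str:
    """Follow an explicit redirect if one exists; otherwise keep the title."""
    return REDIRECTS.get(title, title)


# "ę" loses its ogonek, but "ł" has no base-letter decomposition and stays.
print(strip_marks("Lech Wałęsa"))
# The explicit mapping covers what automatic stripping misses.
print(resolve_title("Lech Walesa"))
```

                So a search that only strips combining marks would still not match "Walesa" against "Wałęsa" without some extra mapping of this kind.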

                • Re: Handling Accented Characters in English Documents
                  Victoria Van Hyning Scout

                  Hi everyone,

                   

                  We've written a new post about how to handle accents and languages other than English. Thanks for your interest in this topic, and help thinking through it. Accents, diacritics and languages other than English
