Understanding tagging

I'm new and today is my second day!  Before I get too far, please advise so I can start out as correctly as possible.  So far I'm seeing many tags that are missed or applied incorrectly. For example, every word to do with the subject gets tagged (those are keywords), while the topic and much more is not tagged.

I've read that you want things transcribed first and foremost, of course, which is what brought me here with my trained background in that. But seeing the need for tags, or for correct tags, I'd like to help more there as well.

Tags are placed to describe what the content is about and what it relates to.  Most I've seen do none of that.

Here are some examples of what is vital to tagging. Even if only these were done to begin with, it would be immensely helpful to those searching:

1.  Subject, Author, Article genre:  Suffrage would be a good example, or Walt Whitman.  I'm not seeing those tagged in many of the pages I've read so far.

2.  Name of poem, book, collection, etc.  Self-explanatory, and since the name is sometimes not included in the typed content, it especially should be tagged.

3.  Year.  When searching, an ancestry researcher/historian like myself often wants to find what happened in a certain year, and students do as well for educational purposes.  I'm not seeing a year tagged even where it was clearly included in the text or available at the beginning of a book, etc.

If there is such a list as this, please advise me of the link to it so I can include others.  Thank you!

  • Fellow Volunteer here. I think you are on the right track. In some cases where there is a famous person referred to, I tag their full name even if they use initials or just a first or last name. I also write out the date as a tag where it is not included in the header of the folder. I also see times where people have so many tags, many of which are not helpful but I leave them. I look at tags as part of a search index. That's my simple take.

    Henry

  • Hi Gail!  And welcome!

    It's also worth reiterating that tags are an experimental feature. We don't yet have a place to add them to the original materials in loc.gov. If you have a limited amount of time to spend, transcription and review are the most crucial areas of activity. That said, tags can be a useful communication tool amongst volunteers and we hope to eventually have a search feature within the site.

    If you haven't read through them yet, our tagging instructions give some broader examples:

    • If you transcribe an important word in a document, such as somebody’s name, and the original author spelled the name incorrectly, you can add a tag of the correct name using the “Tag” button.
    • Sometimes writers use nicknames or code words. If you know or can correctly identify the full name or subject using contextual information from the larger document or collection, please tag this information using the “Tag” button.
    • Are you interested in documents mentioning cats? Use the “Tag” button to tag all pages that mention cats. Other examples include “Civil War”, “Cooking”, “Sports”. You can apply whatever tags you like.
    • Keep tags as short as you can and use whole words instead of abbreviations. This will make it easier for other people to understand your tags and to reuse them on other pages.

    We don't recommend tagging any words or dates already included in the document.  As Henry notes above, it's useful to think of tags as additional search terms to those present in the text.
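The idea of tags as additional search terms alongside the transcribed text can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the page ids, sample text, tags, and the `build_index` helper are all hypothetical, and nothing here reflects how By the People or loc.gov actually index pages.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase search term to the set of page ids that contain it.

    Terms come from both the transcription text and the volunteer tags.
    """
    index = defaultdict(set)
    for page_id, page in pages.items():
        # Words already in the transcription are searchable without any tag.
        for word in page["text"].lower().split():
            term = word.strip(".,;:!?\"'()")
            if term:
                index[term].add(page_id)
        # Tags add terms the text lacks; a multi-word tag is indexed whole.
        for tag in page["tags"]:
            index[tag.lower()].add(page_id)
    return index

# Hypothetical sample pages, invented for illustration only.
pages = {
    "page-1": {"text": "W. W. read his verses aloud.", "tags": ["Walt Whitman", "1885"]},
    "page-2": {"text": "The train left in 1885.", "tags": []},
}
index = build_index(pages)
```

In this sketch, page-1 becomes findable under "walt whitman" only because of its tag, while page-2 is already findable under "1885" from its text, which is why re-tagging a date that is already in the document adds nothing.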

  • Thank you Henry for adding another, user-friendly explanation; the more we do, the more we all understand about what tags mean.  I'm replying to Lauren below because tags here look different than everywhere else online, so I'm asking for clarification. I too am seeing very diverse tagging methods here.

    If we don't get it right from the start when transcribing and editing, we won't be going back to correct?

  • Hi Lauren, 

    Sorry to be replying only now, but I had a very busy winter with other research projects.  I'm a Search Engine Optimizer (SEO), where tagging is what I do and what I work with every day to optimize online sites and the articles within their pages.  Also, as a Data Analyst, I provide genealogy going all the way back to ancient tribes of the earth, town historical data, and other research projects.  The majority of the data I use is found via tagging.

    My question now is this: if we don’t understand from the beginning what you want to be searchable and found, then as it stands we won’t be able to correct or add any tags after transcription and editing are complete?  If you ever find the need, I wouldn’t mind going back to add tags as a specific project.

    Please correct me then, as I’m not understanding. You stated that it’s experimental and that “We don't yet have a place to add them to the original materials in loc.gov.”  But it's only due to tagging that these documents can then be found on these sites as well as on Google, etc.  I believe the point is for them to be searchable and to be found, such as on HistoryHub?  I also think it’s important to know that, since the documents have been typed into a site that is live online, those tags already show and have an effect, so I don’t know how another place for them could help more.

    So here, for example, certain keywords (i.e. names, dates) are in the body of our transcriptions, so a tag we type may be the second occurrence on a page, or the first in the case of titles and authors that were not present to transcribe.  This is critical for search engines online to find these documents. However, if something is repeated over and over, it can appear to be spam, as if someone were trying to get their publication to rise to the first page on Google and other search engines, and they’ll remove it. But under a dozen times or so is fine.

    It may be more useful to share how a Researcher like myself, Teacher, Student, etc. would search and find needed documents if the LOC has included tags:

    1. If one wanted everything about trains in 1885, they would search '1885', 'trains' or the like.  Google now treats singular and plural forms alike, so it would include results for the root word 'train' even if you typed 'trains'; no need to worry about that.  It provides both results even if the older site where the data was found did not.

    It would not have shown them this as easily had we not tagged it with the date and the subject of trains.  Those who use Google and other search engines are given 10,000 or more pages when they search anything, so without tags our page might land about 8,000 pages back - just an example.  This is called an organic result, as it pulls keywords out of written works that do not have tags.  The body of our article having 'train' and '1885' helps, but if it has those tags too, that moves the article further toward the top results, possibly onto page 1.

    2. If one wanted to find, for example, everything about Walt Whitman, Poet in 1885, they could then search using these tags:


    3. If I found an ancestor of his and, for other reasons, wanted only what Walt did in 1885, in case it leads to where he was living at the time (which is what I'm looking for), I'd search:

    1885, Whitman

    This would not give me only his poetry, because I did not ask for poems or poetry.  It would give me lectures he held, papers, notes, etc., all of which I need.  I could then change the year to another and search again.
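The "1885, Whitman" search described above can be sketched as a lookup for pages that carry every requested tag. This is a minimal illustration only: the page names, tags, and the `normalize` and `search_all` helpers are hypothetical, and it does not represent how Google or loc.gov rank or match results.

```python
def normalize(tag):
    """Lowercase a tag and collapse surrounding or repeated whitespace."""
    return " ".join(tag.lower().split())

def search_all(tagged_pages, *wanted):
    """Return the sorted ids of pages that carry every wanted tag."""
    wanted_set = {normalize(t) for t in wanted}
    return sorted(
        page_id
        for page_id, tags in tagged_pages.items()
        # A page matches when the wanted tags are a subset of its own tags.
        if wanted_set <= {normalize(t) for t in tags}
    )

# Hypothetical tagged pages, invented for illustration only.
tagged_pages = {
    "lecture-notes": ["1885", "Whitman"],
    "poem-draft": ["1885", "Whitman", "Poetry"],
    "letter": ["1886", "Whitman"],
    "timetable": ["1885", "Trains"],
}

results_1885 = search_all(tagged_pages, "1885", "Whitman")
results_1886 = search_all(tagged_pages, "1886", "Whitman")
poems_1885 = search_all(tagged_pages, "1885", "Whitman", "Poetry")
```

Here the 1885 search returns both the lecture notes and the poem draft because "Poetry" was not requested; adding it narrows the results to the poem draft, and changing the year to 1886 returns only the letter.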

    Hopefully this helps explain how important tagging is to outside researchers.  Tags target exactly what we are looking for.  The better the tags used and then searched for, the better the results.

    Here is a Suffrage document, one of thousands of results from a search I did for “suffrage documents”, as many educational sites have now already found the documents we transcribed and edited:

    "Objections to Woman Suffrage Answered," by Henry Blackwell | DocsTeach

    When I checked to see when this article was last cached (searched, with the latest info found online), it showed this April, so some of LOC’s work could already be useful.

    Now one from the LOC site: when I searched the broad term ‘suffrage documents’, here are the results from work done in Oct 2020:

    Google gave me this as one of many results on page 1 instead of many pages back where I’d never find it.

    Digital Collections - 19th Amendment to the U.S. Constitution: Primary Documents in American History - Research Guides at Library of Congress (loc.gov)

    And now I could click on and read all of Suffrage!:

    About this Collection | National American Woman Suffrage ...

    https://www.loc.gov/collections/national-american...

    The Subject File includes biographical information on some of the principal suffrage workers, a collection of anti-suffrage literature, progress reports from state and local suffrage organizations affiliated with the National American Woman Suffrage Association, records relating to the work of the Congressional Union for Woman Suffrage (later the National Woman's Party), and litigation …

    These are external searches (all of the internet via Google and others), not internal searches (LOC only). Many people won’t think to look in the LOC files, which makes tags vital.

    *It does take search engines from about a week to a month to add new things we finish at the LOC, so I wouldn’t look yet for an article I edited in the last few days.

    Here is a Harvard University educational paper from 2005, a still-early internet explanation, for those who may like another. Bear in mind why I've written about Google: I keep up with every update they make each year, as they change the results all the time so the same articles won’t come up on top time after time. You will find the Author below speaks about Yahoo being the overall source of online search, but I believe they are mistaken. In my view it was and always will be Google, as they built the internet and what Yahoo and all others use:

    Microsoft Word - 07-WhyTaggingMatters.doc (harvard.edu)

  • To Gail;

    I believe we would go back and edit the necessary items.

  • Gail:

    Thank you for your time and research.  This is very helpful.

  • For clarification, we do not currently expect tags in By the People to have a life beyond the site.  Transcriptions that are completed are returned to the Library's main collections site, loc.gov. There, transcriptions immediately become searchable within loc.gov and can also be indexed by search engines to increase exposure of those collections to users searching broadly across the web.

    By the People is viewed by the Library as a pass-through application, meaning that it is a tool for the crowdsourcing program but is not built to serve all purposes of researchers and is not intended to be the permanent home for any of the data collected there. Additionally, completed campaigns are expected to eventually be removed from BTP.

    So while you're very correct that tags can assist with discovery of these materials on the open web, that is only true for as long as they are published in By the People.  Because we don't know to what extent the tags will be preserved or used by the Library after a campaign is removed from By the People, we advise volunteers to think of them as a tool within the platform and to focus their time and attention on transcriptions.

    I hope this is clarifying for all our great tagging volunteers!

  • A transcription must be accepted by at least one other user before it is marked "complete". This means it could go through several rounds of edits and corrections before being completed. If users feel they can improve upon a completed page, we ask them to email us, and we are always glad to reopen a completed page in an active campaign for further edits.

  • Just found this whole project, and as the daughter of a librarian, the cousin of a librarian/city library system director, and a cataloguer at an academic specialty library one college summer, this warmed the cockles of my heart...But for reasons apparent in this thread, the tagging won't be optimized until there are clearer guidelines. I may have been tagging redundantly today when I tackled copyright pages, because I was mimicking the old LC-issued catalogue cards, which do repeat words that would be transcribed. (Except Tashnagzutun! - you can look it up <g>) It might be helpful if, within the Project, you could allow volunteers to search for tags already in use.

    But this is a rabbithole down which I should not fall until I retire in 18 months, I'm afraid. I'll check back occasionally.

  • Yes, previous Library background here also, but repeat words will stop keywords from being found! Google algorithms (helpful ecommerce background also) will stop a company from spamming to be most relevant.  That translates to: if we keep tagging the same words (please, only once!), then Google can drop us to the bottom of all those pages, never to come up in the top few pages when someone searches.  The bots don't know the difference between tags for a .org, .gov or .com.