4 Replies Latest reply on Jun 30, 2017 11:12 AM by Michael Horsley

    Record identifier number as image metadata?

    techhistorynerd Adventurer

      In the images available from the NARA Catalog, do the images themselves store the National Archives Identifier associated with their record anywhere in their metadata?  A quick look for EXIF or JPEG comment information comes up empty, but I readily confess I'm not enough of an expert on image metadata formats to know where the most standard/appropriate place would be to look.

       

      If the id number is not currently encoded, this would be a really useful item to add (in my opinion at least) because it would allow anyone looking at an image from the Archives in another context (so long as the metadata wasn't stripped at any rate) to use the image itself to identify the Catalog record associated with it.  That way, even if a random use of an Archives image on a random website didn't offer any useful origin information, the image itself could still be used to find the parent record.  It would also offer (in theory, again I don't know much about the practicalities) a way for those interested to find and quantify image reuse from the Archives catalogs on sites like Wikipedia, archives.org, etc. by hunting for appropriately formatted metadata in the images.
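A minimal sketch of the "hunting for appropriately formatted metadata" idea, in Python. No tagging convention exists yet, so the `NAID:<digits>` format, the function names, and the input shape (a mapping from image name to already-extracted metadata text) are all assumptions made for illustration. Only the `https://catalog.archives.gov/id/...` URL pattern comes from this thread.

```python
import re
from collections import Counter

# Hypothetical convention (not a NARA standard): the literal text "NAID:"
# followed by the numeric identifier, stored somewhere in the image metadata.
NAID_TAG = re.compile(r"\bNAID:(\d+)\b")

# URL pattern taken from the catalog link quoted in this thread.
CATALOG_URL = "https://catalog.archives.gov/id/{naid}"


def find_naids(metadata_text: str) -> list[str]:
    """Return every NAID found in a blob of already-extracted metadata text."""
    return NAID_TAG.findall(metadata_text)


def catalog_url(naid: str) -> str:
    """Build the catalog record URL for a NAID."""
    if not (naid.isascii() and naid.isdigit()):
        raise ValueError(f"NAID must be all digits, got {naid!r}")
    return CATALOG_URL.format(naid=naid)


def count_reuse(metadata_by_image: dict[str, str]) -> dict[str, int]:
    """Count how many images carry each NAID (one count per image)."""
    counts: Counter[str] = Counter()
    for text in metadata_by_image.values():
        for naid in set(find_naids(text)):
            counts[naid] += 1
    return dict(counts)
```

Extracting the metadata text from each file is a separate step (Exiv2 or a similar tool would do it). This sketch only covers recognizing the tag and mapping it back to a catalog record.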

       

      As a starting point, using the open source tool Exiv2 (Exiv2 - Image metadata library and tools ) I was able to successfully embed and retrieve a JPEG comment in one of the Archive's images.  I don't know if the data should go there or in an EXIF/IPTC/XMP field somewhere, but hopefully someone who knows more than I do about that could make useful suggestions...
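For anyone who wants to see what that experiment amounts to at the byte level, here is a rough Python sketch that writes and reads a JPEG comment (COM) segment using only the standard library. It is not how Exiv2 is implemented, the function names are made up for illustration, and it assumes a simple file layout: no fill bytes between segments and no length-less markers before the image data.

```python
import struct

SOI = b"\xff\xd8"  # start-of-image marker
COM = 0xFE         # second byte of the comment marker (0xFFFE)
SOS = 0xDA         # start of scan: compressed image data follows
EOI = 0xD9         # end of image


def _end_of_leading_app_segments(jpeg: bytes) -> int:
    """Offset just past the APP0..APP15 segments that directly follow SOI."""
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF and 0xE0 <= jpeg[pos + 1] <= 0xEF:
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        pos += 2 + length  # marker (2 bytes) + length field and payload
    return pos


def add_jpeg_comment(jpeg: bytes, text: str) -> bytes:
    """Return a copy of the JPEG with a COM segment added after the APPn headers."""
    if not jpeg.startswith(SOI):
        raise ValueError("not a JPEG stream")
    payload = text.encode("utf-8")
    if len(payload) > 65533:
        raise ValueError("comment too long for a single segment")
    # The 2-byte length field counts itself plus the payload.
    segment = bytes([0xFF, COM]) + struct.pack(">H", len(payload) + 2) + payload
    pos = _end_of_leading_app_segments(jpeg)
    return jpeg[:pos] + segment + jpeg[pos:]


def read_jpeg_comments(jpeg: bytes) -> list[str]:
    """Return the text of every COM segment found before the image data."""
    if not jpeg.startswith(SOI):
        raise ValueError("not a JPEG stream")
    comments = []
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker in (SOS, EOI):
            break
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        if marker == COM:
            body = jpeg[pos + 4:pos + 2 + length]
            comments.append(body.decode("utf-8", errors="replace"))
        pos += 2 + length
    return comments
```

Something like `add_jpeg_comment(data, "NAID:17331642")` followed by `read_jpeg_comments(...)` round-trips the value. The `NAID:` prefix is only a placeholder convention, and the comment is placed after the leading APPn segments so that a JFIF header stays first in the file.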

       

      I imagine that if a convention can be established it would be possible to systematically and automatically perform assignment of ID numbers to images using the Catalog database itself to identify information and associated images, assuming the backend software infrastructure allows for that...

       

      Is this something the Catalog team has considered?

        • Re: Record identifier number as image metadata?
          Michael Horsley Newbie

          Great question Tech,

          I used to work in the catalog and my response is my opinion not official policy.

          techhistorynerd wrote:

           

          "In the images available from the NARA Catalog, do the images themselves store the National Archives Identifier associated with their record anywhere in their metadata?"

           

          No. The images in the catalog are generally access copies. You can use the API (I think) to extract the catalog information associated with the image. The National Archives Identifier (NAID) is a number generated by the catalog system and is not considered the archival identifier.

           

          "A quick look for EXIF or JPEG comment information comes up empty, but I readily confess I'm not enough of an expert on image metadata formats to know where the most standard/appropriate place would be to look."

           

          I am not sure what happens in the image processing/upload/ingest etc., but that data (if there in the first place) gets stripped out. The catalog was not designed to be an outward-facing extension of a digital repository. It's a collection of a wide variety of images added by custodial units. True, it's the one place where images and their descriptions are co-located, but it is not considered an archival repository. It's an access portal.

           

          "If the id number is not currently encoded, this would be a really useful item to add (in my opinion at least) because it would allow anyone looking at an image from the Archives in another context (so long as the metadata wasn't stripped at any rate) to use the image itself to identify the Catalog record associated with it. "

           

          I agree, but in practice so many systems strip out or ignore IPTC metadata. Also take a look at the API as well as our LCDRG standards, and you can see that there are many steps to go through to arrive at a common practice.

           

          "That way, even if a random use of an Archives image on a random website didn't offer any useful origin information, the image itself could still be used to find the parent record.  It would also offer (in theory, again I don't know much about the practicalities) a way for those interested to find and quantify image reuse from the Archives catalogs on sites like Wikipedia, archives.org, etc. by hunting for appropriately formatted metadata in the images."

           

          Again I agree, but it's a problem of scale and application of consistent standard operating procedures. Much of the content is added by a wide variety of staff. I am sure there is an automated way to embed the NAID or Local Identifier in a universal IPTC field, but the NAID is system generated way after the image capture and processing phase, so again the API exists to do what you want.

           

          The application of image metadata is inconsistent. If you download this 2015 image here: https://catalog.archives.gov/id/17331642 and read the IPTC info you can see production metadata. Yet this pertains to our exhibition process and is not archival description.
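On the question of where to look: in a JPEG, EXIF and XMP normally live in APP1 segments, IPTC usually sits inside a Photoshop resource block in APP13, and the plain comment has its own COM segment. The Python sketch below lists which of those containers a file has. It does not decode any fields, the function name and labels are illustrative, and it assumes a simple layout with no fill bytes between segments.

```python
import struct

# (marker byte, payload prefix) -> what that segment normally carries.
SIGNATURES = {
    (0xE0, b"JFIF\x00"): "JFIF header",
    (0xE1, b"Exif\x00\x00"): "EXIF",
    (0xE1, b"http://ns.adobe.com/xap/1.0/\x00"): "XMP",
    (0xED, b"Photoshop 3.0\x00"): "Photoshop resource block (usual home of IPTC)",
}
COM = 0xFE  # comment segment
SOS = 0xDA  # start of scan: compressed image data follows
EOI = 0xD9  # end of image


def list_metadata_segments(jpeg: bytes) -> list[str]:
    """Name the metadata containers found before the image data, in file order."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream")
    found = []
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker in (SOS, EOI):
            break
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        body = jpeg[pos + 4:pos + 2 + length]
        if marker == COM:
            found.append("JPEG comment")
        else:
            for (sig_marker, prefix), label in SIGNATURES.items():
                if marker == sig_marker and body.startswith(prefix):
                    found.append(label)
        pos += 2 + length  # skip marker, length field and payload
    return found
```

Running this over a downloaded access copy shows quickly whether any metadata container survived ingest. An empty list, or only the JFIF header, would match the "quick look comes up empty" experience described above.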

           

          "As a starting point, using the open source tool Exiv2 (Exiv2 - Image metadata library and tools ) I was able to successfully embed and retrieve a JPEG comment in one of the Archive's images.  I don't know if the data should go there or in an EXIF/IPTC/XMP field somewhere, but hopefully someone who knows more than I do about that could make useful suggestions..."

           

          And again I agree. The cool thing about what we are trying to do is give you access to as much as we can. You have to remember that the catalog has been evolving since 1999, and the digital world has been evolving at a quantum pace. What you can do is have some access to our content like never before, and we have an API tool that allows you to get access to our descriptive info. The catalog is not a digital repository, nor a digital asset manager.

           

          "I imagine that if a convention can be established it would be possible to systematically and automatically perform assignment of ID numbers to images using the Catalog database itself to identify information and associated images, assuming the backend software infrastructure allows for that..."

           

          Great idea. Probably a necessary idea. I don't know if the resources exist.

           

          "Is this something the Catalog team has considered?"

           

          Well, I have. I think the API does a great job giving you access (but it does push the work back to you). Maybe in the future the role of the citizen archivist can expand to include performing crowd sourced hacking like you describe.

           

          Anyways great questions, and I am happy to try my best to respond to further issues.

            • Re: Record identifier number as image metadata?
              techhistorynerd Adventurer

              Thanks for a detailed reply!  I think I may need a few "terms of art" defined, as I'm not 100% sure of their meanings...

               

              "access copies" - I've seen this notation occasionally in the catalog.  There would seem to be a connotation of lower quality (i.e. low DPI) but does it have some other meaning in this context?

               

              "archival identifier" - the NAID may not be used for cataloging at the NARA itself, but it is far and away the most succinct way I know of to direct someone to a particular record's online metadata (and image(s) if they exist.)  Is there a preferred/better way?

               

              "archival repository" vs "access portal" vs "digital asset manager" - what specific characteristics differentiate between these three types of systems?  I know many of the digital scans in the catalog aren't sufficiently high resolution to be considered archival, and I can see identifying them as such to users of the system, but there are some (for example, the Citizen Archivist produced scans from the program being run by Archives I in DC) which I understood to be considered of archival quality.  What prevents the catalog from serving as an "archival repository" and "digital asset manager" for records where data of sufficient quality exists?

               

              I agree metadata often/usually gets stripped out - I was thinking more about applications where folks wanted to preserve a quick and easy way to go from image to catalog entry.  It may be that such uses are sufficiently rare that it wouldn't be worthwhile.

              "The cool thing about what we are trying to do is give you access to as much as we can. You have to remember that the catalog has been evolving since 1999, and the digital world has been evolving at a quantum pace. What you can do is have some access to our content like never before, and we have an API tool that allows you to get access to our descriptive info."

              By all means - please don't think I'm inquiring because I'm unhappy with the progress that has been made!  I know relatively little about web development, but I know enough to know that it is very very easy to toss off casual suggestions about features that would take massive work to implement on the backend (particularly for something on the scale of the NARA's holdings.)  Perhaps someday a "crowd sourced hacking" project could be set up that starts with the 2016 ARC data dump at Archival Descriptions from the National Archives Catalog | National Archives, and tries to develop things like "digital asset manager" web infrastructure using it to prove out concepts. 

               

              Thanks for your work on making the Catalog available!

                • Re: Record identifier number as image metadata?
                  Michael Horsley Newbie

                  techhistorynerd wrote:

                   

                  Thanks for a detailed reply!  I think I may need a few "terms of art" defined, as I'm not 100% sure of their meanings...

                   

                  "access copies" - I've seen this notation occasionally in the catalog.  There would seem to be a connotation of lower quality (i.e. low DPI) but does it have some other meaning in this context?

                  It doesn't imply lower quality, just that it is not the same as a master image. These terms are not very well defined. The scan could also be considered a derivative of an analogue original. Take a look at this site: File (or Copy) Type

                   

                   

                  "archival identifier" - the NAID may not be used for cataloging at the NARA itself, but it is far and away the most succinct way I know of to direct someone to a particular record's online metadata (and image(s) if they exist.)  Is there a preferred/better way?

                   

                  I agree. I meant that the NAID is only useful in the context of the catalog. The NAID is the unique system identifier that links the archival description from our cataloging tool, with the image stored on the catalog's servers.

                   

                   

                  "archival repository" vs "access portal" vs "digital asset manager" - what specific characteristics differentiate between these three types of systems?  I know many of the digital scans in the catalog aren't sufficiently high resolution to be considered archival, and I can see identifying them as such to users of the system, but there are some (for example, the Citizen Archivist produced scans from the program being run by Archives I in DC) which I understood to be considered of archival quality.  What prevents the catalog from serving as an "archival repository" and "digital asset manager" for records where data of sufficient quality exists?

                   

                  The terms above are very complex to get into in an on-line discussion. There are many different meanings to the use of "archival". I personally stay away from defining anything "archival quality" unless it was created to a known standard and validated in a system.

                   

                  A repository is the vast enterprise-level storage system used to hold master digital objects; an access portal is something like the catalog, where derivative versions of digital objects are presented on the web; and a digital asset management system is a method of tracking masters and derivatives, but may not be either the repository or the portal. It's very complex.

                  "I agree metadata often/usually gets stripped out - I was thinking more about applications where folks wanted to preserve a quick and easy way to go from image to catalog entry.  It may be that such uses are sufficiently rare that it wouldn't be worthwhile."

                  It is a great idea. And as you can see from Jason's answer, there are complications. I do appreciate your insight, and without input how will we know what features to develop in the future? 5-10 years ago no one could have predicted the uses of embedded metadata. I personally use Lightroom to catalog my images, and then use platforms like Flickr that use that metadata to fill descriptive fields.  The platforms and applications change so fast; however, take a look at the API.

                   

                  "By all means - please don't think I'm inquiring because I'm unhappy with the progress that has been made!  I know relatively little about web development, but I know enough to know that it is very very easy to toss off casual suggestions about features that would take massive work to implement on the backend (particularly for something on the scale of the NARA's holdings.)  Perhaps someday a 'crowd sourced hacking' project could be set up that starts with the 2016 ARC data dump at Archival Descriptions from the National Archives Catalog | National Archives, and tries to develop things like 'digital asset manager' web infrastructure using it to prove out concepts."

                   

                  "Thanks for your work on making the Catalog available!"

                  Thank you!

              • Re: Record identifier number as image metadata?
                Jason Clingerman Wayfarer

                techhistorynerd This is a great idea, and definitely something we're interested in doing. Thanks for the suggestion.

                 

                Unfortunately we're not positioned to embed the National Archives Identifier (NAID) in images at the moment. The challenge currently is that digitized images are generated well before the NAID is produced, so they can't be associated at the moment of image-creation by the individual digitizing. We'd have to develop a way to systematically embed the NAID right before they go up on the Catalog. We will definitely look into this for future development!
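To make the "embed right before upload" step concrete, here is a hedged Python sketch of what such a pre-publish pass might look like. It is not NARA's pipeline. The function names, the filename-to-NAID mapping, and the choice of the XMP `dc:identifier` property holding the catalog URL are all assumptions. The sketch also refuses to touch files that already carry an XMP packet, since merging packets is out of scope here.

```python
import struct

XMP_HEADER = b"http://ns.adobe.com/xap/1.0/\x00"  # identifies an XMP APP1 segment
CATALOG_URL = "https://catalog.archives.gov/id/{naid}"  # pattern seen in this thread

XMP_TEMPLATE = (
    '<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>'
    '<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    '<rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:identifier>{identifier}</dc:identifier>"
    "</rdf:Description></rdf:RDF></x:xmpmeta>"
    '<?xpacket end="w"?>'
)


def embed_naid_as_xmp(jpeg: bytes, naid: str) -> bytes:
    """Return a copy of the JPEG with an XMP packet naming the catalog record."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream")
    if not (naid.isascii() and naid.isdigit()):
        raise ValueError(f"NAID must be all digits, got {naid!r}")
    # Walk the APPn segments that follow SOI; the new segment goes after them.
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF and 0xE0 <= jpeg[pos + 1] <= 0xEF:
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        if jpeg[pos + 1] == 0xE1 and jpeg[pos + 4:pos + 4 + len(XMP_HEADER)] == XMP_HEADER:
            raise ValueError("image already has an XMP packet; merging is not handled")
        pos += 2 + length
    packet = XMP_TEMPLATE.format(identifier=CATALOG_URL.format(naid=naid))
    payload = XMP_HEADER + packet.encode("utf-8")
    segment = b"\xff\xe1" + struct.pack(">H", len(payload) + 2) + payload
    return jpeg[:pos] + segment + jpeg[pos:]


def prepare_batch(images: dict[str, bytes], naid_by_file: dict[str, str]):
    """Tag every image that has a NAID assigned; report the ones that do not."""
    tagged, missing = {}, []
    for name, data in images.items():
        if name in naid_by_file:
            tagged[name] = embed_naid_as_xmp(data, naid_by_file[name])
        else:
            missing.append(name)  # digitized, but no NAID generated yet
    return tagged, missing
```

The `missing` list reflects the timing problem described above: files digitized before a NAID exists simply wait for a later pass instead of blocking the batch.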
