ERM Platform / ERM-1647

SPIKE: Management of title instances/identifiers in the Agreements local KB

Details

    • Type: Story
    • Status: Closed
    • TBD
    • Resolution: Done
    • None
    • Sprint: ERM Sprint 118, ERM Sprint 119, ERM Sprint 120
    • Bienenvolk

    Description

      Overview

      The first time a title is brought into the Agreements local KB (via data import or harvest), this creates a Work, one Electronic (subtype=="Electronic") TitleInstance and, optionally, one Print (subtype=="Print") TitleInstance. Any identifiers included in that initial import are associated with one of the TitleInstances, depending on whether the identifier is understood to identify the Electronic or Print form of the publication.
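A minimal Python sketch of the structure the first import produces, to make the Work/TitleInstance/Identifier relationship concrete. The classes, the `create_from_first_import` function and the `(namespace, value, form)` input shape are hypothetical stand-ins, not the mod-agreements API; PTIs/PCIs are omitted. It also assumes, consistent with the example below, that a Print TitleInstance is only created when the incoming data carries a print identifier.

```python
from dataclasses import dataclass, field


@dataclass
class Identifier:
    namespace: str  # e.g. "issn"
    value: str      # e.g. "1619-4500"


@dataclass
class TitleInstance:
    subtype: str  # "Electronic" or "Print"
    identifiers: list = field(default_factory=list)


@dataclass
class Work:
    title: str
    instances: list = field(default_factory=list)


def create_from_first_import(title, incoming):
    """incoming: list of (namespace, value, form) tuples, form being "electronic" or "print"."""
    work = Work(title)
    electronic = TitleInstance("Electronic")  # always created
    work.instances.append(electronic)
    print_ti = None
    # Assumption: a Print TI is only created when a print identifier is supplied.
    if any(form == "print" for _, _, form in incoming):
        print_ti = TitleInstance("Print")
        work.instances.append(print_ti)
    # Each identifier is attached to the TI for the form it is understood to identify.
    for namespace, value, form in incoming:
        target = print_ti if form == "print" else electronic
        target.identifiers.append(Identifier(namespace, value))
    return work


work = create_from_first_import(
    "4OR : A Quarterly Journal of Operations Research",
    [("issn", "1619-4500", "print")],
)
for ti in work.instances:
    print(ti.subtype, [i.value for i in ti.identifiers])
# Electronic []
# Print ['1619-4500']
```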

      Subsequent imports that match the existing TitleInstance do not change the identifiers associated with either the matched TitleInstance or any of its sibling TitleInstances (i.e. TitleInstances linked to the same Work).

      This means that after the initial import there is no way to:

      • Add additional identifiers to an existing TitleInstance
      • Correct a situation where an identifier is associated with the titleInstance of the wrong subtype (Electronic vs Print)

      Example

      Two full examples, encountered in real-world data, in which incorrect use of identifiers in a data source leads to bad data in the local KB are written up at:

      https://docs.google.com/document/d/1pLH5TYzHHcsjtlb5dn98vHkiJrXZZYIIQGGQAsCjdEs/edit?usp=sharing 

      A brief, simple example of this type of problem:

      The initial package data has only the ISSN for the print TitleInstance:

      • Title: 4OR : A Quarterly Journal of Operations Research
      • (p)ISSN: 1619-4500

      Result is:

      • Work created for "4OR : A Quarterly Journal of Operations Research"
      • titleInstance with subtype == "Electronic" created and linked to Work, plus an associated PTI (PlatformTitleInstance) and PCI (PackageContentItem) to express the title on a specific platform and in a specific package
      • titleInstance with subtype == "Print" created and linked to Work
      • Identifier "1619-4500" created in the "issn" namespace and linked to titleInstance with subtype == "Print"

      The package data is later imported again with the original ISSN plus an additional ISSN for the electronic version:

      • Title: 4OR : A Quarterly Journal of Operations Research
      • (p)ISSN: 1619-4500
      • (e)ISSN: 1614-2411

      Result is:

      • Match found based on "1619-4500", but no titleInstances updated (the e-ISSN "1614-2411" is not added)
      • PCI updated (if necessary)

      The package data is later imported again with only the ISSN for the electronic version:

      • Title: 4OR : A Quarterly Journal of Operations Research
      • (e)ISSN: 1614-2411

      Result is:

      • No match found based on "1614-2411" (it was never stored), so no existing titleInstances are updated
      • Additional Work created for "4OR : A Quarterly Journal of Operations Research"
      • titleInstance with subtype == "Electronic" created and linked to Work + associated PTI and PCI to express the title on a specific platform and in a specific package
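The three imports above can be replayed against a toy in-memory KB to show how the duplicate Work arises. This is a sketch under stated assumptions: matching is reduced to an exact `(namespace, value)` lookup across all TitleInstances, a match leaves every identifier untouched (as described in the Overview), and a miss creates a new Work the same way as a first import. `ToyKB` and its methods are hypothetical; PTI/PCI handling is omitted.

```python
from dataclasses import dataclass, field

TITLE = "4OR : A Quarterly Journal of Operations Research"


@dataclass
class TitleInstance:
    subtype: str                                   # "Electronic" or "Print"
    identifiers: set = field(default_factory=set)  # {(namespace, value)}


@dataclass
class Work:
    title: str
    instances: list = field(default_factory=list)


class ToyKB:
    def __init__(self):
        self.works = []

    def find_match(self, incoming):
        """Return the first Work whose TIs hold any incoming (namespace, value) pair."""
        wanted = {(ns, value) for ns, value, _ in incoming}
        for work in self.works:
            if any(ti.identifiers & wanted for ti in work.instances):
                return work
        return None

    def ingest(self, title, incoming):
        """incoming: list of (namespace, value, form), form being "electronic" or "print"."""
        work = self.find_match(incoming)
        if work is not None:
            return work  # matched: identifiers are neither added nor corrected
        work = Work(title)
        electronic = TitleInstance("Electronic")
        work.instances.append(electronic)
        print_ti = None
        if any(form == "print" for _, _, form in incoming):
            print_ti = TitleInstance("Print")
            work.instances.append(print_ti)
        for ns, value, form in incoming:
            (print_ti if form == "print" else electronic).identifiers.add((ns, value))
        self.works.append(work)
        return work


kb = ToyKB()
kb.ingest(TITLE, [("issn", "1619-4500", "print")])
kb.ingest(TITLE, [("issn", "1619-4500", "print"), ("issn", "1614-2411", "electronic")])
kb.ingest(TITLE, [("issn", "1614-2411", "electronic")])
print(len(kb.works))  # 2: one journal, two Works
```

The second import matches on the p-ISSN and returns early, so the e-ISSN is silently dropped; that is exactly why the third import finds nothing to match.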

      Related Challenges:

      Not all sources of title data can be trusted with respect to the identifiers they assert for a resource.

      Sometimes an identifier may be asserted as identifying the print version when it actually identifies the electronic version (or vice versa). This leads to the information being incorrectly represented in the local KB, although for data import this will not normally matter: as long as the identifier is present in the correct namespace, it will still be matched on import and the correct work/titleInstance will be used by the incoming data.
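A sketch of why a misassigned identifier usually does not break matching. It assumes that matching looks up `(namespace, value)` on every sibling TitleInstance regardless of subtype, then resolves to the Work's Electronic TitleInstance (the one carrying the PTI/PCI in the example above). `resolve_electronic` and the classes are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TitleInstance:
    subtype: str
    identifiers: set = field(default_factory=set)  # {(namespace, value)}


@dataclass
class Work:
    title: str
    instances: list = field(default_factory=list)


def resolve_electronic(works, namespace, value):
    """Find the Work holding (namespace, value) on ANY sibling TI; return its Electronic TI."""
    for work in works:
        if any((namespace, value) in ti.identifiers for ti in work.instances):
            # Assumes every Work has exactly one Electronic TI.
            return next(ti for ti in work.instances if ti.subtype == "Electronic")
    return None


# The e-ISSN was wrongly asserted as print, so it sits on the Print TI.
electronic = TitleInstance("Electronic")
print_ti = TitleInstance("Print", {("issn", "1614-2411")})
works = [Work("4OR : A Quarterly Journal of Operations Research", [electronic, print_ti])]

# Incoming data that labels 1614-2411 correctly still lands on the right TI.
found = resolve_electronic(works, "issn", "1614-2411")
print(found is electronic)  # True
```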

      If the same title is imported from external data sources using non-overlapping sets of identifiers, a single resource can end up represented by multiple Works in the local KB.

      Once a single title instance has been created where multiple title instances should have been created, "splitting" this title instance is challenging, as it is not possible to know which PTIs/PCIs should be split out with which TI/Work.

      Possible approaches:

      • Implement a mechanism to import title lists with identifiers from authoritative sources (via file upload at minimum, and potentially via harvest or API) to pre-empt issues when importing data from less reliable or non-authoritative sources. Title imports should:
        • Support creation of works and title instances as necessary
        • Support addition of title instances to existing works as necessary
        • Support addition of identifiers to existing title instances 
        • Support correction of print vs electronic identifiers if they are already stored in the KB
        • NOT support changes to, or removal of, existing identifiers for title instances

      The underlying issue here seems to be the immense difficulty of identifying an authoritative source: no matter how good the source, we can't assume it will be error-free.
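One possible reading of the title import rules listed above, sketched in Python. It assumes an authoritative record is a list of `(namespace, value, form)` identifiers and that matching works as in the current import; `apply_authoritative_record` and the classes are hypothetical. The sketch never deletes an identifier from a Work or changes its value: it only adds Works, TIs and identifiers, or moves an identifier to the sibling whose subtype the authoritative source asserts.

```python
from dataclasses import dataclass, field

SUBTYPE = {"electronic": "Electronic", "print": "Print"}
TITLE = "4OR : A Quarterly Journal of Operations Research"


@dataclass
class TitleInstance:
    subtype: str
    identifiers: set = field(default_factory=set)  # {(namespace, value)}


@dataclass
class Work:
    title: str
    instances: list = field(default_factory=list)

    def instance(self, subtype):
        """Return the sibling TI with this subtype, creating it if necessary."""
        for ti in self.instances:
            if ti.subtype == subtype:
                return ti
        ti = TitleInstance(subtype)
        self.instances.append(ti)
        return ti


def apply_authoritative_record(works, title, ids):
    """ids: list of (namespace, value, form). Adds or corrects identifiers; never removes one."""
    wanted = {(ns, v) for ns, v, _ in ids}
    work = next(
        (w for w in works if any(ti.identifiers & wanted for ti in w.instances)), None
    )
    if work is None:
        work = Work(title)          # create the Work if nothing matches
        work.instance("Electronic")  # every Work gets an Electronic TI
        works.append(work)
    for ns, value, form in ids:
        target = work.instance(SUBTYPE[form])  # add a TI to the Work if needed
        for sibling in work.instances:
            if sibling is not target:
                # Print vs electronic correction: move it off the wrong sibling.
                sibling.identifiers.discard((ns, value))
        target.identifiers.add((ns, value))
    return work


electronic = TitleInstance("Electronic")
print_ti = TitleInstance("Print", {("issn", "1619-4500")})
works = [Work(TITLE, [electronic, print_ti])]
apply_authoritative_record(
    works, TITLE, [("issn", "1619-4500", "print"), ("issn", "1614-2411", "electronic")]
)
print(electronic.identifiers)  # {('issn', '1614-2411')}
```

With the e-ISSN stored up front, the later e-ISSN-only import from the example would match the existing Work instead of creating a second one.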

      • Implement a mechanism to switch existing identifiers between existing sibling title instances (title instances for the same Work) from the titleInstance page in the UI (i.e. fix situations where electronic and print identifiers have been reversed on initial titleInstance creation)

      This could be effective where all the necessary works already exist in the local KB?
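A minimal sketch of the operation such a UI action might call, assuming identifiers are `(namespace, value)` pairs stored per TitleInstance. `move_identifier` and the classes are hypothetical; the point is the guard that both TIs must be siblings on the same Work. Membership is checked by identity because dataclass equality would treat two TIs with identical fields as the same.

```python
from dataclasses import dataclass, field

TITLE = "4OR : A Quarterly Journal of Operations Research"


@dataclass
class TitleInstance:
    subtype: str
    identifiers: set = field(default_factory=set)  # {(namespace, value)}


@dataclass
class Work:
    title: str
    instances: list = field(default_factory=list)


def is_member(work, ti):
    """True if this exact TI object is linked to the Work."""
    return any(t is ti for t in work.instances)


def move_identifier(work, source, target, identifier):
    """Move one identifier from source TI to a sibling target TI of the same Work."""
    if source is target:
        raise ValueError("source and target are the same TitleInstance")
    if not (is_member(work, source) and is_member(work, target)):
        raise ValueError("TitleInstances must be siblings (linked to the same Work)")
    if identifier not in source.identifiers:
        raise ValueError(f"{identifier} is not on the source TitleInstance")
    source.identifiers.remove(identifier)
    target.identifiers.add(identifier)


# Reversed on initial creation: p-ISSN on the Electronic TI, e-ISSN on the Print TI.
electronic = TitleInstance("Electronic", {("issn", "1619-4500")})
print_ti = TitleInstance("Print", {("issn", "1614-2411")})
work = Work(TITLE, [electronic, print_ti])

move_identifier(work, electronic, print_ti, ("issn", "1619-4500"))
move_identifier(work, print_ti, electronic, ("issn", "1614-2411"))
print(electronic.identifiers, print_ti.identifiers)
# {('issn', '1614-2411')} {('issn', '1619-4500')}
```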

      • Make it possible to add an identifier to an existing title instance via an Action on the titleInstance page in the UI, where the identifier does not already exist in the KB

      Seems a relatively straightforward and plausible option? However, it requires the user to take pre-emptive action to prevent future issues, which isn't ideal.
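A sketch of the "does not already exist in KB" guard for such an Action, under the same simplified model (identifiers as `(namespace, value)` pairs). `add_identifier` and the classes are hypothetical. Refusing duplicates matters because an identifier held by two TIs on different Works would make matching ambiguous.

```python
from dataclasses import dataclass, field

TITLE = "4OR : A Quarterly Journal of Operations Research"


@dataclass
class TitleInstance:
    subtype: str
    identifiers: set = field(default_factory=set)  # {(namespace, value)}


@dataclass
class Work:
    title: str
    instances: list = field(default_factory=list)


def add_identifier(works, target, identifier):
    """Attach (namespace, value) to target TI, refusing if any TI in the KB already holds it."""
    for work in works:
        for ti in work.instances:
            if identifier in ti.identifiers:
                raise ValueError(f"{identifier} already exists in the KB")
    target.identifiers.add(identifier)


# State after the first import in the example: no e-ISSN stored yet.
electronic = TitleInstance("Electronic")
print_ti = TitleInstance("Print", {("issn", "1619-4500")})
works = [Work(TITLE, [electronic, print_ti])]

add_identifier(works, electronic, ("issn", "1614-2411"))
print(electronic.identifiers)  # {('issn', '1614-2411')}
```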


      People

        Assignee: Owen Stephens (ostephens)
        Reporter: Owen Stephens (ostephens)
        Votes: 0
        Watchers: 3
