  1. ERM Platform
  2. ERM-1798

SPIKE: Implement "title first" matching on loading data into the local KB

    Details

    • Type: Story
    • Status: Closed
    • Priority: TBD
    • Resolution: Done
    • Component/s: None
    • Labels:
    • Template:
    • Sprint: ERM Sprint 120, ERM Sprint 123, ERM Sprint 124, ERM Sprint 125
    • Development Team: ERM
    • Release: Not Scheduled

      Description

      Currently, title identifiers are treated as the authority on whether a title coming into the local KB matches an existing title instance in the KB. If the incoming title shares an identifier (from a nominated list of primary identifiers) with an existing title instance, the incoming title is treated as matching that title instance. However, this approach fails when a data source (such as GOKb, EZB, or a publisher title list) uses the same identifier (often an ISSN) for two different titles (typically these titles are related, but not the same).

      This spike is to test the approach of "title first" matching, where the title string is used as the primary match point.

      This change should initially be done in a separate branch for testing on a separate FOLIO installation (see ERM-1797).

      The title string comparison rules should be as follows:

      • title string comparison should ignore case and leading/trailing whitespace, and should treat any run of repeated whitespace characters as a single whitespace character (e.g. " The     journal of LIBRARIES " would match "the journal of libraries" or "The Journal of Libraries" etc.)
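
      The comparison rule above could be sketched as follows. This is only an illustration in Python, not the implementation used in the KB; the helper names normalize_title and titles_match are hypothetical, and using casefold() for "ignore case" is an assumption (it is slightly more aggressive than simple lowercasing for some non-ASCII characters).

```python
import re


def normalize_title(title: str) -> str:
    """Normalise a title string for comparison (hypothetical helper)."""
    # Collapse every run of whitespace into a single space, then trim
    # leading/trailing whitespace.
    collapsed = re.sub(r"\s+", " ", title).strip()
    # Assumption: casefold() is used as the case-insensitive form.
    return collapsed.casefold()


def titles_match(a: str, b: str) -> bool:
    """True when two title strings are equal under the rules above."""
    return normalize_title(a) == normalize_title(b)
```

      Under these assumptions, titles_match(" The     journal of LIBRARIES ", "The Journal of Libraries") evaluates to True.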

      Title Instance matching should work as follows:

      • 1. compare incoming title string against the title strings of existing title instances with subtype == electronic to see if a match can be found
        • 1a. if there is a unique match, match to that title instance
        • 1b. if multiple matches are found, do a secondary match on identifiers in class_one_namespaces (zdb, isbn, issn, eissn, doi); if this yields a unique match to an existing title instance, match to that title instance
        • 1c. if multiple matches remain even after the use of identifiers, don't match the title and return an error "unable to uniquely match title {title string} with identifiers {list of identifiers used in match}"
        • 1d. if a single match has been made, any identifiers in the incoming data should be added to the existing title instances (print identifiers -> print, electronic identifiers -> electronic. Assume identifiers are for the electronic instance if no other information is available) UNLESS those identifiers are already assigned to an existing title instance, in which case generate a warning (for the Info log) "Identifier {identifier value} not assigned to {matched title instance} as it is already assigned to title {existing title instance}"
      • 2. if the incoming title string does not match any existing title instance title string then:
        • 2a. create a new title instance with subtype == electronic and assign any identifiers for the electronic version UNLESS those identifiers are already assigned to an existing title instance, in which case generate a warning (for the Info log) "Identifier {identifier value} not assigned to {new title instance} as it is already assigned to title {existing title instance}"
        • 2b. if there are identifiers for the print version available, create a sibling title instance with subtype == print and assign any identifiers for the print version UNLESS those identifiers are already assigned to an existing title instance, in which case generate a warning (for the Info log) "Identifier {identifier value} not assigned to {new title instance} as it is already assigned to title {existing title instance}"
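
      The matching steps above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the code in the KB: TitleInstance, TitleMatchError, match_title, assign_identifiers and load_title are hypothetical names, identifiers are modelled as (namespace, value) pairs, and the KB is a plain list. Two gaps in the steps are filled by assumption: a secondary identifier match that leaves zero candidates is treated like 1c (still no unique match), and adding print identifiers to the print sibling of an already-matched title (part of 1d) is left out because the sibling relationship is not described here.

```python
import re
from dataclasses import dataclass, field

CLASS_ONE_NAMESPACES = {"zdb", "isbn", "issn", "eissn", "doi"}


@dataclass
class TitleInstance:  # hypothetical stand-in for a KB title instance
    title: str
    subtype: str = "electronic"
    identifiers: set = field(default_factory=set)  # {(namespace, value)}


class TitleMatchError(Exception):
    """Raised when no unique match can be made (step 1c)."""


def normalize_title(title):
    return re.sub(r"\s+", " ", title).strip().casefold()


def match_title(incoming_title, incoming_ids, kb):
    """Return the matched electronic instance, or None if no title matches."""
    key = normalize_title(incoming_title)
    # Step 1: compare title strings against electronic instances only.
    candidates = [t for t in kb
                  if t.subtype == "electronic" and normalize_title(t.title) == key]
    if not candidates:
        return None            # step 2: caller creates a new instance
    if len(candidates) == 1:
        return candidates[0]   # step 1a: unique title match
    # Step 1b: secondary match on class-one identifiers.
    class_one = {i for i in incoming_ids if i[0] in CLASS_ONE_NAMESPACES}
    narrowed = [t for t in candidates if t.identifiers & class_one]
    if len(narrowed) == 1:
        return narrowed[0]
    # Step 1c (assumption: zero remaining candidates is also an error).
    raise TitleMatchError(
        f"unable to uniquely match title {incoming_title} "
        f"with identifiers {sorted(class_one)}")


def assign_identifiers(target, ids, kb, log):
    """Add ids to target unless another instance already holds them."""
    for ident in sorted(ids):
        owner = next((t for t in kb
                      if t is not target and ident in t.identifiers), None)
        if owner is None:
            target.identifiers.add(ident)
        else:
            log.append(f"Identifier {ident[1]} not assigned to {target.title} "
                       f"as it is already assigned to title {owner.title}")


def load_title(incoming_title, electronic_ids, print_ids, kb, log):
    matched = match_title(incoming_title, electronic_ids | print_ids, kb)
    if matched is not None:
        # Step 1d, electronic identifiers only (see assumptions above).
        assign_identifiers(matched, electronic_ids, kb, log)
        return matched
    # Step 2a: new electronic instance.
    created = TitleInstance(incoming_title, "electronic")
    kb.append(created)
    assign_identifiers(created, electronic_ids, kb, log)
    # Step 2b: sibling print instance when print identifiers are supplied.
    if print_ids:
        sibling = TitleInstance(incoming_title, "print")
        kb.append(sibling)
        assign_identifiers(sibling, print_ids, kb, log)
    return created
```

      In this sketch the warnings are collected in a plain list standing in for the Info log, and the error in 1c is raised as an exception so the caller can decide how to report it.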

    People

    • Assignee: Owen Stephens (ostephens)
    • Reporter: Owen Stephens (ostephens)