Details
- Story
- Status: Closed
- TBD
- Resolution: Done
- None
- ERM Sprint 120, ERM Sprint 123, ERM Sprint 124, ERM Sprint 125
- Bienenvolk
- Not Scheduled
Description
Currently, title identifiers are treated as the authority on whether a title incoming to the local KB matches an existing title instance in the KB. If the incoming title shares an identifier (from a nominated list of primary identifiers) with an existing title instance, the incoming title is treated as matching that title instance. However, this approach fails when a data source (such as GOKb, EZB, or a publisher title list) uses the same identifier (often an ISSN) for two different titles (typically these titles are related, but not the same).
This spike tests a "title first" matching approach, in which the title string is used as the primary match point.
This change should initially be made in a separate branch for testing on a separate Folio installation (see ERM-1797).
The title string comparison rules should be as follows:
- Title string comparison should ignore case and leading/trailing whitespace, and should treat any run of repeated whitespace characters as a single whitespace character (e.g. " The journal of LIBRARIES " would match "the journal of libraries" or "The Journal of Libraries").
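A minimal sketch of the comparison rule above, in Python purely for illustration (not the module's actual implementation). The function name `normalize_title` is hypothetical, and using `casefold()` for "ignore case" is an assumption:

```python
import re


def normalize_title(title: str) -> str:
    """Return a comparison key for a title string (hypothetical helper).

    Collapses every run of whitespace to a single space, trims
    leading/trailing whitespace, and folds case.
    """
    collapsed = re.sub(r"\s+", " ", title)
    return collapsed.strip().casefold()


# Two titles match when their normalized keys are equal.
assert normalize_title("  The journal   of LIBRARIES ") == normalize_title("The Journal of Libraries")
```

Comparing normalized keys rather than raw strings keeps the rule in one place, so the same key could be stored or indexed if lookups need to be fast.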
Title Instance matching should work as follows:
- 1. Compare the incoming title string against the title strings of existing title instances with subtype == electronic to see whether a match can be found.
- 1a. If there is a unique match, match to that title instance.
- 1b. If multiple matches are found, do a secondary match on identifiers in class_one_namespaces (zdb, isbn, issn, eissn, doi); if this yields a unique match to an existing title instance, match to that title instance.
- 1c. If multiple matches remain even after the use of identifiers, don't match the title and return an error: "unable to uniquely match title {title string} with identifiers {list of identifiers used in match}"
- 1d. If a single match has been made, add any identifiers in the incoming data to the existing title instances (print identifiers -> print, electronic identifiers -> electronic; assume identifiers are for the electronic instance if no other information is available) UNLESS those identifiers are already assigned to an existing title instance, in which case generate a warning (for the Info log): "Identifier {identifier value} not assigned to {matched title instance} as it is already assigned to title {existing title instance}"
- 2. If the incoming title string does not match any existing title instance title string, then:
- 2a. Create a new title instance with subtype == electronic and assign any identifiers for the electronic version UNLESS those identifiers are already assigned to an existing title instance, in which case generate a warning (for the Info log): "Identifier {identifier value} not assigned to {new title instance} as it is already assigned to title {existing title instance}"
- 2b. If identifiers for the print version are available, create a sibling title instance with subtype == print and assign any identifiers for the print version UNLESS those identifiers are already assigned to an existing title instance, in which case generate a warning (for the Info log): "Identifier {identifier value} not assigned to {new title instance} as it is already assigned to title {existing title instance}"
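The matching steps above could be sketched roughly as follows. This is an illustrative Python model only, not the module's code: `TitleInstance`, `match_title`, and `assign_identifiers` are hypothetical names, the KB is modelled as an in-memory list, and identifiers are assumed to be `(namespace, value)` pairs. It also assumes that a secondary identifier match that narrows to zero candidates is treated the same as an ambiguous match (step 1c), which the description does not state explicitly. Routing print identifiers to a sibling print instance (steps 1d and 2b) is left out for brevity.

```python
import re
from dataclasses import dataclass, field

CLASS_ONE_NAMESPACES = {"zdb", "isbn", "issn", "eissn", "doi"}


@dataclass
class TitleInstance:
    title: str
    subtype: str = "electronic"
    identifiers: set = field(default_factory=set)  # {(namespace, value)}


def _norm(title):
    # Ignore case, outer whitespace, and repeated whitespace.
    return re.sub(r"\s+", " ", title).strip().casefold()


def match_title(incoming_title, incoming_ids, kb):
    """Return (instance, "matched") or (None, "create"); raise on ambiguity."""
    wanted = _norm(incoming_title)
    # Step 1: title-first match against electronic title instances.
    candidates = [t for t in kb
                  if t.subtype == "electronic" and _norm(t.title) == wanted]
    if not candidates:
        return None, "create"              # step 2: caller creates new instance(s)
    if len(candidates) == 1:
        return candidates[0], "matched"    # step 1a
    # Step 1b: secondary match on class-one identifiers.
    class_one = {i for i in incoming_ids if i[0] in CLASS_ONE_NAMESPACES}
    narrowed = [t for t in candidates if t.identifiers & class_one]
    if len(narrowed) == 1:
        return narrowed[0], "matched"
    # Step 1c: still not unique (assumption: zero hits counts as not unique).
    used = sorted(v for _, v in class_one)
    raise ValueError(
        f"unable to uniquely match title {incoming_title} with identifiers {used}")


def assign_identifiers(target, incoming_ids, kb):
    """Steps 1d/2a: add identifiers unless another instance owns them."""
    warnings = []
    for ident in incoming_ids:
        owner = next((t for t in kb
                      if t is not target and ident in t.identifiers), None)
        if owner is None:
            target.identifiers.add(ident)
        else:
            warnings.append(
                f"Identifier {ident[1]} not assigned to {target.title} "
                f"as it is already assigned to title {owner.title}")
    return warnings
```

In this sketch the caller would act on the `"create"` outcome by building the electronic (and, if print identifiers exist, print) instances and then calling `assign_identifiers` on each, writing the returned warnings to the Info log.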
Issue Links
- defines: UXPROD-3339 Experiment with different title match process for Agreements KB (Closed)
- relates to: ERM-1797 Setup testing environment for Local KB data loading changes (Closed)
- relates to: ERM-1647 SPIKE: Management of title instances/identifiers in the Agreements local KB (Closed)
- relates to: ERM-1751 "Acta Physica Hungarica" (print ISSN: 0231-4428) and "Hungarica Acta Physica" (print ISSN: 0367-6382) match to a single work (Draft)
1. Integration Test tweaks | Closed | Unassigned