data-import-processing-core / MODDICORE-238

Data Import matches on identifier type and identifier value separately, resulting in incorrect matches (Juniper HF#5)


Details

    • 0
    • Folijet Support
    • R2 2021 Hot Fix #5
    • Chalmers, TAMU
    • Implementation coding issue

    Description

      Possible Juniper HF; Kiwi BF

      Updated bug details are above the double line; the original details are below it. See MODDATAIMP-595 for additional test cases.

      Requirements:
      Requirement 1

      • When a MARC Bib-to-Instance match is defined in a match profile
      • And the Instance matchpoint is any of the Identifier options
      • Then ensure that any instance identified as a match meets both the Identifier type and Data requirements
      • Example:
        • Match profile of MARC Bib 910$a exactly matches Identifier: ASIN
        • 910$a value 12345 and Instance value 12345 with Identifier type ASIN: MATCH
        • 910$a value 12345 and Instance value 12345 with Identifier type OCLC: NO MATCH

      Requirement 2

      • When a MARC Bib-to-Instance match is defined in a match profile
      • And the Instance matchpoint is any of the Identifier options
      • Then ensure that standard match logic is followed for single matches, multiple matches, and no matches
        • Single match: Take whatever action(s) are specified in the job profile for a match. If there is no action, STOP
        • No match: Take whatever action(s) are specified in the job profile for no-match. If there is no action, STOP
        • Multiple matches: STOP; discard the record and take no action on the instance
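The three cases in Requirement 2 amount to a small decision function over the number of matched Instances. This is a hypothetical Python sketch, not FOLIO code: the names `Outcome`, `classify` and `plan` are illustrative, and it assumes the matcher hands back a list of matched Instance IDs plus the action lists configured in the job profile.

```python
from enum import Enum
from typing import List


class Outcome(Enum):
    """Possible results of running a match profile (names are illustrative)."""
    MATCH = "single match"
    NO_MATCH = "no match"
    MULTIPLE = "multiple matches"


def classify(matched_ids: List[str]) -> Outcome:
    """Map the number of matched Instances onto the three cases in Requirement 2."""
    if len(matched_ids) == 1:
        return Outcome.MATCH
    if not matched_ids:
        return Outcome.NO_MATCH
    return Outcome.MULTIPLE


def plan(matched_ids: List[str], match_actions: List[str],
         non_match_actions: List[str]) -> List[str]:
    """Return the job-profile actions to run; an empty list means STOP."""
    outcome = classify(matched_ids)
    if outcome is Outcome.MATCH:
        return match_actions  # may be empty: no action configured, so STOP
    if outcome is Outcome.NO_MATCH:
        return non_match_actions  # may be empty: no action configured, so STOP
    # Multiple matches: discard the record and never touch any of the Instances.
    return []


# Shaped like the "ID Match Test - Update1" job profile: update on match, nothing on non-match.
print(plan(["instance-1"], ["Update Instance"], []))                # ['Update Instance']
print(plan([], ["Update Instance"], []))                            # [] -> STOP
print(plan(["instance-1", "instance-2"], ["Update Instance"], []))  # [] -> STOP
```

The important design point is that the multiple-match branch ignores both action lists, so a false duplicate can never cause an overlay.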

      Requirement 3

      • When a MARC Bib-to-Instance match is defined in a match profile
      • And the Instance matchpoint is any of the Identifier options
      • And any of these options are marked in the match profile: 1) Use a qualifier for incoming value, 2) Only compare part of the incoming value, 3) Exactly matches/Contains/Begins with/Ends with, 4) Use a qualifier for existing value, 5) Only compare part of the existing value
      • Then ensure that the appropriate match logic for each marked option is used when determining if there is a match or not
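To make Requirement 3 concrete, here is a minimal Python sketch of how the qualifier, "compare part of value" and match-type options could combine when comparing one incoming value with one existing value. It is an illustration under stated assumptions, not FOLIO's implementation: only a "Begins with" qualifier and the "Numerics only" partial compare are modelled, "Numerics only" is assumed to mean "strip every non-digit", and the match types are assumed to read "existing value contains / begins with / ends with incoming value". The names `Side` and `values_match` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Assumed semantics: "existing value <op> incoming value" (hypothetical keys).
MATCH_TYPES: Dict[str, Callable[[str, str], bool]] = {
    "exactly matches": lambda incoming, existing: existing == incoming,
    "contains": lambda incoming, existing: incoming in existing,
    "begins with": lambda incoming, existing: existing.startswith(incoming),
    "ends with": lambda incoming, existing: existing.endswith(incoming),
}


@dataclass
class Side:
    """Options for one side of the match (incoming or existing)."""
    begins_with: Optional[str] = None  # qualifier, e.g. "(OCoLC)"; None = no qualifier
    numerics_only: bool = False        # "Compare part of value: Numerics only"

    def prepare(self, value: str) -> Optional[str]:
        """Apply the qualifier, then the partial compare; None = not a candidate."""
        if self.begins_with is not None and not value.startswith(self.begins_with):
            return None
        return re.sub(r"\D", "", value) if self.numerics_only else value


def values_match(incoming: str, existing: str, incoming_side: Side,
                 existing_side: Side, match_type: str = "exactly matches") -> bool:
    inc = incoming_side.prepare(incoming)
    ex = existing_side.prepare(existing)
    if not inc or not ex:  # rejected by a qualifier, or nothing left to compare
        return False
    return MATCH_TYPES[match_type](inc, ex)


# Roughly Match profile 3: incoming 035$a begins with (OCoLC), numerics only on both sides.
oclc_incoming = Side(begins_with="(OCoLC)", numerics_only=True)
numerics = Side(numerics_only=True)
print(values_match("(OCoLC)84714376518561876438", "(OCoLC)84714376518561876438",
                   oclc_incoming, numerics))  # True
print(values_match("(AMB)84714376518561876438", "(OCoLC)84714376518561876438",
                   oclc_incoming, numerics))  # False: incoming value fails the qualifier
print(values_match("(OCoLC)84714376518561876438", "(AMB)84714376518561876438",
                   oclc_incoming, numerics))  # True: the numerics agree
```

The last comparison shows why value logic alone is not enough for the Update 3 test: the numerics of the (AMB) System control number agree with the incoming (OCoLC) value, so it is the identifier type check from Requirement 1 that has to rule that Instance out.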

      Basic test (see additional tests on MODDATAIMP-595)

      1. Have Juniper bugfest and snapshot-load environments open (Juniper bugfest for current behavior, snapshot-load for corrected behavior)
      2. Go to Inventory and search for the following identifiers. Make sure that none of them already exists in either environment (so that you will not encounter multiple matches)
        • ORD32671387-7
        • (AMB)84714376518561876438
        • (OCLC)84714376518561876438
        • 84714376518561876438
      3. Import the same file, ID Match Test File - Create.mrc, into both environments, using the Default - Create instance and SRS MARC Bib job profile
      4. Once imported, view the identifiers on the Instances created from the file
        • Title: Competing with Idiots
          • Identifier type: GPO Item number with Value ORD32671387-7
          • Identifier type: OCLC with Value (OCoLC)84714376518561876438
        • Title: Letters from a Stoic
          • Identifier type: Cancelled GPO Item number with Value ORD32671387-7
          • Identifier type: System Control Number with Value (AMB)84714376518561876438
      5. The next 4 matching tests check that the same value with different identifier types is matched properly, and that a numerics-only match on the same numeric values, but with different alphabetic prefixes and different identifier types, is matched properly
      6. In Settings, create the following match profiles:
      7. Match profile 1
        • Name: ID Match Test - Update1 (Valid GPO)
        • Incoming records: MARC Bib
        • Existing records: Instance
        • Incoming record:
          • Field: 074
          • In.1: *
          • In.2: *
          • Subfield: a
          • No qualifier or Compare part of value
        • Exactly matches
        • Existing Instance record
          • Field: Identifier: GPO Item number
          • No qualifier or Compare part of value
      8. Match profile 2
        • Name: ID Match Test - Update2 (Cancelled GPO)
        • Incoming records: MARC Bib
        • Existing records: Instance
        • Incoming record:
          • Field: 074
          • In.1: *
          • In.2: *
          • Subfield: z
          • No qualifier or Compare part of value
        • Exactly matches
        • Existing Instance record
          • Field: Identifier: Cancelled GPO Item number
          • No qualifier or Compare part of value
      9. Match profile 3
        • Name: ID Match Test - Update3 (OCLC)
        • Incoming records: MARC Bib
        • Existing records: Instance
        • Incoming record:
          • Field: 035
          • In.1: *
          • In.2: *
          • Subfield: a
          • Qualifier: Begins with: (OCoLC)
          • Compare part of value: Numerics only
        • Exactly matches
        • Existing Instance record
          • Field: Identifier: OCLC
          • Qualifier: none
          • Compare part of value: Numerics only
      10. Match profile 4
        • Name: ID Match Test - Update4 (System control number)
        • Incoming records: MARC Bib
        • Existing records: Instance
        • Incoming record:
          • Field: 035
          • In.1: *
          • In.2: *
          • Subfield: a
          • Qualifier: Begins with: (AMB)
          • Compare part of value: none
        • Exactly matches
        • Existing Instance record
          • Field: Identifier: System control number
          • Qualifier: none
          • Compare part of value: none
      11. Create 4 Field mapping profiles
      12. Field mapping profile 1
        • Name: ID Match Test - Update1 (Valid GPO)
        • Incoming record type: MARC Bibliographic
        • FOLIO record type: Instance
        • Suppress from discovery: Mark for all affected records
        • Cataloged date: Click the Accepted values dropdown, and select Choose date, then select 1 December 2021 from the calendar (which will fill in "2021-12-01" for the cataloged date)
        • Instance status: Click the Accepted values dropdown, and select Batch loaded
      13. Field mapping profile 2
        • Name: ID Match Test - Update2 (Cancelled GPO)
        • Incoming record type: MARC Bibliographic
        • FOLIO record type: Instance
        • Staff suppress: Mark for all affected records
        • Cataloged date: Click the Accepted values dropdown, and select Choose date, then select 2 December 2021 from the calendar (which will fill in "2021-12-02" for the cataloged date)
        • Instance status: Click the Accepted values dropdown, and select Cataloged
      14. Field mapping profile 3
        • Name: ID Match Test - Update3 (OCLC)
        • Incoming record type: MARC Bibliographic
        • FOLIO record type: Instance
        • Suppress from discovery: Unmark for all affected records
        • Cataloged date: Click the Accepted values dropdown, and select Choose date, then select 3 December 2021 from the calendar (which will fill in "2021-12-03" for the cataloged date)
        • Instance status: Click the Accepted values dropdown, and select Not yet assigned
      15. Field mapping profile 4
        • Name: ID Match Test - Update4 (System control number)
        • Incoming record type: MARC Bibliographic
        • FOLIO record type: Instance
        • Staff suppress: Unmark for all affected records
        • Cataloged date: Click the Accepted values dropdown, and select Choose date, then select 4 December 2021 from the calendar (which will fill in "2021-12-04" for the cataloged date)
        • Instance status: Click the Accepted values dropdown, and select Other
      16. Create 4 Action profiles
      17. Action profile 1
        • Name: ID Match Test - Update1 (Valid GPO)
        • Action: Update
        • FOLIO record type: Instance
        • Link the field mapping profile of the same name
      18. Action profile 2
        • Name: ID Match Test - Update2 (Cancelled GPO)
        • Action: Update
        • FOLIO record type: Instance
        • Link the field mapping profile of the same name
      19. Action profile 3
        • Name: ID Match Test - Update3 (OCLC)
        • Action: Update
        • FOLIO record type: Instance
        • Link the field mapping profile of the same name
      20. Action profile 4
        • Name: ID Match Test - Update4 (System control number)
        • Action: Update
        • FOLIO record type: Instance
        • Link the field mapping profile of the same name
      21. Create 4 Job profiles
      22. Job profile 1
        • Name: ID Match Test - Update1 (Valid GPO)
        • Accepted data type: MARC
        • Click + and add the Match profile of the same name
        • For matches: Click plus and add the Action profile of the same name
        • For non-matches: none
      23. Job profile 2
        • Name: ID Match Test - Update2 (Cancelled GPO)
        • Accepted data type: MARC
        • Click + and add the Match profile of the same name
        • For matches: Click plus and add the Action profile of the same name
        • For non-matches: none
      24. Job profile 3
        • Name: ID Match Test - Update3 (OCLC)
        • Accepted data type: MARC
        • Click + and add the Match profile of the same name
        • For matches: Click plus and add the Action profile of the same name
        • For non-matches: none
      25. Job profile 4
        • Name: ID Match Test - Update4 (System control number)
        • Accepted data type: MARC
        • Click + and add the Match profile of the same name
        • For matches: Click plus and add the Action profile of the same name
        • For non-matches: none
      26. In sequence, import each of the following files into Kiwi-BF and folio-snapshot-load, using the specified job profile
      27. Update 1
        • File name: ID Match Test File - Update1.mrc
        • Job profile name: ID Match Test - Update1 (Valid GPO)
      28. Update 2
        • File name: ID Match Test File - Update2.mrc
        • Job profile name: ID Match Test - Update2 (Cancelled GPO)
      29. Update 3
        • File name: ID Match Test File - Update3.mrc
        • Job profile name: ID Match Test - Update3 (OCLC)
      30. Update 4
        • File name: ID Match Test File - Update4.mrc
        • Job profile name: ID Match Test - Update4 (System control number)
      31. After each import, review the 2 Instances in Inventory in Kiwi BF and folio-snapshot-load
      32. Expected results for each job profile:
      33. Update 1
        • Juniper-BF (before the fix)
          • Title: Competing with Idiots
            • Match should have failed (since 2 Instances have the same identifier, but different identifier types)
            • No changes to the instance
          • Title: Letters from a Stoic
            • Match should have failed (since 2 Instances have the same identifier, but different identifier types)
            • No changes to the instance
        • folio-snapshot (after the fix)
          • Title: Competing with Idiots
            • Match should have succeeded (since this instance has the same value, and ID type of GPO)
            • Check for the following changes in the Instance:
              • Marked as Suppressed from discovery
              • Cataloged date: 2021-12-01
              • Instance status: Batch Loaded
              • In the Notes accordion, there should be a new General note that begins: IDENTIFIER UPDATE 1
          • Title: Letters from a Stoic
            • Match should have failed (since this instance has the same value, but ID type of Cancelled GPO instead of GPO)
            • No changes to the Instance
      34. Update 2
        • Juniper-BF (before the fix)
          • Title: Competing with Idiots
            • Match should have failed (since 2 Instances have the same identifier, but different identifier types)
            • No changes to the instance
          • Title: Letters from a Stoic
            • Match should have failed (since 2 Instances have the same identifier, but different identifier types)
            • No changes to the instance
        • folio-snapshot (after the fix)
          • Title: Competing with Idiots
            • Match should have failed (since this instance has the same value, but ID type of GPO instead of Cancelled GPO)
            • No changes to the Instance
          • Title: Letters from a Stoic
            • Match should have succeeded (since this instance has the same value, and ID type of Cancelled GPO)
            • Check for the following changes in the Instance:
              • Marked as Staff suppress
              • Cataloged date: 2021-12-02
              • Instance status: Cataloged
              • In the Notes accordion, there should be a new General note that begins: IDENTIFIER UPDATE 2
      35. Update 3
        • Kiwi-BF (before the fix)
          • Title: Competing with Idiots
            • Match may have failed due to multiple matches (since this instance has the same value, but ID type of OCLC instead of System control number). The match should ignore any prefix when comparing, but should only attempt to match the 035 of the incoming record (according to the match profile)
            • If the match succeeded
              • Cataloged date: 2021-12-03
              • Instance status: Not yet assigned
              • In the Notes accordion, there should be a new General note that begins: IDENTIFIER UPDATE 3
          • Title: Letters from a Stoic
            • Match may have succeeded (since the match is on numerics only, and the existing record has an 035 with matching numbers, though not matching alphas), or it may have failed due to multiple matches
            • If the match succeeded
              • Cataloged date: 2021-12-03
              • Instance status: Not yet assigned
              • In the Notes accordion, there should be a new General note that begins: IDENTIFIER UPDATE 3
        • folio-snapshot (after the fix)
          • Title: Competing with Idiots
            • Match should have succeeded (since the match is based on the (OCoLC) prefix, compares numerics only, and requires Identifier type OCLC)
            • Check the Instance for the following updates:
              • Cataloged date: 2021-12-03
              • Instance status: Not yet assigned
              • In the Notes accordion, there should be a new General note that begins: IDENTIFIER UPDATE 3
          • Title: Letters from a Stoic
            • Match should not have succeeded, since the match requires Identifier type OCLC, and there's not a number with that type in the Instance
            • No changes to the Instance
      36. Update 4
        • Kiwi-BF (before the fix)
          • Title: Competing with Idiots
            • Match probably failed
            • No changes to the Instance
          • Title: Letters from a Stoic
            • Match probably failed
            • No changes to the Instance
        • folio-snapshot (after the fix)
          • Title: Competing with Idiots
            • Match should fail (since the only System control number in the Instance does not match the alphanumeric value of the incoming record)
            • No changes to the Instance
          • Title: Letters from a Stoic
            • Match should succeed (since the Instance has a System control number that matches an 035 on the incoming record)
            • Check the Instance for the following updates:
              • Cataloged date: 2021-12-04
              • Instance status: Other
              • In the Notes accordion, there should be a new General note that begins: IDENTIFIER UPDATE 4

      ===============================================================

      Overview

      When importing records using Data Import and an import profile that matches incoming records on a specific instance identifier, FOLIO returns both

      • records that have an identifier of the given type with the given value
        and
      • records that have one identifier of the given type, and another identifier with the given value

      This results in too many matches, failed overlays and incorrect overlays.

      Steps to Reproduce

      1. Log into Bugfest Juniper
      2. Create instance A with identifiers
                   "identifiers": [
                        {
                            "identifierTypeId": "be0f28f8-5814-4b68-ace5-f1cae80a8ae0", (Libris ID)
                            "value": "123abc"
                        }
                    ],
        

        https://bugfest-juniper.folio.ebsco.com/inventory/view/c8a07499-8e31-46ca-819f-7f84d6c51403

      3. Create instance B with identifiers
                   "identifiers": [
                        {
                            "identifierTypeId": "be0f28f8-5814-4b68-ace5-f1cae80a8ae0", (Libris ID)
                            "value": "456def"
                        },
                        {
                            "identifierTypeId": "18a2affc-4155-46c8-ac26-db4ae64eef2e", (Sierra Bib ID)
                            "value": "123abc"
                        }
                    ]
        
        

        https://bugfest-juniper.folio.ebsco.com/inventory/view/4b428f92-7c83-47cb-94c0-9c476d30c4a5

      4. Go to Data Import, and import a MARC record with 001 “123abc”, using an import profile that matches incoming 001 on an instance identifier of type “be0f28f8-5814-4b68-ace5-f1cae80a8ae0” (Libris ID)

        https://bugfest-juniper.folio.ebsco.com/settings/data-import/job-profiles/view/ca77ea8c-c836-4b4c-8c38-679926702fc5?query=lisa&sort=name

      Expected Result

      Instance A, which has an identifier of type be0f28f8-5814-4b68-ace5-f1cae80a8ae0 with the value “123abc”, is overlaid.

      Actual Result

      The import is “completed with errors”, and no instance is overlaid.


      The error log (thank you msuranofsky!) shows an error like this one (same error, different example):

      ERROR AbstractLoader   	Found multiple records matching specified conditions. CQL query: [identifiers=""\""identifierTypeId\"":\""28c170c6-3194-4cff-bfb2-ee9525205cf7\"""" AND (identifiers=""\""value\"":\""18124354\"""")]."
      2021-11-19 12:50:06.375,Found records: [ {
      

      Additional Information

      Hypothesis and consequences

      Using a match profile that matches incoming 001 on an instance identifier of type ISBN, when I import a record with 001 "123", FOLIO will consider a record a match if it fulfils the following criteria:

      1. the record has an identifier which is of type ISBN
      2. the record has an identifier which has value "123"

      What's noteworthy is that these two criteria do not have to be fulfilled by the _same_ identifier object in the instance. The consequence is that FOLIO sometimes finds “false duplicates” (e.g. one record with ISBN 123, and another with Invalid ISBN 123) that cause the overlay to fail, and sometimes overlays the wrong record (e.g. a record with ISBN 789 and OCLC Number 123).
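The hypothesis can be checked offline against instances A and B from Steps to Reproduce. The Python sketch below is a model of the observed behaviour, not FOLIO's code: `buggy_match` lets the type and the value be satisfied by different identifier objects (which is what the two independent `identifiers=` clauses in the logged CQL appear to do), while `fixed_match` requires a single identifier object to satisfy both. The identifier type UUIDs are the ones given in the steps; the function names are hypothetical.

```python
from typing import Any, Callable, Dict, List

# Identifier type UUIDs taken from the Steps to Reproduce.
LIBRIS_ID = "be0f28f8-5814-4b68-ace5-f1cae80a8ae0"
SIERRA_BIB_ID = "18a2affc-4155-46c8-ac26-db4ae64eef2e"

Instance = Dict[str, Any]

instances: Dict[str, Instance] = {
    "A": {"identifiers": [{"identifierTypeId": LIBRIS_ID, "value": "123abc"}]},
    "B": {"identifiers": [
        {"identifierTypeId": LIBRIS_ID, "value": "456def"},
        {"identifierTypeId": SIERRA_BIB_ID, "value": "123abc"},
    ]},
}


def buggy_match(inst: Instance, type_id: str, value: str) -> bool:
    """Model of the observed behaviour: type and value may sit on DIFFERENT identifiers."""
    idents = inst.get("identifiers", [])
    has_type = any(i.get("identifierTypeId") == type_id for i in idents)
    has_value = any(i.get("value") == value for i in idents)
    return has_type and has_value


def fixed_match(inst: Instance, type_id: str, value: str) -> bool:
    """Expected behaviour: ONE identifier object must carry both the type and the value."""
    return any(
        i.get("identifierTypeId") == type_id and i.get("value") == value
        for i in inst.get("identifiers", [])
    )


def matching(pred: Callable[[Instance, str, str], bool]) -> List[str]:
    """Instances matched for an incoming 001 "123abc" against the Libris ID type."""
    return sorted(key for key, inst in instances.items() if pred(inst, LIBRIS_ID, "123abc"))


print(matching(buggy_match))  # ['A', 'B'] -> multiple matches, import completes with errors
print(matching(fixed_match))  # ['A']      -> Instance A is overlaid
```

Instance B is the “false duplicate” here: it has a Libris ID (456def) and, on a different identifier, the value 123abc.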

      The query syntax behind the match

      This behaviour can be more easily observed by testing the query syntax given in the error message above.

      Given that a FOLIO record with these identifiers exists in BugFest Juniper (and no other record has identifier "888555"):

      "identifiers": [ { "identifierTypeId": "fcca2643-406a-482a-b760-07a7f8aec640", "value": "888555" }, { "identifierTypeId": "8261054f-be78-422d-bd51-4ed9f33c3422", "value": "785633" } ],
      

      The following query

      {{baseUrl}}/inventory/instances?query=identifiers=""\""identifierTypeId\"":\""8261054f-be78-422d-bd51-4ed9f33c3422\"""" AND (identifiers=""\""value\"":\""888555\"""")
      

      should not return any matches. However, it returns the above instance.

      Compare this with another syntax that returns the expected result

      In contrast, the syntax for searching specific identifiers described in https://wiki.folio.org/pages/viewpage.action?pageId=33948019 returns the expected results.

      {{baseUrl}}/inventory/instances?query=(identifiers= /@value/@identifierTypeId=8261054f-be78-422d-bd51-4ed9f33c3422 (785633))
      

      returns one record

      {{baseUrl}}/inventory/instances?query=(identifiers= /@value/@identifierTypeId=8261054f-be78-422d-bd51-4ed9f33c3422 (888555))
      

      returns zero records
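A client that wants the per-identifier behaviour can build the relation-modifier query shown above programmatically. This Python sketch reproduces the `identifiers= /@value/@identifierTypeId=<type> <value>` form from the examples, with two assumptions of its own: the value is wrapped in double quotes (rather than the parentheses used above) with CQL's quote, backslash and masking characters backslash-escaped, so that values such as (OCoLC)84714376518561876438 stay a single term; and the whole query is percent-encoded for the URL. The helper names are hypothetical, and the quoting has not been tested against Juniper.

```python
from urllib.parse import quote

# Characters escaped inside a quoted CQL term: backslash, double quote, masking characters.
CQL_SPECIAL = '\\"*?^'


def cql_quote(value: str) -> str:
    """Wrap a value in double quotes, backslash-escaping CQL special characters."""
    escaped = "".join("\\" + ch if ch in CQL_SPECIAL else ch for ch in value)
    return f'"{escaped}"'


def identifier_query(type_id: str, value: str) -> str:
    """CQL requiring the type and the value on the SAME identifier (relation modifiers)."""
    return f"(identifiers= /@value/@identifierTypeId={type_id} {cql_quote(value)})"


def instances_url(base_url: str, type_id: str, value: str) -> str:
    """Full /inventory/instances URL with the query percent-encoded."""
    return f"{base_url}/inventory/instances?query={quote(identifier_query(type_id, value), safe='')}"


print(identifier_query("8261054f-be78-422d-bd51-4ed9f33c3422", "785633"))
# (identifiers= /@value/@identifierTypeId=8261054f-be78-422d-bd51-4ed9f33c3422 "785633")
```

Against the record in the hypothesis section, the query for value 888555 with the same type ID should, like the second query above, return no Instance.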

      Interested parties

      Chalmers, and any library using identifiers as a match point in Data Import. Priority: high.


        Attachments

          1. identifier_mismatch_summary.xlsx
            12 kB
          2. image-2021-12-06-13-20-10-603.png
            37 kB
          3. image-2021-12-20-20-05-10-826.png
            17 kB
          4. image-2021-12-20-20-05-20-385.png
            17 kB
          5. Juniper ID Match Test File - Create.mrc
            4 kB
          6. Juniper ID Match Test File - Update1.mrc
            4 kB
          7. Juniper ID Match Test File - Update2.mrc
            4 kB
          8. Juniper ID Match Test File - Update3.mrc
            4 kB
          9. Juniper ID Match Test File - Update4.mrc
            4 kB
          10. screenshot-1.png
            11 kB
          11. screenshot-2.png
            13 kB
          12. screenshot-3.png
            139 kB
          13. screenshot-4.png
            20 kB
          14. screenshot-5.png
            21 kB
          15. screenshot-6.png
            35 kB
          16. screenshot-7.png
            78 kB
          17. screenshot-8.png
            155 kB
          18. screenshot-9.png
            160 kB


              People

                Miami20 Khamidulla Abdulkhakimov
                lisams Lisa Sjögren (EBSCO)