  1. mod-inventory
  2. MODINV-392

Wait for POC of Elastic Search. Inventory Search does not work as expected for diacritics and vernacular titles (aka alternate graphical representations)

Details

    • Prokopovych
    • Cornell, Duke

    Description

      Overview: Searching for title in keyword does not work when vernacular title appears in the paired 880 (245) field.

      Steps to Reproduce:

      1. Log into some FOLIO Snapshot as admin
      2. Ensure that Settings/Inventory/Z39.50 Target Profiles – OCLC WorldCat – Authentication is set to 100473910/PAOLF
      3. Open Inventory
      4. Use "Actions/Import" five times using these OCLC numbers: 85202630, 43564886, 1091030365 1153193551, 1089848331
      5. Search for a title in Russian, Chinese, Japanese, or Hebrew using the actual title in its script (see the example record's 880 $6 245) in Title (all). The record is displayed in the results list.
      6. Searching for the transliterated title (in 245 $a) without ligatures and other diacritics brings back no results in some cases. When I let the system replace the ligature with nothing, by backspacing between the characters, results were as expected. However, when I removed other diacritics by hand, because the system has no replacement character, no results were displayed.
        1. These data are not normalized for indexing, although such normalization is an industry standard. There are no results in either Keyword or Titles (all).
        2. Staff will not usually include diacritics when searching due to keyboard limitations.
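      To illustrate the kind of normalization meant in step 6, here is a minimal Python sketch. It is not the mod-inventory or Elasticsearch implementation; the function name `fold_for_index` is hypothetical, and the choice to drop modifier letters (such as the ALA-LC soft sign ʹ) along with combining marks is an assumption made for this example. The idea is that the same folding is applied to the indexed title and to the search query, so a query typed without diacritics still matches.

```python
import unicodedata


def fold_for_index(text: str) -> str:
    """Hypothetical helper: fold a title or query into a diacritic-free key.

    Assumption for this sketch: combining marks (category Mn, which covers
    the ligature halves U+FE20/U+FE21 and accents such as the breve) and
    modifier letters (category Lm, e.g. U+02B9 used for the soft sign)
    are dropped, and case is folded.
    """
    # Canonical decomposition splits precomposed letters (e.g. "ã")
    # into a base letter followed by combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    kept = [
        ch for ch in decomposed
        if unicodedata.category(ch) not in ("Mn", "Lm")
    ]
    return "".join(kept).casefold()


# Transliterated Russian title fragment with ligature halves and soft sign.
indexed = fold_for_index("Leti\ufe20a\ufe21shchai\ufe20a\ufe21 tufel\u02b9ka")
# What staff would plausibly type on a plain keyboard.
query = fold_for_index("Letiashchaia tufelka")
print(indexed == query)
```

      Note that stripping all combining marks also changes some non-Latin letters (for example Cyrillic й decomposes to и plus a breve), so a real index would likely need script-aware rules.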

      Expected Results: Searching by title without including diacritics should result in finding the record. 
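      Actual Results: For a title in a Latin-based alphabet that includes diacritics, such as Portuguese, searching without the diacritics finds the record, which shows that these are normalized in the index (as expected by industry standard). Transliterated titles with ligatures and other diacritics are not found in some cases (see step 6).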
      Additional Information:
      These searches seem to work as expected when testing using the Inventory ES app. 

      As of 3/25/2021, these examples were in Snapshot Inventory because I successfully used the single record import function: HRID in00000000082, HRID in00000000083, HRID in00000000085, HRID in00000000088, HRID in00000000090. Here are the associated OCLC system numbers used in these tests:

       

      Language Russian (no Cyrillic) https://www.worldcat.org/oclc/85202630 Leti︠a︡shchai︠a︡ tufelʹka, ili, golyĭ nasmeshnik : sibirskie skazki i misticheskie bylichki
      Language Russian (includes Cyrillic) https://www.worldcat.org/oclc/43564886 Народные русские сказки А.Н. Афанасьева : в пяти томах.
      Language Hebrew (includes Hebrew alphabet) https://www.worldcat.org/oclc/1091030365 כך להישאר לעולם = Forever this way
      Language: Japanese translation of Chinese text (includes kanji and hanzi) https://www.worldcat.org/oclc/1153193551 唐人如何吟诗 : 带你走进汉语音韵学
      Language: Portuguese (Latin-based alphabet with diacritics) https://www.worldcat.org/oclc/1089848331 Fernão de Magalhães : um agente secreto ao serviço do rei D. Manuel I de Portugal?

      Interested parties: Everyone

              People

                charlotte Charlotte Whitt
                jacquie.samples@duke.edu Jacquie Samples