  1. RAML Module Builder
  2. RMB-499

Add "normalizeDigits" function

Details

    • CP: sprint 77, CP: sprint 78, CP: sprint 79, CP: sprint 82, CP: sprint 83
    • 8
    • Core: Platform

    Description

      The idea here is to provide a new option in schema.json (e.g. "normalizeDigits": "true") that normalises fields like ISBN or ISSN at index time (to_tsvector) and query time (to_tsquery) in SQL when using the fulltext index. The normalisation should make it possible to search for a version of the term with the hyphens omitted. E.g. given 11-222-333, the user should also be able to search for 11222333 in addition to 11-222-333 and 11 222 333.

      Right now we use the default parsing in to_tsvector/to_tsquery, where a term like 11-222-333 is split into the tokens 11, -222, -333. Note: this behavior only occurs when the components of a hyphenated term start with a digit. When they start with a letter, e.g. a11-a222-a333, to_tsvector generates the following tokens instead: a11, a222, a333, a11-a222-a333.

      What we would really like is to have the following tokens generated and indexed given 11-22-333: 11, 22, 333, 11-22-333, 1122333.
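
      The desired token set can be sketched as follows. This is only an illustration of the target output, not of how PostgreSQL's text search parser works; the function name desired_tokens is hypothetical, and treating both hyphens and whitespace as separators is an assumption.

```python
import re
from typing import List


def desired_tokens(term: str) -> List[str]:
    """Return the tokens we would like indexed for a digit-group term.

    Hypothetical helper: the result holds the individual parts, the
    original term, and the parts concatenated without separators.
    """
    # Split on runs of hyphens or whitespace (assumed separators).
    parts = [p for p in re.split(r"[-\s]+", term) if p]
    if len(parts) < 2:
        # Nothing to join: a single part is its own only token.
        return parts
    return parts + [term, "".join(parts)]
```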

      This behavior could be useful for ISBN and ISSN but also e.g. for UUID, hence it is potentially a more general normalization feature.

      Implementation idea

      If schema.json specifies normalizeDigits=true, the input string (both the query passed to to_tsquery and the JSON field value passed to to_tsvector) has all occurrences of

      (([0-9])+(-|\s)([0-9])+)+

      replaced with a string that consists of all the matched groups concatenated followed by a whitespace.

      Examples:

      "11-222-333" -> "11222333 "
      "11 222 333" -> "11222333 "
      "11-222 333" -> "11222333 "
      "11-222-333 (pbk)" -> "11222333 (pbk)"
      "11-222-333(pbk)" -> "11222333 (pbk)"
      "11-222-333pbk" -> "11222333 pbk"
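
      A minimal sketch of this replacement, written in Python for illustration only (the actual implementation would live in RMB and/or SQL). The function name normalize_digits is hypothetical. The pattern used here, [0-9]+(?:[-\s][0-9]+)+\s*, is an assumption: it is a restructured variant that matches two or more digit groups joined by a single hyphen or whitespace character, and it also absorbs any whitespace directly after the match so that the result carries exactly one trailing space, as in the examples above.

```python
import re

# Assumed pattern: two or more runs of ASCII digits joined by a single
# hyphen or whitespace character, plus any whitespace that follows.
_DIGIT_GROUPS = re.compile(r"[0-9]+(?:[-\s][0-9]+)+\s*")


def normalize_digits(text: str) -> str:
    """Join hyphen- or space-separated digit groups into one number."""

    def join_groups(match):
        # Keep only the digits of the match, then append one space so the
        # joined number stays separated from whatever comes next.
        digits = re.sub(r"[^0-9]", "", match.group(0))
        return digits + " "

    return _DIGIT_GROUPS.sub(join_groups, text)
```

      Applying the same function to the indexed field value and to the query string is what makes the variants match: under this sketch, normalize_digits("11 222 333") and normalize_digits("11-222-333") yield the same string.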


              People

                julianladisch Julian Ladisch
                jakub Jakub Skoczen