  1. RAML Module Builder
  2. RMB-432

fulltext: word splitting and punctuation removal

Details

    • CP: sprint 69
    • 3
    • Core: Platform

    Description

      This CQL query

      title=Dell’Emulazione e dell'Influenza
      

      produces this SQL query:

      WHERE to_tsvector('simple', f_unaccent(instance.jsonb->>'title')) @@ to_tsquery('simple', f_unaccent('Dell’Emulazione<->e<->dell''Influenza'))
      

      Note that punctuation such as the apostrophe (U+0027) and the right single quotation mark (U+2019) is not removed from the to_tsquery argument. The query cannot match because PostgreSQL's to_tsvector treats these characters as word separators and drops them:

      select s, to_tsvector('simple', s) from (values('Dell’Emulazione e dell''Influenza')) as t(s);
                      s                 |                  to_tsvector
      ----------------------------------+-----------------------------------------------
       Dell’Emulazione e dell'Influenza | 'dell':1,4 'e':3 'emulazione':2 'influenza':5
      

      How to fix:
      Change the word splitting and ftTerm so that they work in a similar way to to_tsvector:
      https://github.com/folio-org/raml-module-builder/blob/v26.2.0/cql2pgjson/src/main/java/org/folio/cql2pgjson/CQL2PgJSON.java#L671
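
      A minimal sketch of the proposed direction, not the actual CQL2PgJSON code: the class FulltextTermSplitter and its methods are hypothetical names used only for illustration. It assumes that splitting on every run of characters that are not letters, combining marks, or digits is a close enough approximation of how to_tsvector tokenizes this title. The real PostgreSQL parser has further rules (for example for numbers, e-mail addresses and URLs) that this sketch does not reproduce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of raml-module-builder.
class FulltextTermSplitter {
  // A separator is any run of characters that are not letters, combining marks or digits.
  // This covers both the apostrophe (U+0027) and the right single quotation mark (U+2019).
  private static final Pattern SEPARATORS = Pattern.compile("[^\\p{L}\\p{M}\\p{N}]+");

  /** Splits a search term into words, dropping punctuation and empty tokens. */
  static List<String> splitWords(String term) {
    List<String> words = new ArrayList<>();
    for (String word : SEPARATORS.split(term)) {
      if (!word.isEmpty()) {  // split() yields a leading empty string if the term starts with a separator
        words.add(word);
      }
    }
    return words;
  }

  /** Joins the words with the to_tsquery "followed by" operator; returns "" if no word remains. */
  static String toPhraseQuery(String term) {
    return String.join("<->", splitWords(term));
  }

  public static void main(String[] args) {
    // prints Dell<->Emulazione<->e<->dell<->Influenza
    System.out.println(toPhraseQuery("Dell\u2019Emulazione e dell'Influenza"));
  }
}
```

      With this splitting, the to_tsquery argument contains the same five words that to_tsvector produces for the title, so the phrase query can match. The result contains no quote characters, but it still has to be passed to SQL as a properly escaped or bound string.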

              People

                julianladisch Julian Ladisch