RAML Module Builder / RMB-432

fulltext: word splitting and punctuation removal

    Details

    • Sprint:
      CP: sprint 69
    • Story Points:
      3
    • Development Team:
      Core: Platform

      Description

      This CQL query

      title=Dell’Emulazione e dell'Influenza
      

      produces this SQL query:

      WHERE to_tsvector('simple', f_unaccent(instance.jsonb->>'title')) @@ to_tsquery('simple', f_unaccent('Dell’Emulazione<->e<->dell''Influenza'))
      

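      Note that punctuation such as the apostrophe (U+0027) and the right single quotation mark (U+2019) is not removed from the query term. The query therefore cannot match, because Postgres's to_tsvector treats these characters as word separators and removes them: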

      select s, to_tsvector('simple', s) from (values('Dell’Emulazione e dell''Influenza')) as t(s);
                      s                 |                  to_tsvector
      ----------------------------------+-----------------------------------------------
       Dell’Emulazione e dell'Influenza | 'dell':1,4 'e':3 'emulazione':2 'influenza':5
      

      How to fix:
      Change the word splitting and ftTerm so that they split words and remove punctuation the same way to_tsvector does:
      https://github.com/folio-org/raml-module-builder/blob/v26.2.0/cql2pgjson/src/main/java/org/folio/cql2pgjson/CQL2PgJSON.java#L671
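
      One possible shape of that change is sketched below. This is a minimal illustration, not the actual CQL2PgJSON code: the class name FulltextTermSketch and the methods splitWords and toPhraseQuery are hypothetical. It assumes that treating every character that is not a letter, combining mark, or digit as a word separator is close enough to what to_tsvector does. The default Postgres parser has additional rules (for example for URLs, email addresses, and hyphenated words) that this sketch does not reproduce.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the names are illustrative and not part of RMB.
class FulltextTermSketch {

  /**
   * Split a search term into words. Every run of characters that are not
   * Unicode letters (\p{L}), combining marks (\p{M}) or numbers (\p{N})
   * is treated as a separator. This is an assumption that only
   * approximates the Postgres default text search parser.
   */
  static List<String> splitWords(String term) {
    List<String> words = new ArrayList<>();
    for (String word : term.split("[^\\p{L}\\p{M}\\p{N}]+")) {
      // split() yields an empty first element if the term starts with a separator
      if (!word.isEmpty()) {
        words.add(word);
      }
    }
    return words;
  }

  /** Join the words with the to_tsquery phrase operator <->. */
  static String toPhraseQuery(String term) {
    return String.join("<->", splitWords(term));
  }

  public static void main(String[] args) {
    // U+2019 and U+0027 are both dropped; no quote is left to escape.
    System.out.println(toPhraseQuery("Dell\u2019Emulazione e dell'Influenza"));
    // prints Dell<->Emulazione<->e<->dell<->Influenza
  }
}
```

      With the term split this way, the argument passed to to_tsquery has the same five words that to_tsvector extracts from the title in the example above.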

    People

    • Assignee:
      Julian Ladisch (julianladisch)
    • Reporter:
      Julian Ladisch (julianladisch)