Uploaded image for project: 'FOLIO'
  1. FOLIO
  2. FOLIO-1246

Implement Postgres Full Text Search functionality

    XMLWordPrintable

Details

    • Core: Platform

    Description

      In Durham we have discussed that the current approach to searching (which, essentially, is based on translating CQL queries to complex PSQL queries with LIKE/ILIKE/regex and creating BTREE and GIN indices on particular down-cased/unaccented columns, with support from RMB and CQL2PG) may not be flexible enough to provide the level of search functionality and performance expected. (Note: we do not have concrete requirements here, we are assuming that what is expected matches functionality in modern OPAC systems).

      One of the approaches discussed was to use the Postgres built-in "Full Text Search" capability, documented here:

      https://www.postgresql.org/docs/current/static/textsearch-intro.html

      In short, the essential pieces of full text search in PG are two functions/data structures:

      • tsvector
      • tsquery

      tsvector takes a “document” which can be a single column, multiple columns from a single table or multiple tables, a view, or the entire JSONB document and returns a list (vector) of lexemes, which are normalised tokens (case/accent normalised, with stopwords filtered and roots extrated).

      tsquery does the same but with the user input, e.g a list of words to match. tsquery supports the same booleans as CQL so the translation is rather trivial — & for AND, || for OR, ! for NOT, <-> for PROX

      One can run both tsvector and tsquery at the query time, but this obviously does not scale so normally tsvector is used at the indexing time to build a GIN index on the selected data.
      tsvector/tsquery are the most fundamental data structures, PG includes more advanced functionalities expected in a search engine: ranking, highlighting/snippets, etc. It does not include specialized facet data structure – this needs to be implemented through other methods.

      Since PG 10, tsvector can take the whole JSONB contents in one go:
      https://wiki.postgresql.org/wiki/New_in_postgres_10#Full_Text_Search_support_for_JSON_and_JSONB

      This, however, drops the keys and only indexes the “contents” of the JSONB, so we would not be able to “scope” the searches to a given field. One proposed approach here is to use triggers to do advanced indexing:
      https://stackoverflow.com/questions/45680936/how-to-implement-full-text-search-on-complex-nested-jsonb-in-postgresql

      TestRail: Results

        Attachments

          Issue Links

            Activity

              People

                jakub Jakub Skoczen
                jakub Jakub Skoczen
                Votes:
                0 Vote for this issue
                Watchers:
                11 Start watching this issue

                Dates

                  Created:
                  Updated:
                  Resolved:

                  TestRail: Runs

                    TestRail: Cases