Details
-
Umbrella
-
Status: Closed (View Workflow)
-
P3
-
Resolution: Done
-
None
-
-
Core: Platform
Description
In Durham we have discussed that the current approach to searching (which, essentially, is based on translating CQL queries to complex PSQL queries with LIKE/ILIKE/regex and creating BTREE and GIN indices on particular down-cased/unaccented columns, with support from RMB and CQL2PG) may not be flexible enough to provide the level of search functionality and performance expected. (Note: we do not have concrete requirements here, we are assuming that what is expected matches functionality in modern OPAC systems).
One of the approaches discussed was to use the Postgres built-in "Full Text Search" capability, documented here:
https://www.postgresql.org/docs/current/static/textsearch-intro.html
In short, the essential pieces of full text search in PG are two functions/data structures:
- tsvector
- tsquery
tsvector takes a “document” which can be a single column, multiple columns from a single table or multiple tables, a view, or the entire JSONB document and returns a list (vector) of lexemes, which are normalised tokens (case/accent normalised, with stopwords filtered and roots extrated).
tsquery does the same but with the user input, e.g a list of words to match. tsquery supports the same booleans as CQL so the translation is rather trivial — & for AND, || for OR, ! for NOT, <-> for PROX
One can run both tsvector and tsquery at the query time, but this obviously does not scale so normally tsvector is used at the indexing time to build a GIN index on the selected data.
tsvector/tsquery are the most fundamental data structures, PG includes more advanced functionalities expected in a search engine: ranking, highlighting/snippets, etc. It does not include specialized facet data structure – this needs to be implemented through other methods.
Since PG 10, tsvector can take the whole JSONB contents in one go:
https://wiki.postgresql.org/wiki/New_in_postgres_10#Full_Text_Search_support_for_JSON_and_JSONB
This, however, drops the keys and only indexes the “contents” of the JSONB, so we would not be able to “scope” the searches to a given field. One proposed approach here is to use triggers to do advanced indexing:
https://stackoverflow.com/questions/45680936/how-to-implement-full-text-search-on-complex-nested-jsonb-in-postgresql
TestRail: Results
Attachments
Issue Links
- blocks
-
UISE-68 Codex search treats Swedish diacritics as ascii equivalents
-
- Closed
-
-
UISE-69 Codex search results treats Swedish diacritics as ascii equivalents when sorting results
-
- Closed
-
-
UXPROD-745 Tenant Sort Order Setting
-
- Open
-
-
UXPROD-1045 Fulltext Search
-
- Closed
-
-
UXPROD-1048 Fulltext indexing for textual fields
-
- Closed
-
-
UXPROD-1049 Search by automatically deriving relationships and data types from table and entity schema
-
- Closed
-
- relates to
-
CQLPG-37 Schema driven CQL generation and PGFT support
-
- Closed
-
-
CQLPG-40 Find out what mod-inventory-storage needs to work with the fulltext stuff
-
- Closed
-
-
MODINVSTOR-163 queries involving contributors or subject do not use an index
-
- In progress
-
-
UISE-80 Search results are not sorted by title
-
- Closed
-