Details
-
Task
-
Status: Closed (View Workflow)
-
P2
-
Resolution: Done
-
None
-
-
Core: Platform - Sprint 61, CP: sprint 64, CP: sprint 65, CP: sprint 66
-
5
-
Core: Platform
Description
At the moment, array field searching (either array of values or arrays of objects) happens by directly matching on a serialised JSON text as explained here:
https://github.com/folio-org/cql2pgjson-java#matching-array-elements
This syntax is cumbersome and susceptible to breakage when underlying JSON structure changes.
The proposed change is to simplify the search syntax and isolate the client from underlying JSON structure:
For a field which is an array of strings, a clause:
index = value
will match if any of the array values matches the value
For a field which is an array of objects, a clause:
index = value
will match if any of the objects in the array contains a field whose value matches the term (JFS: adam: we will likely need to index field names with some discriminator along with the content]
while:
index =/@attr value
will match if any of the objects in the array contains an attribute attr with a value value.
in case there is multiple value-less attribute modifiers:
index =/@attr/@attr1 value
will match if any of the objects in the array contains an both attr and attr1 with a value value.
For a field which is an array of objects, a clause:
index =/@attr/@attr1=value1/@attr2=value2 value
will match if there exists an object in the array that
1. contains a field called attr and the field's value matches the value AND
2. contains a field called attr{n} and its value is value{n} for each n.
Implementation notes
Note 1: We will use a FULLTEXT index for the array-of-strings field searching as this type of index provides out-of-the-box tokenization. The thing to consider/investigate here is the ability to disable stemming and stopwords.
Note 2: We will also use fulltext index for array-of-objects fields. There is a problem with finding false positives when e.g searching the contributors object (e.g type=illustrator name=John Doe when there is also type=editor name=John Doe). We need to solve this either by using the positional information or maybe by using the weight marker but it won't be solved in the first phase of implementation.
Note 3: searching array-of-objects will require a new syntax (see above) that is not backwards compatible with the UI. We can consider providing an automatic translation from the existing syntax (which uses the JSON structure directly) to the new syntax to ease the transition.
Note 4: we discussed indexing the content along with keys so we can generate queries like " name <1> John James Doe & value <1> editor" This gives false hits if JDD is e.g illustrator and editor exists but it avoids matching across keys.
Note 5: For array-of-objects searching we also discussed introducing (this may already exist, to be verified) a notion of a "virtual" search index that would be backed up by a index defined in schema.json. This index would be based on the subset of the JSON array-of-objects structure or a combination of fields. E.g for the "identifiers" structure we could introduce two "virtual" indexes: identifiers.isbn and identifiers.issn (issn) Each of the indexes would be created on the subset of the structure: identifiers.isbn -> identifiers[@type=123-456-789].
Note 5a: (Copied from MODINVSTOR-182)
One possibility how to extract the values:
SELECT COALESCE(jsonb_agg(value), '[]') FROM jsonb_to_recordset(jsonb->'identifiers') AS x(key text, value text) WHERE value IS NOT NULL
One possibility how to extract the ISBN values:
SELECT COALESCE(jsonb_agg(value), '[]') FROM jsonb_to_recordset(jsonb->'identifiers') AS y(key text, value text) WHERE key='26978ebf-eceb-4537-b5a7-9482c2cce893'
TestRail: Results
Attachments
Issue Links
- blocks
-
CQLPG-96 remove regex-based query time tokenization
-
- Closed
-
-
FOLIO-1934 RMB 25 release (performance, refactoring)
-
- Closed
-
-
RMB-418 Rename fields for array relation modifiers in schema.json
-
- Closed
-
-
UIIN-435 Basic Search: Search by Contributor on partial name doesn't return results
-
- Closed
-
-
UIIN-564 Combined search: Search by contributor(s) and resource title
-
- Closed
-
-
UIIN-621 Basic Search: Search by Identifier doesn't support right truncation
-
- Closed
-
-
UXPROD-1818 RMB 26 release features and core module rollout (performance, array search and cross-table search)
-
- Closed
-
- is duplicated by
-
CQLPG-41 Implement foreign key searches
-
- Closed
-
-
FOLIO-922 Searching in repeated fields (arrays) in PostgreSQL JSON records.
-
- Closed
-
-
MODINVSTOR-183 Analyze contributor/subject indexes
-
- Closed
-
- relates to
-
CQLPG-24 CQL translator: support Array=String
-
- Closed
-
-
CQLPG-42 Array searches in the FT mode
-
- Closed
-
-
CQLPG-90 Remove legacy query translation based on individual json schema files
-
- Closed
-
-
RMB-382 Sort by array-of-strings and array-of-objects fields
-
- Open
-
-
RMB-385 add 'queryIndexName' to schema.json and allow compound indexes
-
- Closed
-
-
RMB-410 move CQL2PG documentation to RMB
-
- Closed
-
-
RMB-413 SPIKE 2: design new approach for searching array-of-strings and array-of-objects fields
-
- Open
-
-
CQLPG-81 Redesign CQL to SQL generation
-
- Closed
-
-
CQLPG-95 SPIKE: design foreign key search support
-
- Closed
-
-
MODINVSTOR-182 analyze and optimize identifiers array indexes (ISBN, ISSN, LCCN, OCLC, …)
-
- Closed
-
-
RMB-407 Move ftTerm to Cql2SqlUtil and unit test it
-
- Closed
-
-
RMB-416 full text match for array relation modifiers
-
- Closed
-
-
RMB-417 optimize search for array relation modifiers
-
- Closed
-
-
RMB-422 Invalid SQL for array modifiers
-
- Closed
-
-
STCOM-492 Change filter search cql syntax to use == not =
-
- Closed
-
-
UIU-974 SPIKE: Solution for UIU-939 (Can't Find User by First and Last Name)
-
- Closed
-