Uploaded image for project: 'RAML Module Builder'
  1. RAML Module Builder
  2. RMB-380

implement new approach for searching array-of-strings and array-of-objects fields

    XMLWordPrintable

Details

    • Core: Platform - Sprint 61, CP: sprint 64, CP: sprint 65, CP: sprint 66
    • 5
    • Core: Platform

    Description

      At the moment, array field searching (either array of values or arrays of objects) happens by directly matching on a serialised JSON text as explained here:

      https://github.com/folio-org/cql2pgjson-java#matching-array-elements

      This syntax is cumbersome and susceptible to breakage when underlying JSON structure changes.

      The proposed change is to simplify the search syntax and isolate the client from underlying JSON structure:

      For a field which is an array of strings, a clause:

      index = value

      will match if any of the array values matches the value

      For a field which is an array of objects, a clause:

      index = value

      will match if any of the objects in the array contains a field whose value matches the term (JFS: adam: we will likely need to index field names with some discriminator along with the content]

      while:

      index =/@attr value

      will match if any of the objects in the array contains an attribute attr with a value value.

      in case there is multiple value-less attribute modifiers:

      index =/@attr/@attr1 value

      will match if any of the objects in the array contains an both attr and attr1 with a value value.

      For a field which is an array of objects, a clause:

      index =/@attr/@attr1=value1/@attr2=value2 value

      will match if there exists an object in the array that

      1. contains a field called attr and the field's value matches the value AND
      2. contains a field called attr{n} and its value is value{n} for each n.

      Implementation notes

      Note 1: We will use a FULLTEXT index for the array-of-strings field searching as this type of index provides out-of-the-box tokenization. The thing to consider/investigate here is the ability to disable stemming and stopwords.

      Note 2: We will also use fulltext index for array-of-objects fields. There is a problem with finding false positives when e.g searching the contributors object (e.g type=illustrator name=John Doe when there is also type=editor name=John Doe). We need to solve this either by using the positional information or maybe by using the weight marker but it won't be solved in the first phase of implementation.

      Note 3: searching array-of-objects will require a new syntax (see above) that is not backwards compatible with the UI. We can consider providing an automatic translation from the existing syntax (which uses the JSON structure directly) to the new syntax to ease the transition.

      Note 4: we discussed indexing the content along with keys so we can generate queries like " name <1> John James Doe & value <1> editor" This gives false hits if JDD is e.g illustrator and editor exists but it avoids matching across keys.

      Note 5: For array-of-objects searching we also discussed introducing (this may already exist, to be verified) a notion of a "virtual" search index that would be backed up by a index defined in schema.json. This index would be based on the subset of the JSON array-of-objects structure or a combination of fields. E.g for the "identifiers" structure we could introduce two "virtual" indexes: identifiers.isbn and identifiers.issn (issn) Each of the indexes would be created on the subset of the structure: identifiers.isbn -> identifiers[@type=123-456-789].

      Note 5a: (Copied from MODINVSTOR-182)
      One possibility how to extract the values:

      SELECT COALESCE(jsonb_agg(value), '[]')
            FROM jsonb_to_recordset(jsonb->'identifiers')
            AS x(key text, value text)
            WHERE value IS NOT NULL
      

      One possibility how to extract the ISBN values:

      SELECT COALESCE(jsonb_agg(value), '[]')
         FROM jsonb_to_recordset(jsonb->'identifiers')
         AS y(key text, value text)
         WHERE key='26978ebf-eceb-4537-b5a7-9482c2cce893'
      

      TestRail: Results

        Attachments

          Issue Links

            Activity

              People

                adam Adam Dickmeiss
                jakub Jakub Skoczen
                Votes:
                0 Vote for this issue
                Watchers:
                6 Start watching this issue

                Dates

                  Created:
                  Updated:
                  Resolved:

                  TestRail: Runs

                    TestRail: Cases