Uploaded image for project: 'RAML Module Builder'
  1. RAML Module Builder
  2. RMB-385

add 'queryIndexName' to schema.json and allow compound indexes

    XMLWordPrintable

Details

    • CP: ready for planning, CP: sprint 70, CP: sprint 71
    • 3
    • Core: Platform

    Description

      Problem statement

      RMB cannot create a single index that spans multiple JSON fields

      Solution

      a new property on the index section called multiFieldNames, when specified it will override fieldName for the purpose of matching index configuration with the query.

      E.g.

      ...
      "multiFieldNames": "firstName,lastName"
      "fieldName": "fullName"
      

      and the query fullName = "John Doe" should be able to utilize the above index. It uses spaces to concatenate the values found in the string together into one index

      A second option is also implemented now called sqlExpression

      ...
      "sqlExpression": "lower(concat_space_sql(jsonb->>'city', jsonb->>'state'))",
      "fieldName": "address"
      

      When defined this property completely replaces the index conversion with the user defined expression

      All index types are allowed

      Acceptance criteria

      • implementation and unit tests covering the above
      • update missing documentation about syntax for multiFieldNames and sqlExpression(support for commas and dots)
      • document multiFieldNames in RMB readme

      Implementation notes

      use cases first and last name, city and state

      The compound index should be created at "indexing" time by concatenating the value of the JSON fields. E.g:

      concat_ws(' ', jsonb->>'x', jsonb->>'y', jsonb->>'z')

      CQL-to-SQL query converter should generate the EXACT same expression to make sure the index is in use.

      Note on the UI workarounds for missing compound indexing

      A UI-only workaround will be implemented for Q2, for details see UIU-1068. The workaround expands the user input (list of words) into boolean OR/AND query across a list of indices.

      There is a couple of issues with this workaround, however, the main problems are:

      1. the query that gets generated by the UI gets long very quickly. It explodes in size both due to the number of terms and the number of indices searched. This will be a problem for handling things like an "all' search in the inventory which includes a list of 5 indices (that number is likely to grow) and may include long queries for things like titles.

      2. the UI needs to "tokenize" the user input and generate the final CQL query. This means the UI needs to recreate tokenization rules implemented in the backend. This is fragile and likely to get out of sync. This also means the query handling code in the UI will need to be replaced when more sophisticated search input should be supported – e.g to include support for boolean operators, index scoping, etc.

      3. the generated query may not be semantically equivalent to the original query. E.g search on title for title="the lord of the rings" results in and adjacency search where order of words matters. It is not equivalent to the expression title=the AND title=lord AND title=of AND title=the AND title=rings that would be generated. Order of words or their position in in document is also relevant for relevancy ranking (which FOLIO does not support at the moment).

      TestRail: Results

        Attachments

          Issue Links

            Activity

              People

                evaluk Eric Valuk
                jakub Jakub Skoczen
                Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                  Created:
                  Updated:
                  Resolved:

                  TestRail: Runs

                    TestRail: Cases