FOLIO-704

Investigate a central compound object index



    • CP: Roadmap backlog
    • Core: Platform


      The front end currently needs a way to search for users based on related objects (e.g., get a list of all the users with permissions X, Y and Z), so it makes sense to evaluate the benefits of maintaining an index of composite objects to facilitate this.

      For example, to execute the request above, we first have to query the permissions module for all permission-association objects that contain permissions X, Y and Z. From this list of objects, we then compile the list of usernames they reference. Next, we query the user module for all user records that match our list of usernames. Finally, depending on the return format requested by the client, we may have to join the user objects with the permissions objects to return the compound result.
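
To make the cost of the "manual searching" steps concrete, here is a minimal sketch using in-memory lists. The record shapes, the sample data, and the helper names (`query_permissions`, `query_users`, `users_with_permissions`) are all hypothetical stand-ins for the real module calls, which would go over HTTP.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical in-memory stand-ins for the permissions and users modules.
PERMISSION_RECORDS: List[dict] = [
    {"username": "alice", "permissions": ["X", "Y", "Z"]},
    {"username": "bob", "permissions": ["X", "Y"]},
    {"username": "carol", "permissions": ["W", "X", "Y", "Z"]},
]
USER_RECORDS: List[dict] = [
    {"id": "1", "username": "alice"},
    {"id": "2", "username": "bob"},
    {"id": "3", "username": "carol"},
]


def query_permissions(required: Iterable[str]) -> List[dict]:
    """Step 1: permission-association objects containing every required permission."""
    needed = set(required)
    return [r for r in PERMISSION_RECORDS if needed.issubset(r["permissions"])]


def query_users(usernames: Set[str]) -> List[dict]:
    """Step 3: user records whose username is in the compiled list."""
    return [u for u in USER_RECORDS if u["username"] in usernames]


def users_with_permissions(required: Iterable[str]) -> List[dict]:
    perm_records = query_permissions(required)
    # Step 2: compile the usernames referenced by the permission objects.
    by_username: Dict[str, dict] = {r["username"]: r for r in perm_records}
    users = query_users(set(by_username))
    # Step 4: join user and permission objects into a compound result.
    return [
        {"user": u, "permissionsUser": by_username[u["username"]]}
        for u in users
    ]


result = users_with_permissions(["X", "Y", "Z"])
```

Even in this toy form, every search needs two lookups plus a join in application code, and each additional related module adds another round trip.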

      The logic gets more complicated when we add in features like sorting the result set based on a field in a module outside of the users module, or doing pagination. We also run into the issue of degraded performance for the "manual searching" approach when we increase the number of records that need to be searched.

      A possible solution would be to use something that's happy indexing huge amounts of data (e.g., Solr) and insert composite records into that index.

      For the schema, we could adopt a dot notation to indicate the object and subfield. For example, to search by userid, you'd use the field "user.id". For a permission name, you'd use something like "permissionsUser.permission_name".
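
As an illustration of the dot notation, the sketch below flattens per-module records into a single composite document. The `flatten` and `composite_document` helpers and the sample field names other than "user.id" and "permissionsUser.permission_name" are assumptions made for the example.

```python
from typing import Any, Dict


def flatten(prefix: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Turn {"id": "1"} under prefix "user" into {"user.id": "1"}.

    Nested dicts are flattened recursively; lists are kept as
    multi-valued fields.
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}.{key}"
        if isinstance(value, dict):
            flat.update(flatten(name, value))
        else:
            flat[name] = value
    return flat


def composite_document(user: dict, permissions_user: dict) -> Dict[str, Any]:
    """Merge one user record and its permission record into one flat document."""
    doc: Dict[str, Any] = {}
    doc.update(flatten("user", user))
    doc.update(flatten("permissionsUser", permissions_user))
    return doc


doc = composite_document(
    {"id": "1", "username": "alice"},
    {"permission_name": ["X", "Y", "Z"]},
)
```

Prefixing every field with its object name keeps fields from different modules from colliding, even when two modules both define something like `id`.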

      The good news is that this makes record retrieval stupidly easy. It's just a straightforward query, and translating between CQL and Solr queries is not hard. We'd be free to introduce as much complex logic as we wanted, as well as choose fields to sort on and suchlike.
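
A rough sketch of what such a translation might look like for a very small CQL subset: `field = "value"` clauses joined by `and` or `or`. The `cql_to_solr` function is hypothetical. It ignores relation modifiers, parentheses, sorting, and the differences in matching semantics between the two languages, and it does not check that clauses and operators alternate.

```python
import re

# One token is either a `field = "value"` clause or a boolean operator.
_TOKEN = re.compile(
    r'\s*(?:([\w.]+)\s*=\s*"([^"]*)"|(and|or))\s*',
    re.IGNORECASE,
)


def cql_to_solr(cql: str) -> str:
    """Translate a tiny CQL subset into Solr query syntax (sketch only)."""
    text = cql.strip()
    pos = 0
    out = []
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unsupported CQL near position {pos}: {text[pos:]!r}")
        field, value, op = match.groups()
        if op:
            out.append(op.upper())  # Solr boolean operators are upper case
        else:
            # Values cannot contain a double quote in this subset;
            # backslashes are not escaped here.
            out.append(f'{field}:"{value}"')
        pos = match.end()
    return " ".join(out)


solr_query = cql_to_solr(
    'user.id = "1" and permissionsUser.permission_name = "X"'
)
```

A real translator would work from a proper CQL parse tree rather than a regular expression, but the field-to-field mapping stays this direct because the index already uses the dotted names.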

      The downside, of course, is building and maintaining the index. How can this be done in a reliable fashion with the least amount of burden on the maintainers of the individual modules that contribute to the composite record?

      For example, we could maintain a message queue and allow modules to push messages to it whenever a record is created, updated or deleted. These messages would then be consumed in order and used to update the composite record index.
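
The queue idea could be sketched as follows, with Python's standard `queue.Queue` standing in for a real message broker and a plain dict standing in for the index. The message shape (`op`, `module`, `key`, `fields`) and the function names are assumptions for illustration, not an agreed format.

```python
from queue import Empty, Queue
from typing import Any, Dict

Index = Dict[str, Dict[str, Any]]


def apply_message(index: Index, msg: dict) -> None:
    """Apply one create/update/delete message to the composite index."""
    key = msg["key"]              # shared key of the composite record, e.g. a username
    prefix = msg["module"] + "."  # e.g. "user." or "permissionsUser."
    if msg["op"] in ("create", "update"):
        doc = index.setdefault(key, {})
        for field, value in msg["fields"].items():
            doc[prefix + field] = value
    elif msg["op"] == "delete":
        doc = index.get(key, {})
        # Remove only the fields contributed by the sending module.
        for name in [n for n in doc if n.startswith(prefix)]:
            del doc[name]
        if not doc:
            index.pop(key, None)
    else:
        raise ValueError(f"unknown op: {msg['op']!r}")


def drain(messages: Queue, index: Index) -> int:
    """Consume queued messages in order; return how many were applied."""
    applied = 0
    while True:
        try:
            msg = messages.get_nowait()
        except Empty:
            return applied
        apply_message(index, msg)
        applied += 1


messages: Queue = Queue()
messages.put({"op": "create", "module": "user", "key": "alice", "fields": {"id": "1"}})
messages.put({"op": "update", "module": "permissionsUser", "key": "alice",
              "fields": {"permission_name": ["X", "Y"]}})
index: Index = {}
applied = drain(messages, index)
```

Because each module only touches fields under its own prefix, no module needs to know what the rest of the composite record looks like.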

      It is worth noting that Solr supports partial document updating, which seems particularly relevant here, since no one module would be sending enough information to update an entire document. https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents
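
For illustration, here is a sketch of building a partial-update request body using Solr's `set` modifier. It assumes the composite document's unique key field is named `id` and holds the username; the helper names are hypothetical, and nothing is sent over the network.

```python
import json
from typing import Any, Dict, List


def atomic_update(doc_id: str, module: str, fields: Dict[str, Any]) -> Dict[str, Any]:
    """Build one partial-update document touching only one module's fields.

    Each changed field is wrapped in {"set": value}, so fields contributed
    by other modules are left as they are in the index.
    """
    update: Dict[str, Any] = {"id": doc_id}  # assumed unique key field
    for name, value in fields.items():
        update[f"{module}.{name}"] = {"set": value}
    return update


def update_body(updates: List[Dict[str, Any]]) -> str:
    """Serialize updates as the JSON array that would be posted to the update handler."""
    return json.dumps(updates)


body = update_body([
    atomic_update("alice", "permissionsUser", {"permission_name": ["X", "Y", "Z"]}),
])
```

The permissions module can send this without knowing anything about the `user.*` fields already stored in the same document.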

      Another option would be to try to implement this with no changes to the storage modules, and have a way to externally monitor for changes and then write them back to the indexes. One way to do this might be some kind of low level database trigger, though I worry that this might really violate the KISS principle. Another possibility could be a filter-level Okapi module that would listen for request types to various modules (e.g. PUT, POST, DELETE) and then create some kind of message into a queue for some process to query the module for changes and write these back to the index.
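
The filter-level idea might look roughly like the sketch below. It does not use any real Okapi API: `observe_request` is a hypothetical hook that sees the method and path of each request, and the assumption that paths look like `/<module>/<record id>` is made only for the example.

```python
from collections import deque
from typing import Deque, Optional

WRITE_METHODS = {"PUT", "POST", "DELETE"}


def observe_request(method: str, path: str, pending: Deque[dict]) -> bool:
    """Queue a change notice for write requests; ignore everything else.

    Returns True when a notice was queued. A separate process would later
    read the notices, query the module for the changed record, and write
    the result back to the index.
    """
    verb = method.upper()
    if verb not in WRITE_METHODS:
        return False
    segments = [s for s in path.split("/") if s]
    if not segments:
        return False
    record_id: Optional[str] = segments[1] if len(segments) > 1 else None
    pending.append({"method": verb, "module": segments[0], "record_id": record_id})
    return True


pending: Deque[dict] = deque()
observe_request("GET", "/users/42", pending)      # read: ignored
observe_request("PUT", "/users/42", pending)      # write: queued
observe_request("POST", "/perms/users", pending)  # write: queued
```

One gap this makes visible: for a POST that creates a record, the new record's id is usually only known from the response, so a filter that sees only the request would need another way to find out what changed.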

      Since this seems to potentially overlap several different issues, I'd like to determine fairly quickly whether or not this is a road worth going down, or if we want to try to implement the "manual searching" solution for the short term, at least.


                shale99 shale99
                kurt Kurt Nordstrom