Five use cases have been identified for MARC source record retrieval.
| # | Use case | Description |
|---|----------|-------------|
| 1 | Retrieve single records by their Instance ID | Feeds the View Source feature in the UI |
| 2 | Retrieve all records | Return paged responses as quickly as possible to the OAI-PMH modules |
| 3 | Retrieve records for a period based on createdDate and updatedDate | Retrieve records filtered by their createdDate and updatedDate for given date and time spans, for OAI-PMH |
| 4 | Retrieve MARC records for data export | Identifying records for export is Inventory-driven, so the underlying MARC records are fetched from SRS by Instance Id (and later by Holdings Id as well) |
| 5 | Retrieve MARC records based on custom criteria | |
High-level strategy to improve SRS performance:
- We have to get rid of the RMB PostgreSQL client and CQL because they do not give us fine-grained control over how SQL statements, and especially WHERE clauses, are generated.
- Fields of the JSON documents that are used heavily in search conditions must be extracted into separate table columns, and, based on the use-case analysis, DB indexes must be created for those columns. As a consequence, the DB tables must be managed by Liquibase instead of RMB and schema.json.
- We must define several data-retrieval strategies, each serving a particular use case (or cases):
  - Retrieve all MARC source records in chunks. The row set must be ordered, and the most efficient ordering here is by id. (Covers use case 2)
  - Retrieve MARC source records for a period in chunks. The row set must be ordered, and the most efficient ordering here is by updatedDate. To keep the criteria simple, only updatedDate must be considered; createdDate should not be used. (Covers use case 3)
  - Retrieve MARC source records by Instance Ids. The instanceId field must be indexed. (Covers use cases 1 and 4)
  - Retrieve MARC records by custom criteria. (Covers use case 5)
- Data structures returned by SQL queries, as well as by the HTTP endpoints, must be narrowed to contain only the data valuable to the consumer. (see the SQL functions in the script)
- Because of their breaking nature, these changes must be implemented as soon as possible, and migration scripts must be created. (see the script for a baseline)
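The column-extraction idea above can be sketched as follows. This is an illustrative example, not the actual SRS schema: the table name `records`, the column `instance_id`, and the JSON path `externalIdsHolder.instanceId` are assumptions, and SQLite stands in for PostgreSQL. The point is that a field searched intensively (here, the instance id) lives in its own indexed column rather than being dug out of the JSON blob at query time.

```python
# Hypothetical sketch: table/column names and the JSON shape are
# assumptions, and SQLite stands in for PostgreSQL.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE records (
        id          TEXT PRIMARY KEY,
        instance_id TEXT,   -- extracted from the JSON document at write time
        content     TEXT    -- full MARC record document as JSON
    )
""")
# A dedicated index turns lookups by Instance Id into an index scan.
conn.execute("CREATE INDEX idx_records_instance_id ON records (instance_id)")

doc = {"externalIdsHolder": {"instanceId": "inst-42"}, "leader": "00000nam"}
conn.execute(
    "INSERT INTO records (id, instance_id, content) VALUES (?, ?, ?)",
    ("rec-1", doc["externalIdsHolder"]["instanceId"], json.dumps(doc)),
)

# Use cases 1 and 4: fetch the underlying MARC record by Instance Id
# via the indexed column, never by parsing the JSON document.
row = conn.execute(
    "SELECT id, content FROM records WHERE instance_id = ?", ("inst-42",)
).fetchone()
print(row[0])  # rec-1
```

In the real module the extraction would happen in the Liquibase-managed schema (e.g. a populated column plus a B-tree index), but the query shape is the same.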
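The "retrieve in chunks over an ordered row set" strategy (use cases 2 and 3) amounts to keyset pagination: each request seeks past the last key seen instead of using OFFSET. A minimal sketch, again with assumed names and SQLite in place of PostgreSQL:

```python
# Hypothetical sketch of keyset pagination ordered by id; table and
# column names are assumptions, and SQLite stands in for PostgreSQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id TEXT PRIMARY KEY, content TEXT)")
conn.executemany(
    "INSERT INTO records (id, content) VALUES (?, ?)",
    [(f"id-{i:04d}", f"marc-{i}") for i in range(10)],
)

def fetch_chunk(last_id, limit):
    """Return the next chunk strictly after last_id, ordered by id.

    Because the row set is ordered by the primary key, the WHERE clause
    seeks directly to the next page through the index instead of
    scanning and discarding rows as OFFSET would.
    """
    return conn.execute(
        "SELECT id, content FROM records WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, limit),
    ).fetchall()

chunks = []
last_id = ""  # sorts before every real id
while True:
    chunk = fetch_chunk(last_id, 4)
    if not chunk:
        break
    chunks.append(chunk)
    last_id = chunk[-1][0]

print(len(chunks))                  # 3 (chunks of 4 + 4 + 2 rows)
print(sum(len(c) for c in chunks))  # 10
```

The period-based variant for OAI-PMH would be the same loop keyed on updatedDate (with id as a tiebreaker), which is why that column, and not createdDate, needs the index.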