Details
- Story
- Status: Closed
- P2
- Resolution: Done
- Folijet Sprint 106
- 5
- Folijet
- R1 2021
- TAMU
Description
Refactor batch post source records to process records and perform batch database inserts into relevant tables
NOTE: This is based on William Welling's analysis of SRS changes that would be helpful for initial data migration. Several other issues are linked to the same feature as this one.
When saving a RecordCollection, the current implementation iterates over the records and saves them individually.
The individual save of a record looks up a snapshot, calculates the generation, saves the raw record, updates or saves the parsed record, saves the error record if any, and looks up an existing record by matchId. Finally, if the record is new, it is saved; otherwise the new record is saved and the old record is updated with status old.
https://github.com/folio-org/mod-source-record-storage/blob/master/mod-source-record-storage-server/src/main/java/org/folio/services/RecordServiceImpl.java#L83
https://github.com/folio-org/mod-source-record-storage/blob/master/mod-source-record-storage-server/src/main/java/org/folio/services/RecordServiceImpl.java#L96
https://github.com/folio-org/mod-source-record-storage/blob/master/mod-source-record-storage-server/src/main/java/org/folio/dao/RecordDaoImpl.java#L368
This approach does not scale for batch saves and requires a redesign. I propose a preprocessor that collects all rows for each table and then concurrently performs one batch insert per table.
- stream records from record collection
- collect list of raw records
- collect list of parsed records
- collect list of error records
- collect list of records
- concurrently batch insert each into its table within a transaction
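The collection steps above can be sketched as a single pass over the record collection. This is only a rough illustration under assumptions: `BatchPreprocessor`, `SourceRecord`, `Batches`, and their fields are hypothetical stand-ins, not the actual SRS model or DAO classes, and the per-table batch inserts themselves are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: group rows by target table before any database call.
class BatchPreprocessor {

  // Simplified stand-in for an SRS record; the field names are assumptions.
  static final class SourceRecord {
    final String id;
    final String rawContent;    // assumed to be always present
    final String parsedContent; // null when parsing failed
    final String errorMessage;  // null when parsing succeeded

    SourceRecord(String id, String rawContent, String parsedContent, String errorMessage) {
      this.id = id;
      this.rawContent = rawContent;
      this.parsedContent = parsedContent;
      this.errorMessage = errorMessage;
    }
  }

  // One list of rows per table, each ready for a single batch insert.
  static final class Batches {
    final List<String[]> records = new ArrayList<>();
    final List<String[]> rawRecords = new ArrayList<>();
    final List<String[]> parsedRecords = new ArrayList<>();
    final List<String[]> errorRecords = new ArrayList<>();
  }

  // Walks the collection once and distributes each record's parts.
  static Batches collect(List<SourceRecord> collection) {
    Batches batches = new Batches();
    for (SourceRecord r : collection) {
      batches.records.add(new String[] {r.id});
      batches.rawRecords.add(new String[] {r.id, r.rawContent});
      if (r.parsedContent != null) {
        batches.parsedRecords.add(new String[] {r.id, r.parsedContent});
      }
      if (r.errorMessage != null) {
        batches.errorRecords.add(new String[] {r.id, r.errorMessage});
      }
    }
    return batches;
  }
}
```

With the rows grouped this way, each list maps to one batch insert, so the number of database round trips no longer grows with the number of records.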
A database constraint will likely have to verify that the snapshot exists and is in the correct state. Additionally, the generation calculation may have to become a database trigger or be handled by some other solution.
The batch update of parsed records will also require refactoring for performance.
- stream records from record collection
- collect list of records to update external ids
- collect list of parsed records to update
- concurrently batch update records and parsed records
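The concurrent step could look roughly like the following, using `CompletableFuture` from the JDK. It is a sketch under assumptions: `ConcurrentBatchUpdate` and `runAll` are hypothetical names, and each `Supplier` stands in for one batch update (for example, records or parsed records) that returns the number of affected rows. Transaction handling and the actual database client are deliberately omitted.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;
import java.util.stream.Collectors;

// Hypothetical sketch: run independent batch writers concurrently.
class ConcurrentBatchUpdate {

  // Starts every writer asynchronously, waits for all of them,
  // and returns the total number of affected rows.
  static int runAll(List<Supplier<Integer>> batchWriters) {
    List<CompletableFuture<Integer>> futures = batchWriters.stream()
        .map(writer -> CompletableFuture.supplyAsync(writer))
        .collect(Collectors.toList());

    // join() rethrows a failure from any writer as a CompletionException.
    CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();

    return futures.stream().mapToInt(f -> f.join()).sum();
  }
}
```

If any writer fails, the caller sees the exception and can roll back, which is where a surrounding transaction would come in.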
Issue Links
- defines: UXPROD-2790 NFR: Make some revisions to Source Record Storage to improve performance for data migration (Closed)