mod-source-record-storage / MODSOURCE-205

SRS revision: Refactor batch API implementation

    Details

    • Sprint:
      Folijet Sprint 106
    • Story Points:
      5
    • Development Team:
      Folijet
    • Release:
      R1 2021
    • Affected Institution:
      TAMU

      Description

      Refactor the batch POST of source records so that the records are processed together and written to the relevant tables with batch database inserts.

      NOTE: This is based on William Welling's analysis of SRS changes that would be helpful for initial data migration. Several other issues are linked to the same feature as this one.

      When saving a RecordCollection, the current implementation iterates over the records and saves them individually.

      https://github.com/folio-org/mod-source-record-storage/blob/master/mod-source-record-storage-server/src/main/java/org/folio/services/RecordServiceImpl.java#L114

      The individual save of a record then looks up a snapshot, calculates the generation, saves the raw record, updates or saves the parsed record, saves the error record if any, and looks up an existing record by matchId. Finally, if the record is new it is saved; otherwise the new record is saved and the old record is updated with status old.

      https://github.com/folio-org/mod-source-record-storage/blob/master/mod-source-record-storage-server/src/main/java/org/folio/services/RecordServiceImpl.java#L83
      https://github.com/folio-org/mod-source-record-storage/blob/master/mod-source-record-storage-server/src/main/java/org/folio/services/RecordServiceImpl.java#L96
      https://github.com/folio-org/mod-source-record-storage/blob/master/mod-source-record-storage-server/src/main/java/org/folio/dao/RecordDaoImpl.java#L368

      This is not a scalable approach for batch save and requires redesign. I propose a preprocessor that collects all rows for each table insert and concurrently performs batch inserts into appropriate tables.

      • stream records from record collection
      • collect list of raw records
      • collect list of parsed records
      • collect list of error records
      • collect list of records
      • concurrently batch insert each into its table within a transaction
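
      The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the module's code: SourceRecord, TableRows, BatchWriter and the table names are hypothetical placeholders, rows are reduced to plain strings, and transaction handling is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class BatchSaveSketch {

  // Hypothetical, simplified stand-in for one record in a RecordCollection.
  static class SourceRecord {
    final String id;
    final String rawContent;
    final String parsedContent;    // null when parsing failed
    final String errorDescription; // null when parsing succeeded

    SourceRecord(String id, String rawContent, String parsedContent, String errorDescription) {
      this.id = id;
      this.rawContent = rawContent;
      this.parsedContent = parsedContent;
      this.errorDescription = errorDescription;
    }
  }

  // Rows grouped by destination table, built in one pass over the collection.
  static class TableRows {
    final List<String> records = new ArrayList<>();
    final List<String> rawRecords = new ArrayList<>();
    final List<String> parsedRecords = new ArrayList<>();
    final List<String> errorRecords = new ArrayList<>();
  }

  // Hypothetical writer; a real one would issue one batch INSERT per call.
  interface BatchWriter {
    void insert(String table, List<String> rows);
  }

  // Walks the collection once and sorts each record's parts into per-table lists.
  static TableRows collect(List<SourceRecord> collection) {
    TableRows rows = new TableRows();
    for (SourceRecord source : collection) {
      rows.records.add(source.id);
      rows.rawRecords.add(source.rawContent);
      if (source.parsedContent != null) {
        rows.parsedRecords.add(source.parsedContent);
      }
      if (source.errorDescription != null) {
        rows.errorRecords.add(source.errorDescription);
      }
    }
    return rows;
  }

  // Starts one batch insert per table and waits for all of them to finish.
  // Table names are placeholders; transaction handling is omitted.
  static void insertAll(TableRows rows, BatchWriter writer) {
    CompletableFuture.allOf(
        CompletableFuture.runAsync(() -> writer.insert("records", rows.records)),
        CompletableFuture.runAsync(() -> writer.insert("raw_records", rows.rawRecords)),
        CompletableFuture.runAsync(() -> writer.insert("parsed_records", rows.parsedRecords)),
        CompletableFuture.runAsync(() -> writer.insert("error_records", rows.errorRecords))
    ).join();
  }
}
```

      The point of the sketch is that the number of insert statements depends on the number of tables, not on the number of records. How the concurrent inserts would share a transaction is not decided here.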

      A database constraint will likely have to enforce that the snapshot exists and is in the correct state. Additionally, the generation calculation may have to become a database trigger or be handled by some other solution.

      The batch update of parsed records will also require refactoring for performance.

      • stream records from record collection
      • collect list of records to update external ids
      • collect list of parsed records to update
      • concurrently batch update records and parsed records
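
      A similar rough sketch for the update path, again under assumptions: RecordUpdate, BatchUpdater and the target names are hypothetical, and the real external ids and parsed record types are reduced to strings.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

class BatchUpdateSketch {

  // Hypothetical, simplified update request for one record.
  static class RecordUpdate {
    final String recordId;
    final String externalId;    // null when no external id needs updating
    final String parsedContent; // null when no parsed record needs updating

    RecordUpdate(String recordId, String externalId, String parsedContent) {
      this.recordId = recordId;
      this.externalId = externalId;
      this.parsedContent = parsedContent;
    }
  }

  // Hypothetical updater; a real one would issue one batch UPDATE per call.
  interface BatchUpdater {
    void update(String target, List<RecordUpdate> batch);
  }

  // Records whose external ids have to be updated.
  static List<RecordUpdate> withExternalId(List<RecordUpdate> updates) {
    return updates.stream()
        .filter(update -> update.externalId != null)
        .collect(Collectors.toList());
  }

  // Records whose parsed content has to be updated.
  static List<RecordUpdate> withParsedContent(List<RecordUpdate> updates) {
    return updates.stream()
        .filter(update -> update.parsedContent != null)
        .collect(Collectors.toList());
  }

  // Runs both batch updates concurrently and waits for both to finish.
  // Target names are placeholders; transaction handling is omitted.
  static void updateAll(List<RecordUpdate> updates, BatchUpdater updater) {
    List<RecordUpdate> records = withExternalId(updates);
    List<RecordUpdate> parsedRecords = withParsedContent(updates);
    CompletableFuture.allOf(
        CompletableFuture.runAsync(() -> updater.update("records", records)),
        CompletableFuture.runAsync(() -> updater.update("parsed_records", parsedRecords))
    ).join();
  }
}
```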

                People

                Assignee:
                wwelling William Welling
                Reporter:
                abreaux Ann-Marie Breaux
