Project: UX Product
Issue: UXPROD-2659

NFR: Refactor data-import flow to increase reliability


    Details

    • Template:
      UXPROD features
    • Release:
      Q3 2020
    • Front End Estimate:
      Very Small (VS): < 1 day
    • Back End Estimate:
      XXXL: 30-45 days
    • Confidence factor:
      Medium
    • Development Team:
      Folijet
    • Calculated Total Rank:
      55
    • PO Rank:
      97
    • Rank: Chicago (MVP Sum 2020):
      R2
    • Rank: Cornell (Full Sum 2021):
      R2
    • Rank: 5Colleges (Full Jul 2021):
      R2
    • Rank: FLO (MVP Sum 2020):
      R1
    • Rank: GBV (MVP Sum 2020):
      R2
    • Rank: MO State (MVP June 2020):
      R1
    • Rank: TAMU (MVP Jan 2021):
      R2
    • Rank: U of AL (MVP Oct 2020):
      R1

      Description

      Steps

      • Inventory
        • Increment Vert.x version to 3.8.4+ in mod-inventory to support vertx-kafka-client
        • Check and fix marshalling/unmarshalling for MARC JSON
        • Create consumers for each eventType and subscribe them via di-processing-core
        • Add support for exactly-once delivery for each consumer
      • PubSub
      • Data-Import
        • Change mod-data-import file processing to the Kafka approach (can be moved from the PoC) https://github.com/folio-org/mod-data-import/pull/130
        • Create ProducerManager
        • Add support for exactly-once delivery for each chunk (a unique UUID or hash can be added per chunk). Add a common schema with an eventId and extend all Kafka-created entities with this id. Initially, add a stub interface method isProcessed
      • Source-Manager
        • Change chunk processing to the Kafka approach (can be moved from the PoC) https://github.com/folio-org/mod-source-record-manager/pull/315
        • Add support for exactly-once delivery for each chunk. JobExecutionSourceChunk can be reused; the UUID will be received from mod-data-import. On constraint violations, skip chunk processing and log.
        • Receive responses from SRS, start processing in StoredMarcChunkConsumersVerticle, and add exactly-once delivery for each chunk.
        • Create consumers for DI_COMPLETED and DI_ERROR and finish data-import (can be moved from the PoC)
        • Move the "secret button" functionality to the Kafka approach (interactions between SRM and SRS)
      • Source-Storage
        • Add consumers for the initial records load (before processing) and save chunks in batches (can be moved from the PoC) https://github.com/folio-org/mod-source-record-storage/pull/214
        • Add support for exactly-once delivery for each chunk of records. Add a new entity to track chunk duplications. On constraint violations, skip chunk processing and log.
        • Add consumers to process created/updated entities and fill the 999 and 001 fields (can be moved from the PoC)
        • Add support for exactly-once delivery for each consumer
      • Processing-Core
        • Change the transport implementation to the direct Kafka approach and reuse the new sub-module library from mod-pubsub.
        • Check the Vert.x version and update it if needed.
        • Request producer from pub-sub-utils?
      • Error handling for consumers
        Taras Spashchenko will create the PubSub components and the error handling.
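      The "exactly-once delivery" items above all rely on the same idea: every Kafka payload carries a unique eventId (UUID or hash), and a consumer records which ids it has processed and skips redeliveries. A minimal, broker-free sketch of that deduplication logic is below; the class and method names are hypothetical (not the actual mod-* APIs), and the in-memory map stands in for a DB table such as the one backing JobExecutionSourceChunk.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical sketch, not the real FOLIO module API. It shows how a
// consumer can turn Kafka's at-least-once delivery into effectively-once
// processing by tracking eventIds and skipping duplicates.
public class IdempotentChunkConsumer {

    // In production this would be persisted (e.g. a DB table with a
    // unique constraint on eventId) so restarts do not lose the set.
    private final Map<String, Boolean> processed = new ConcurrentHashMap<>();

    /** The stub method named in the ticket: has this eventId been handled? */
    public boolean isProcessed(String eventId) {
        return processed.containsKey(eventId);
    }

    /**
     * Handle one delivered chunk. Returns true if the chunk was processed,
     * false if it was skipped as a duplicate (the "on constraint violations,
     * skip chunk processing and log" behavior from the Source-Manager step).
     */
    public boolean handle(String eventId, String chunkPayload, Consumer<String> processor) {
        // putIfAbsent is atomic: a non-null return means the id was seen before.
        if (processed.putIfAbsent(eventId, Boolean.TRUE) != null) {
            System.out.println("Duplicate chunk " + eventId + ", skipping");
            return false;
        }
        processor.accept(chunkPayload);
        return true;
    }

    public static void main(String[] args) {
        IdempotentChunkConsumer consumer = new IdempotentChunkConsumer();
        String eventId = UUID.randomUUID().toString();

        boolean first = consumer.handle(eventId, "chunk-1",
                p -> System.out.println("processing " + p));
        boolean redelivery = consumer.handle(eventId, "chunk-1",
                p -> System.out.println("processing " + p));

        System.out.println(first);       // true: processed
        System.out.println(redelivery);  // false: duplicate skipped
    }
}
```

      The same dedup check would sit inside each module's Kafka consumer handler (e.g. a vertx-kafka-client record handler), with the producer side attaching the eventId to every chunk it writes.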

      Change the SRM DB approach. It is currently a performance bottleneck; move it to R2 and create a separate feature (which should be smaller than this one; same for SRS, plus migration scripts will be needed).

      Color legend: Yellow = partly done; Green = done.

      Notes on maximum file size from Data Import Subgroup Sept 2020

      • For the PubSub/Kafka reconfig, max file size should be 500K records
      • But if we need interim, OK to use 100K, so long as there’s a clear understanding of when we’ll be able to increase to 500K.
      • Librarians are sending Ann-Marie a couple of large example files: 300K records for a large eBook collection, and 1.4M records that all had to be updated with URL notes when the library closed for COVID.
        Ann-Marie Breaux: can you provide example files with 300-500K records, put them on Google Drive, and add links in the description?

                People

                Assignee:
                Ann-Marie Breaux (abreaux)
                Reporter:
                Kateryna Senchenko
                Back End Estimator:
                Oleksii Kuzminov
                Votes:
                0
                Watchers:
                13
