UX Product / UXPROD-2659

NFR: Refactor data-import flow to increase reliability


Details

    • Q3 2020
    • Very Small (VS) < 1day
    • Medium
    • XXXL: 30-45 days
    • Folijet
    • 97
    • R2
    • R2
    • R2
    • R1
    • R2
    • R1
    • R2
    • R1

    Description

      Steps

      • Inventory
        • Increment Vert.x version to 3.8.4+ in mod-inventory to support vertx-kafka-client
        • Check and fix marshalling/unmarshalling for JSON MARC
        • Create Consumers for each eventType and subscribe to di-processing-core
        • Add support for exactly-once delivery for each Consumer
      • PubSub
      • Data-Import
        • Change mod-data-import file processing to the Kafka approach (can be moved from PoC) https://github.com/folio-org/mod-data-import/pull/130
        • Create ProducerManager
        • Add support for exactly-once delivery for each chunk (a unique UUID or hash can be added per chunk). Add a common schema with an eventId and extend all Kafka-created entities with this id. As a first step, add a stub interface method isProcessed.
      • Source-Manager
        • Change chunk processing to the Kafka approach (can be moved from PoC) https://github.com/folio-org/mod-source-record-manager/pull/315
        • Add support for exactly-once delivery for each chunk. JobExecutionSourceChunk can be reused. The UUID will be received from mod-data-import. On constraint violations, skip chunk processing and add logs.
        • Receive answers from SRS, start processing in StoredMarcChunkConsumersVerticle, and ensure exactly-once delivery for each chunk.
        • Create consumers for DI_COMPLETED and DI_ERROR and finish data-import (can be moved from PoC)
        • Move the "secret button" functionality to the Kafka approach (interactions between SRM and SRS)
      • Source-Storage
        • Add consumers for the initial records load (before processing) and save chunks in batches (can be moved from PoC) https://github.com/folio-org/mod-source-record-storage/pull/214
        • Add support for exactly-once delivery for each chunk of records. Add a new entity to track chunk duplications. On constraint violations, skip chunk processing and add logs.
        • Add consumers to process created/updated entities and fill the 999 and 001 fields (can be moved from PoC)
        • Add support for exactly-once delivery for each Consumer
      • Processing-Core
        • Change the transport implementation to the direct Kafka approach and reuse the new sub-module lib from mod-pubsub.
        • Check vert.x version and update if needed.
        • Request producer from pub-sub-utils?
      • Error handling for consumers
        Taras_Spashchenko will implement the PubSub pieces and the error handling
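      The "exactly-once delivery" items in the steps above can be sketched as a dedup guard keyed by the per-chunk eventId from the common schema. This is a minimal illustrative sketch, not the actual module API: only the stub method name isProcessed comes from the ticket; DeliveryGuard and tryAcquire are hypothetical names, and a real implementation would persist seen ids in the DB (e.g. via JobExecutionSourceChunk) rather than in memory.

      ```java
      import java.util.Set;
      import java.util.concurrent.ConcurrentHashMap;

      // Sketch of exactly-once processing on top of Kafka's at-least-once
      // delivery: every event carries an eventId, and a consumer skips chunks
      // whose id it has already seen (skip + log, per the ticket).
      public class DeliveryGuard {
          private final Set<String> processed = ConcurrentHashMap.newKeySet();

          // Stub interface method named in the ticket: was this eventId handled?
          public boolean isProcessed(String eventId) {
              return processed.contains(eventId);
          }

          // Returns true if the caller should process the chunk; false means a
          // duplicate delivery, which is skipped and logged.
          public boolean tryAcquire(String eventId) {
              boolean first = processed.add(eventId);
              if (!first) {
                  System.out.println("Duplicate delivery skipped: " + eventId);
              }
              return first;
          }

          public static void main(String[] args) {
              DeliveryGuard guard = new DeliveryGuard();
              System.out.println(guard.tryAcquire("chunk-1")); // first delivery: true
              System.out.println(guard.tryAcquire("chunk-1")); // duplicate: false
              System.out.println(guard.isProcessed("chunk-1")); // true
          }
      }
      ```

      In the modules the in-memory set would be replaced by a unique constraint in the DB, which is why the steps say "on constraint violations, skip chunk processing and add logs".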

      Change the SRM DB approach. For now it is a performance bottleneck. Move this to R2 and create a separate feature (it should be smaller than this one; the same applies to SRS, plus it will need migration scripts).
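      The "Create Consumers for each eventType" steps amount to a registry mapping event types to handlers, one Kafka consumer per type. A hedged sketch follows; only DI_COMPLETED and DI_ERROR appear in the ticket, and EventDispatcher itself is an illustrative name, not a class in the modules.

      ```java
      import java.util.HashMap;
      import java.util.Map;
      import java.util.function.Consumer;

      // Sketch of per-eventType consumer registration. In the real modules each
      // registered entry would back a Kafka consumer subscribed to the topic
      // for that event type; here dispatch is called directly for illustration.
      public class EventDispatcher {
          private final Map<String, Consumer<String>> handlers = new HashMap<>();

          public void register(String eventType, Consumer<String> handler) {
              handlers.put(eventType, handler);
          }

          // Route a payload to the handler for its eventType; unknown types are
          // logged and skipped rather than failing the whole job.
          public boolean dispatch(String eventType, String payload) {
              Consumer<String> handler = handlers.get(eventType);
              if (handler == null) {
                  System.out.println("No handler for eventType " + eventType);
                  return false;
              }
              handler.accept(payload);
              return true;
          }

          public static void main(String[] args) {
              EventDispatcher d = new EventDispatcher();
              d.register("DI_COMPLETED", p -> System.out.println("finish job: " + p));
              d.register("DI_ERROR", p -> System.out.println("record error: " + p));
              d.dispatch("DI_COMPLETED", "job-42");
          }
      }
      ```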

      Yellow = partly done
      Green = done

      Notes on maximum file size from Data Import Subgroup Sept 2020

      • For the PubSub/Kafka reconfig, max file size should be 500K records
      • But if we need interim, OK to use 100K, so long as there’s a clear understanding of when we’ll be able to increase to 500K.
      • Librarians are sending A-M a couple of the large files: 300K records for a large eBook collection, and 1.4M records that all had to be updated with URL notes when the library closed for COVID
        abreaux can you provide example files with 300-500K records, put them on Google Drive, and add links in the description?
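      The file-size targets above interact with the chunking steps: a 500K-record file is never sent as one message, but split into chunks that each carry a unique id for deduplication. A minimal sketch, assuming a fixed chunk size (the Chunker name and the chunk size are illustrative, not values from the modules):

      ```java
      import java.util.ArrayList;
      import java.util.List;
      import java.util.UUID;

      // Sketch of splitting a large record file into fixed-size chunks, each
      // tagged with a UUID so downstream consumers can skip duplicate
      // deliveries. E.g. a 500K-record file at 1,000 records per chunk
      // produces 500 chunk events.
      public class Chunker {
          public record Chunk(String id, List<String> records) {}

          public static List<Chunk> split(List<String> records, int chunkSize) {
              List<Chunk> chunks = new ArrayList<>();
              for (int i = 0; i < records.size(); i += chunkSize) {
                  List<String> slice =
                      records.subList(i, Math.min(i + chunkSize, records.size()));
                  chunks.add(new Chunk(UUID.randomUUID().toString(), new ArrayList<>(slice)));
              }
              return chunks;
          }

          public static void main(String[] args) {
              List<String> records = new ArrayList<>();
              for (int i = 0; i < 2500; i++) records.add("record-" + i);
              // 2,500 records at 1,000 per chunk -> 3 chunks (1000 + 1000 + 500)
              System.out.println(split(records, 1000).size());
          }
      }
      ```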

              People

                Ann-Marie Breaux (abreaux)
                Kateryna Senchenko
                Oleksii Kuzminov
