Details
- New Feature
- Status: Closed
- P2
- Resolution: Done
- None
- None
- Q3 2020
- Very Small (VS) < 1 day
- Medium
- XXXL: 30-45 days
- Folijet
- 97
- R2
- R2
- R2
- R1
- R2
- R1
- R2
- R1
Description
Steps
- Inventory
  - Increment the Vert.x version to 3.8.4+ in mod-inventory to support vertx-kafka-client
  - Check and fix marshalling/unmarshalling for JSON MARC
  - Create Consumers for each eventType and subscribe di-processing-core (see the consumer sketch after this list)
  - Add support for exactly-once delivery for each Consumer
- PubSub
  - Create a new sub-project in mod-pubsub and move all common transport-layer utility classes from the PoC https://github.com/folio-org/mod-source-record-manager/pull/315
- Data-Import
  - Change mod-data-import file processing to the Kafka approach (can be moved from the PoC) https://github.com/folio-org/mod-data-import/pull/130
  - Create a ProducerManager (see the producer sketch after this list)
  - Add support for exactly-once delivery for each chunk (a unique UUID or hash can be added to each chunk). Add a common schema with an eventId and extend all Kafka-created entities with this id. For the first iteration, add a stub interface method isProcessed (see the deduplication sketch after this list)
- Source-Manager
  - Change chunk processing to the Kafka approach (can be moved from the PoC) https://github.com/folio-org/mod-source-record-manager/pull/315
  - Add support for exactly-once delivery for each chunk. JobExecutionSourceChunk can be reused. The UUID will be received from mod-data-import. On constraint violations, skip chunk processing and add logs.
  - Receive responses from SRS, start processing in StoredMarcChunkConsumersVerticle, and add exactly-once delivery for each chunk.
  - Create consumers for DI_COMPLETED and DI_ERROR and finish data-import (can be moved from the PoC)
  - Move the "secret button" functionality to the Kafka approach (interactions between SRM and SRS)
- Source-Storage
  - Add consumers for the initial records load (before processing) and save chunks in batches (can be moved from the PoC) https://github.com/folio-org/mod-source-record-storage/pull/214
  - Add support for exactly-once delivery for each chunk of records. Add a new entity to track chunk duplications. On constraint violations, skip chunk processing and add logs.
  - Add consumers to process created/updated entities and fill the 999 and 001 fields (can be moved from the PoC)
  - Add support for exactly-once delivery for each Consumer
- Processing-Core
  - Change the transport implementation to the direct Kafka approach and reuse the new sub-module lib from mod-pubsub.
  - Check the Vert.x version and update if needed.
  - Request the producer from pub-sub-utils?
  - Error handling for consumers (see exceptionHandler in the consumer sketch after this list)
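The sketch below illustrates the per-eventType consumer pattern from the Inventory and Processing-Core steps, using the vertx-kafka-client API that becomes available after the Vert.x 3.8.4+ upgrade. The broker address, consumer group id, and the logging inside the handler are illustrative assumptions rather than actual module code; DI_COMPLETED is one of the event types named above.

```java
import io.vertx.core.AbstractVerticle;
import io.vertx.kafka.client.consumer.KafkaConsumer;

import java.util.HashMap;
import java.util.Map;

// Illustrative verticle: one consumer is created and subscribed per DI event type.
public class DataImportConsumerVerticle extends AbstractVerticle {

  @Override
  public void start() {
    Map<String, String> config = new HashMap<>();
    config.put("bootstrap.servers", "kafka:9092"); // assumption: broker address comes from the environment
    config.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    config.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    config.put("group.id", "mod-inventory-di-consumers"); // hypothetical consumer group
    config.put("enable.auto.commit", "false"); // commit offsets only after the event is handled

    KafkaConsumer<String, String> consumer = KafkaConsumer.create(vertx, config);

    // Error handling for consumers (Processing-Core step): log and keep the verticle alive
    consumer.exceptionHandler(err -> System.err.println("Consumer error: " + err.getMessage()));

    consumer.handler(record -> {
      // Here the payload would be handed to the matching di-processing-core handler
      System.out.println("Received " + record.topic() + ": " + record.value());
      consumer.commit(); // acknowledge only after successful processing
    });

    consumer.subscribe("DI_COMPLETED"); // one subscription per event type; DI_ERROR etc. get their own consumers
  }
}
```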
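On the producing side (the ProducerManager step under Data-Import), each chunk event is published with the unique eventId that consumers later use for deduplication. A minimal sketch, assuming a topic-per-event-type layout; the ChunkProducer name and broker address are hypothetical.

```java
import io.vertx.core.Vertx;
import io.vertx.core.json.JsonObject;
import io.vertx.kafka.client.producer.KafkaProducer;
import io.vertx.kafka.client.producer.KafkaProducerRecord;

import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of a ProducerManager-style helper for publishing raw-record chunks.
public class ChunkProducer {

  private final KafkaProducer<String, String> producer;

  public ChunkProducer(Vertx vertx) {
    Map<String, String> config = new HashMap<>();
    config.put("bootstrap.servers", "kafka:9092"); // assumption: broker address comes from the environment
    config.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    config.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    config.put("acks", "all"); // wait for all replicas before acknowledging
    this.producer = KafkaProducer.create(vertx, config);
  }

  /** Publishes one chunk, stamping it with a unique eventId for exactly-once handling downstream. */
  public void sendChunk(String topic, JsonObject chunk) {
    String eventId = UUID.randomUUID().toString();
    JsonObject event = new JsonObject()
      .put("eventId", eventId) // shared schema field carried by every Kafka-created entity
      .put("payload", chunk);

    KafkaProducerRecord<String, String> record =
      KafkaProducerRecord.create(topic, eventId, event.encode());
    producer.send(record);
  }
}
```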
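The exactly-once items above (Data-Import, Source-Manager, Source-Storage) all come down to skipping a chunk whose eventId has already been seen, logging the duplicate instead of reprocessing it. A sketch of that idea with an in-memory store; the DeduplicationService and handleChunk names are hypothetical, and the real modules would persist processed ids (e.g. via JobExecutionSourceChunk or the new tracking entity).

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of consumer-side deduplication keyed by the shared eventId field.
public class DeduplicationService {

  // In-memory stand-in for the entity that tracks chunk duplications;
  // the real modules would persist processed ids in the database.
  private final Set<String> processedEventIds = ConcurrentHashMap.newKeySet();

  /** Stub from the steps above: true if the event was already handled and must be skipped. */
  public boolean isProcessed(String eventId) {
    return !processedEventIds.add(eventId);
  }

  public void handleChunk(String eventId, String chunkPayload) {
    if (isProcessed(eventId)) {
      // Duplicate delivery (or constraint violation on insert): skip processing and log
      System.out.println("Duplicate chunk " + eventId + " skipped");
      return;
    }
    // ... process the chunk exactly once ...
  }
}
```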
Taras_Spashchenko will create the pubsub utilities and the error handling.
Change the SRM DB approach. For now it is a performance bottleneck; move this to R2 and create a separate feature (it should be smaller than this one; the same applies to SRS, plus migration scripts will be needed).
Yellow = partly done
Green = done
Notes on maximum file size from the Data Import Subgroup, Sept 2020
- For the PubSub/Kafka reconfiguration, the max file size should be 500K records
- But if an interim limit is needed, 100K is OK, so long as there is a clear understanding of when it can be increased to 500K
- Librarians are sending A-M a couple of the large files: 300K records for a large eBook collection, and 1.4M records that all had to be updated with URL notes when the library closed for COVID
abreaux can you provide example files with 300-500K records, put them on Google Drive, and add links in the description?
Issue Links
- defines
  - UXPROD-47 Batch Importer (Bib/Acq) (Analysis Complete)
- has to be done after
  - MODPUBSUB-122 Create a PoC with direct Kafka integration (Closed)
- is defined by
  - MODDATAIMP-315 Use Kafka for data-import file processing (Closed)
  - MODDICORE-82 Change transport layer implementation to use Kafka (Closed)
  - MODINV-326 Refactor data-import handler to consume message from Kafka (Closed)
  - MODINV-331 Upgrade to Vertx v3.9.4 (CVE-2019-17640) (Closed)
  - MODINV-373 Ensure exactly once processing for interaction via Kafka (Closed)
  - MODPUBSUB-114 Data Import stops when trying to load a large file to folio-snapshot-load (Closed)
  - MODPUBSUB-118 Create sub-project in mod-pubsub for utility transport layer classes (Closed)
  - MODPUBSUB-120 SPIKE: Describe approach for data-import flow using Kafka (Closed)
  - MODPUBSUB-136 Memory Leaks: HttpClients (Closed)
  - MODSOURCE-173 Refactor inventory-instance handler to consume message from Kafka (Closed)
  - MODSOURCE-177 Change SRM-SRS interaction to use Kafka (Closed)
  - MODSOURCE-230 Deploy new Kafka approach to the Rancher env and test it (Closed)
  - MODSOURCE-235 Ensure exactly once processing for SRM-SRS interaction via Kafka (Closed)
  - MODSOURMAN-336 Refactor created-inventory-instance handler to consume message from Kafka (Closed)
  - MODSOURMAN-337 Refactor processing-result handler to consume message from Kafka (Closed)
  - MODSOURMAN-338 Change chunk processing to use Kafka (Closed)
  - MODSOURMAN-400 Ensure exactly once processing for data-import-SRM interaction via Kafka (Closed)