Details
-
Bug
-
Status: Closed
-
P2
-
Resolution: Done
-
2.0.0
-
None
-
Folijet Sprint 111
-
2
-
Folijet
-
R1 2021 Bug Fix
Description
Overview:
Iris Data Import appears to create duplicate records when the topics of the Data Import modules (mod-srs, mod-srm, mod-inventory, mod-data-import) are set up with multiple partitions. For a 50K-record import, we would expect 50K instances to be created in around 2 hours. Instead, 97K+ instances and 51K+ holdings and items were created in nearly 4 hours of import. This was observed with 5 partitions on each DI topic.
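As a rough check on the figures above, the sketch below computes how far the observed counts exceed the expected 50K. The counts are the rounded lower bounds quoted in this ticket, "51K+ holdings and items" is read here as 51K+ of each (an assumption), and the helper name `excess_ratio` is purely illustrative.

```python
# Rounded figures from this ticket; "97K+" and "51K+" are treated as lower bounds.
EXPECTED = 50_000

# Assumption: "51K+ holdings and items" means 51K+ of each record type.
observed = {
    "instances": 97_000,
    "holdings": 51_000,
    "items": 51_000,
}


def excess_ratio(observed_count: int, expected_count: int) -> float:
    """Return observed/expected; 1.0 means no extra records were created."""
    return observed_count / expected_count


for record_type, count in observed.items():
    extra = count - EXPECTED
    ratio = excess_ratio(count, EXPECTED)
    print(f"{record_type}: {count} observed, at least {extra} extra ({ratio:.2f}x expected)")
```

On these rounded numbers, instances come out at nearly twice the expected count, while holdings and items are only slightly above it.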
Steps to Reproduce:
- Create DI topics with 5 partitions each (DI_COMPLETED, DI_INVENTORY_HOLDING_CREATED, DI_INVENTORY_INSTANCE_CREATED, DI_INVENTORY_INSTANCE_CREATED_READY_FOR_POST_PROCESSING, DI_SRS_MARC_BIB_RECORD_CREATED, DI_PARSED_RECORDS_CHUNK_SAVED, DI_RAW_MARC_BIB_RECORDS_CHUNK_READ, DI_RAW_RECORDS_CHUNK_PARSED, DI_SRS_MARC_BIB_INSTANCE_HRID_SET).
- Perform a 50K-record import using the Create Instances, Holdings, and Items profile that abreaux outlined in Data Import Perf Testing Script_1 (https://drive.google.com/drive/folders/1NHijTqZFSk8AIObqrQk-L7oiSklH_Aj-).
- Observe that the record counts are much higher than expected after 3-4 hours.
(At the moment we do not know which module is causing this problem, so this issue has been created in mod-data-import for now.)
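The first reproduction step can be sketched as a small script that prints one `kafka-topics.sh --create` command per DI topic. This is only a sketch under stated assumptions: the bootstrap server address and replication factor are placeholders not taken from this ticket, the `--bootstrap-server` option assumes a reasonably recent Kafka CLI, and the deployed topic names may carry an environment or tenant prefix that the ticket does not show.

```python
import shlex

# Topic names as listed in the reproduction steps, each listed once.
DI_TOPICS = [
    "DI_COMPLETED",
    "DI_INVENTORY_HOLDING_CREATED",
    "DI_INVENTORY_INSTANCE_CREATED",
    "DI_INVENTORY_INSTANCE_CREATED_READY_FOR_POST_PROCESSING",
    "DI_SRS_MARC_BIB_RECORD_CREATED",
    "DI_PARSED_RECORDS_CHUNK_SAVED",
    "DI_RAW_MARC_BIB_RECORDS_CHUNK_READ",
    "DI_RAW_RECORDS_CHUNK_PARSED",
    "DI_SRS_MARC_BIB_INSTANCE_HRID_SET",
]

PARTITIONS = 5  # from the ticket: 5 partitions per DI topic

# Assumptions (not from the ticket): adjust both for the actual environment.
BOOTSTRAP_SERVER = "localhost:9092"
REPLICATION_FACTOR = 1


def create_topic_command(topic: str) -> str:
    """Build a shell-quoted kafka-topics.sh command that creates one topic."""
    args = [
        "kafka-topics.sh", "--create",
        "--bootstrap-server", BOOTSTRAP_SERVER,
        "--topic", topic,
        "--partitions", str(PARTITIONS),
        "--replication-factor", str(REPLICATION_FACTOR),
    ]
    return " ".join(shlex.quote(arg) for arg in args)


for topic in DI_TOPICS:
    print(create_topic_command(topic))
```

The script only prints the commands rather than running them, so they can be reviewed before being executed against a test cluster.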
Issue Links
- defines: UXPROD-2614 NFR: Data Import (Batch Importer for Bib Acq) & PubSub R1 2021 Technical, NFR, & Misc bug work (Closed)
- relates to: MODDATAIMP-410 Determine if necessary: investigate and choose a partition strategy for Data-Import (Open)