Iris Data Import appears to create duplicate records when the topics of the Data Import modules (mod-srs, mod-srm, mod-inventory, mod-data-import) are set up with multiple partitions. For a 50K-record import, we expected 50K instances to be created in about 2 hours. Instead, after nearly 4 hours of import, 97K+ instances and 51K+ holdings and items had been created. In this run, each DI topic had 5 partitions.
Steps to Reproduce:
- Create the following DI topics with 5 partitions each: DI_COMPLETED, DI_INVENTORY_HOLDING_CREATED, DI_INVENTORY_INSTANCE_CREATED, DI_INVENTORY_INSTANCE_CREATED_READY_FOR_POST_PROCESSING, DI_SRS_MARC_BIB_RECORD_CREATED, DI_PARSED_RECORDS_CHUNK_SAVED, DI_RAW_MARC_BIB_RECORDS_CHUNK_READ, DI_RAW_RECORDS_CHUNK_PARSED, DI_SRS_MARC_BIB_INSTANCE_HRID_SET
- Perform a 50K-record import using the "Create Instances, Holdings, and Items" profile outlined by abreaux in Data Import Perf Testing Script_1 (https://drive.google.com/drive/folders/1NHijTqZFSk8AIObqrQk-L7oiSklH_Aj-)
- After 3-4 hours, observe that the record counts are much higher than expected.
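To make the first reproduction step repeatable, the topics could be pre-created with the stock Kafka CLI. The sketch below only builds the `kafka-topics.sh --create` commands for the DI topics listed above. Several parts of it are assumptions, not taken from this issue: the `{env}.Default.{tenant}.{event}` topic-naming pattern, the example `env`/`tenant` values, the bootstrap server and the replication factor. Adjust them to match the environment under test.

```python
# Hypothetical helper: builds kafka-topics.sh commands that pre-create the DI
# topics from the reproduction steps with a fixed partition count.
# ASSUMPTIONS (not from the issue): the "{env}.Default.{tenant}.{event}" naming
# pattern, the env/tenant values, the bootstrap server and the replication factor.

DI_EVENTS = [
    "DI_COMPLETED",
    "DI_INVENTORY_HOLDING_CREATED",
    "DI_INVENTORY_INSTANCE_CREATED",
    "DI_INVENTORY_INSTANCE_CREATED_READY_FOR_POST_PROCESSING",
    "DI_SRS_MARC_BIB_RECORD_CREATED",
    "DI_PARSED_RECORDS_CHUNK_SAVED",
    "DI_RAW_MARC_BIB_RECORDS_CHUNK_READ",
    "DI_RAW_RECORDS_CHUNK_PARSED",
    "DI_SRS_MARC_BIB_INSTANCE_HRID_SET",
]


def topic_name(env: str, tenant: str, event: str) -> str:
    """Full topic name under the assumed naming pattern."""
    return f"{env}.Default.{tenant}.{event}"


def create_commands(env: str, tenant: str, partitions: int = 5,
                    replication_factor: int = 1,
                    bootstrap: str = "localhost:9092") -> list:
    """Return one kafka-topics.sh --create command per DI event type."""
    commands = []
    for event in DI_EVENTS:
        commands.append(
            "kafka-topics.sh --create"
            f" --bootstrap-server {bootstrap}"
            f" --topic {topic_name(env, tenant, event)}"
            f" --partitions {partitions}"
            f" --replication-factor {replication_factor}"
        )
    return commands


if __name__ == "__main__":
    # Example values only; "folio" and "diku" are hypothetical here.
    for cmd in create_commands(env="folio", tenant="diku"):
        print(cmd)
```

If the modules have already auto-created the topics, the partition count would instead be raised with `kafka-topics.sh --alter --topic <name> --partitions 5`, since `--create` fails for an existing topic.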
(At the moment we don't know which module is causing this problem, so this issue has been filed against mod-data-import for now.)