mod-inventory / MODINV-400

Errors in Subsequent Updates



    • Folijet Sprint 114
    • 0.5
    • Folijet


      When performing multiple updates (one after another) on the same set of records, the following behavior was observed across the sequence of jobs launched:

      1. First update: job status Completed.
      2. Second update: job status Completed with Errors.
      3. Third update: job status Failed.
      4. New update or create jobs could not even start.
      5. mod-inventory was restarted, but nothing seemed to happen.

      • Note: When attempting a CREATE job after things got stuck at step 4, only the DI_RAW_MARC_BIB_RECORDS_CHUNK_READ and DI_RAW_RECORDS_CHUNK_PARSED topics received new messages. The other DI topics had no new messages.
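      The check described in the note above can be made systematic by snapshotting each DI topic's end offset before and after launching the job and diffing the two snapshots. A minimal sketch of that diff (the topic names are from this report; the offset numbers are made up for illustration, and obtaining real snapshots, e.g. with Kafka's GetOffsetShell tool, is assumed):

```python
# Hypothetical end-offset snapshots per DI topic, taken before and after
# launching the CREATE job. Offset values are illustrative only.
before = {
    "DI_RAW_MARC_BIB_RECORDS_CHUNK_READ": 100,
    "DI_RAW_RECORDS_CHUNK_PARSED": 100,
    "DI_INVENTORY_ITEM_MATCHED": 100,
}
after = {
    "DI_RAW_MARC_BIB_RECORDS_CHUNK_READ": 150,
    "DI_RAW_RECORDS_CHUNK_PARSED": 150,
    "DI_INVENTORY_ITEM_MATCHED": 100,
}

def topics_with_new_messages(before, after):
    """Return topics whose end offset advanced between the two snapshots."""
    return sorted(t for t in after if after[t] > before.get(t, 0))

print(topics_with_new_messages(before, after))
# → ['DI_RAW_MARC_BIB_RECORDS_CHUNK_READ', 'DI_RAW_RECORDS_CHUNK_PARSED']
```

      With the observed behavior, only the two chunk topics would show up, matching the note: everything downstream of record parsing produced nothing.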

      Between the second and third updates, we saw the following errors in mod-inventory:

      11:10:16 [] [] [] [] ERROR dateItemEventHandler Error updating inventory Item: org.folio.processing.exceptions.MappingException: java.lang.NullPointerException

      11:10:17 [] [] [] [] ERROR KafkaConsumerWrapper Error while processing a record - id: 20 subscriptionPattern: SubscriptionDefinition(eventType=DI_INVENTORY_ITEM_MATCHED, subscriptionPattern=cap2\.Default\.\w{4,}\.DI_INVENTORY_ITEM_MATCHED)

      io.vertx.core.impl.NoStackTraceThrowable: Failed to process data import event payload
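      The subscription pattern in the error above resolves to tenant-specific topic names; reading the segments as environment (`cap2`), namespace (`Default`), and tenant id (`\w{4,}`) is our interpretation, but the tenant id `fs09000000` does appear in the ConsumerCoordinator log line further down. A quick check that the pattern matches that topic:

```python
import re

# Subscription pattern copied verbatim from the KafkaConsumerWrapper error.
pattern = re.compile(r"cap2\.Default\.\w{4,}\.DI_INVENTORY_ITEM_MATCHED")

# The tenant-specific topic from the ConsumerCoordinator log line matches:
print(bool(pattern.fullmatch("cap2.Default.fs09000000.DI_INVENTORY_ITEM_MATCHED")))

# A different event type on the same tenant does not:
print(bool(pattern.fullmatch("cap2.Default.fs09000000.DI_INVENTORY_ITEM_CREATED")))
```

      So the consumer was subscribed to the right topic; the failure happened while processing the payload, not while matching the subscription.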

      mod-srs's log contains this error message:

      org.jooq.exception.DataAccessException: SQL [null]; ERROR: insert or update on table "raw_records_lb" violates foreign key constraint "fk_raw_records_records"
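      This constraint violation means a row was written to raw_records_lb while its parent row in the referenced records table was missing (not yet inserted, or already deleted). A minimal stand-alone reproduction of the same failure mode, using SQLite instead of PostgreSQL and a simplified stand-in schema (the real mod-srs tables have more columns):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs FK enforcement enabled
conn.execute("CREATE TABLE records_lb (id TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE raw_records_lb (id TEXT PRIMARY KEY REFERENCES records_lb (id))"
)

# Inserting a raw record before its parent record exists fails exactly like
# the jOOQ error in the log.
error = None
try:
    conn.execute("INSERT INTO raw_records_lb (id) VALUES ('rec-1')")
except sqlite3.IntegrityError as e:
    error = str(e)
print(error)
```

      If concurrent update jobs can delete and re-insert record rows, this kind of ordering race would produce exactly this error.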

      After restarting mod-inventory, the following errors were logged in mod-inventory's log:

      15:58:59 [] [] [] [] INFO SubscriptionState [Consumer clientId=kafka-cache-reader-events_cache, groupId=kafka-cache-e98dd93c4557] Resetting offset for partition events_cache-0 to offset 26661431.

      Exception in thread "main" java.util.concurrent.TimeoutException
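      The TimeoutException in the main thread, right after the consumer reset to offset 26661431 on events_cache-0, is consistent with a bounded wait on startup work that never finishes in time (e.g. replaying a large cache topic). A scaled-down sketch of that failure shape; the function name and durations are illustrative, not mod-inventory's actual code:

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def rebuild_cache():
    # Stands in for replaying a large cache topic from the reset offset.
    time.sleep(0.2)
    return "ready"

timed_out = False
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(rebuild_cache)
    try:
        # Bounded wait, analogous to Future.get(timeout) in Java, which
        # throws java.util.concurrent.TimeoutException when it expires.
        future.result(timeout=0.05)
    except TimeoutError:
        timed_out = True

print(timed_out)
# → True
```

      If startup blocks on reading the whole events_cache topic, a restart after the topic has grown large would hit such a bound every time.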

      Sporadically, these errors were logged as well:

      14:36:44 [] [] [] [] WARN ? Thread Thread[vert.x-worker-thread-14,5,main] has been blocked for 90230 ms, time limit is 60000 ms

      14:40:30 [] [] [] [] INFO ConsumerCoordinator [Consumer clientId=consumer-DI_SRS_MARC_BIB_RECORD_MATCHED.mod-inventory-16.3.1-22, groupId=DI_SRS_MARC_BIB_RECORD_MATCHED.mod-inventory-16.3.1] Setting offset for partition cap2.Default.fs09000000.DI_SRS_MARC_BIB_RECORD_MATCHED-0 to the committed offset FetchPosition{offset=2000, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[b-1.temp-data-import-test.kh4zs4.c11.kafka.us-east-1.amazonaws.com:9092 (id: 1 rack: use1-az4)], epoch=0}}
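      The blocked-thread warning above is Vert.x's own watchdog comparing how long each worker has been on its current task against the configured limit (60 000 ms here, exceeded by over 90 s). A scaled-down sketch of that check, not Vert.x's implementation:

```python
import time

# The log's limit is 60 000 ms; scaled down so the sketch runs quickly.
TIME_LIMIT = 0.05  # seconds

def blocked_warning(started, limit=TIME_LIMIT):
    """Return a warning string if the task exceeded the limit, else None."""
    elapsed = time.monotonic() - started
    if elapsed > limit:
        return (f"thread has been blocked for {elapsed * 1000:.0f} ms, "
                f"time limit is {limit * 1000:.0f} ms")
    return None

started = time.monotonic()
time.sleep(0.1)  # simulate a worker stuck in blocking I/O or a long DB call
print(blocked_warning(started))
```

      A worker stuck for 90+ seconds would stall every event handler queued behind it, which fits the question below about a blocker for downstream processing.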

      Could the process that cleans the cache get blocked, or create a blocker for downstream processing?

      Steps to Reproduce:
      abreaux's video in the dataimport_folijet_ptf Slack channel has all the steps: https://folio-project.slack.com/archives/G01PFEDAF6H/p1619499196163700

      Expected Results:
      Multiple updates (via the data import mechanism) on the same dataset should work consistently, without errors and without leaving Data Import bogged down and unusable for everyone.

      Actual Results:
      See above description.

      Interested parties:
      abreaux OleksiiKuzminov

                ruslan_lavrov Ruslan Lavrov
                mtraneis Martin Tran