  1. mod-source-record-manager
  2. MODSOURMAN-513

(Juniper) Data import stopped process before finishing: deadlock for "job_monitoring"


Details

    • eHoldings Sprint 118, eHoldings Sprint 119
    • 1
    • Spitfire
    • R2 2021 Bugfix

    Description

      From MODDATAIMP-475:

       

      Review logs for Iris Bugfest job that finished prematurely and did not create all the expected records

      • https://bugfest-iris.folio.ebsco.com
      • Job 6436 run by abreaux
      • 5,000 record file, TAMU_sample_bibs_5k_2.mrc, attached
      • Using job profile: PTF Create Instance Holdings Item
      • Job started at 8:47 am Iris Bugfest time, and it finished at 8:50 am Bugfest, so only 3 minutes!
      • UI log summary shows Completed with errors
      • UI log detail shows 1,950 SRS MARC, Instances, Holdings, and Items created
      • No indication of why it stopped after processing 1,950 incoming records, instead of all 5,000

       

       

      Investigation within the scope of MODDATAIMP-475 indicates that the root cause is a deadlock on the "job_monitoring" table.

      The following log entry comes from the SRM module (mod-source-record-manager) on Iris Bugfest (see the attached file for the full log):

      2021-06-24T06:48:59.474Z io.vertx.pgclient.PgException: { "message": "deadlock detected", "severity": "ERROR", "code": "40P01", "detail": "Process 12999 waits for ShareLock on transaction 88425453; blocked by process 13000.\nProcess 13000 waits for ShareLock on transaction 88425452; blocked by process 12999.", "hint": "See server log for query details.", "where": "while rechecking updated tuple (2,38) in relation \"job_monitoring\"", "file": "deadlock.c", "line": "1146", "routine": "DeadLockReport" }
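To make the wait-for cycle in this log entry easier to see, here is a minimal sketch that parses the entry and extracts which process blocks which. It assumes the entry always has the shape "timestamp, exception class, JSON payload" shown above; the helper names (`parse_pg_exception`, `wait_edges`, `has_cycle`) are hypothetical and not part of the module.

```python
import json
import re

# The log entry quoted in this ticket, kept verbatim (raw string preserves the JSON escapes).
LOG_LINE = r'''2021-06-24T06:48:59.474Z io.vertx.pgclient.PgException: { "message": "deadlock detected", "severity": "ERROR", "code": "40P01", "detail": "Process 12999 waits for ShareLock on transaction 88425453; blocked by process 13000.\nProcess 13000 waits for ShareLock on transaction 88425452; blocked by process 12999.", "hint": "See server log for query details.", "where": "while rechecking updated tuple (2,38) in relation \"job_monitoring\"", "file": "deadlock.c", "line": "1146", "routine": "DeadLockReport" }'''

WAIT_RE = re.compile(
    r"Process (\d+) waits for (\w+) on transaction (\d+); blocked by process (\d+)\."
)


def parse_pg_exception(line):
    """Split the entry into (timestamp, exception class, decoded JSON payload)."""
    prefix, _, payload = line.partition(": {")
    timestamp, exc_class = prefix.split(" ", 1)
    return timestamp, exc_class, json.loads("{" + payload)


def wait_edges(detail):
    """Return (waiting_pid, blocking_pid) pairs found in the 'detail' text."""
    return [(int(w), int(b)) for w, _lock, _txn, b in WAIT_RE.findall(detail)]


def has_cycle(edges):
    """True if following 'blocked by' links from the first process leads back to it."""
    if not edges:
        return False
    graph = dict(edges)
    start = node = edges[0][0]
    for _ in edges:
        node = graph.get(node)
        if node is None:
            return False
        if node == start:
            return True
    return False


timestamp, exc_class, payload = parse_pg_exception(LOG_LINE)
edges = wait_edges(payload["detail"])
relation = re.search(r'relation "([^"]+)"', payload["where"]).group(1)

print(payload["code"], relation, edges, has_cycle(edges))
```

Each process waits on a transaction held by the other, which is the two-party cycle PostgreSQL reports as SQLSTATE 40P01.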

       

      Fixing this deadlock should resolve the unexpected behaviour.
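This ticket does not say how the deadlock is to be fixed. One generic mitigation for SQLSTATE 40P01, shown here only as a sketch, is to retry the failed transaction, since PostgreSQL aborts one of the deadlocked transactions and the other can proceed. Everything below is an assumption for illustration: `DbError` is a hypothetical stand-in for a driver exception, and `run_with_deadlock_retry` is not an API of the module or of the Vert.x PostgreSQL client.

```python
import random
import time

DEADLOCK_SQLSTATE = "40P01"  # the code reported in the log entry above


class DbError(Exception):
    """Hypothetical stand-in for a driver exception that carries an SQLSTATE code."""

    def __init__(self, sqlstate, message):
        super().__init__(message)
        self.sqlstate = sqlstate


def run_with_deadlock_retry(operation, max_attempts=3, base_delay=0.01, sleep=time.sleep):
    """Call operation(); retry with jittered exponential backoff on deadlock only."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except DbError as err:
            # Re-raise anything that is not a deadlock, or a deadlock on the last attempt.
            if err.sqlstate != DEADLOCK_SQLSTATE or attempt == max_attempts:
                raise
            # Random jitter lowers the chance that both parties collide again.
            sleep(base_delay * (2 ** (attempt - 1)) * (1 + random.random()))
```

Retrying only treats the symptom. Making concurrent writers touch "job_monitoring" rows in a consistent order, or serializing updates per job, would address the cause, but which approach suits this module is outside what the ticket states.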

      Note: the log also contains many "Couldn't update JobExecution status, JobExecution already marked as ERROR" errors. They appear to be a consequence of the deadlock (this is how the error-handling mechanism behaves once it occurs), so they can be ignored.


        Attachments

          1. image-2021-07-07-13-17-04-577.png (44 kB)
          2. image-2021-07-07-13-18-46-542.png (10 kB)
          3. logs-2021-07-07-100726.tar.gz (1.55 MB)
          4. srs_07_30 (19.89 MB)
          5. TAMU_sample_bibs_5k_2.mrc (7.76 MB)
          6. testing_bugfest-juniper_1.PNG (237 kB)
          7. testing_bugfest-juniper_2.PNG (208 kB)

              People

                Igor_Gorchakov Igor Gorchakov
                VRohach Volodymyr Rohach