mod-data-import / MODDATAIMP-524

SPIKE: First import of the day, and/or after environment refresh, is slow or completes with errors

Details

    • 0
    • Folijet
    • Orchid (R1 2023)
    • Chalmers, Cornell, Middle Tennessee State University, MO State, TAMU, Washington College, Washington-Jefferson
    • Not a bug

    Description

      See additional description and comments on MODDATAIMP-383, which was closed as a duplicate.

      Reported by 2 libraries:

      It may also be that an import sometimes does not finish on the first attempt but, with the same file and the same job profile, finishes on the second.

      Check whether this improves once the library is on Juniper before moving this from draft to open; also put it into standard bug format and add a description and repro steps.

      We can work with Jason Root (TAMU) to diagnose this in their test environment, since we cannot reproduce it in the hosted reference environments.

      See more in thread: https://folio-project.slack.com/archives/CA39M62BZ/p1629301167142900

      From Linda Turney (Middle Tennessee State):
      Hello! My institution went live in July and I have a question for this group about data import behavior. We are on Iris Hot Fix #3. I'm checking to see if any others have similar experiences with "first load of the day" behavior. Sometimes, not always, the first load will complete with errors and the log will be empty. I run the same job immediately and it completes with no errors. Since I've experienced this many times, I usually make the first data import job very small, just in case I need to do some clean up. Sometimes I experience this later in the day when there is a long pause between jobs. It's almost like I've startled the system awake. Anyway, I appreciate any comments. Thanks!

      From Jason (Texas A&M):
      To clarify: It’s not that the first import of the day is slow - it’s that it completed with errors. And then succeeds on the 2nd attempt.
      For an idea of our environment, we’re using these mod-data-import module settings:
      KAFKA_PORT = 9092
      KAFKA_HOST = http://kafka-pre
      JAVA_OPTIONS = -XX:MaxRAMPercentage=66.0
      file.processing.buffer.chunk.size = 50
      ENV = folio-pre
      data.import.storage.type = LOCAL_STORAGE
      data.import.storage.path = /storage/upload
      An NFS volume is mounted inside the module at /usr/verticles/storage/upload (shared storage between the two instances of the running deployment).
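      When comparing one environment's mod-data-import settings with another's (or with the recommended-configuration pages linked later in this thread), it can help to load the key = value lines above into a dictionary and diff them. This is a minimal Python sketch, assuming the settings are pasted as plain text in exactly the format shown; the helper names parse_settings and diff_settings are hypothetical and are not part of FOLIO.

```python
from typing import Dict, Optional, Tuple


def parse_settings(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines into a dict (hypothetical helper)."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "=" not in line:
            continue  # skip blank lines and lines that are not settings
        # Split on the first '=' only, so values like
        # -XX:MaxRAMPercentage=66.0 stay intact.
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def diff_settings(
    a: Dict[str, str], b: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {key: (value_in_a, value_in_b)} for keys that differ.

    A key missing on one side is reported with None for that side.
    """
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# The TAMU settings exactly as listed in the description.
TAMU_SETTINGS = """
KAFKA_PORT = 9092
KAFKA_HOST = http://kafka-pre
JAVA_OPTIONS = -XX:MaxRAMPercentage=66.0
file.processing.buffer.chunk.size = 50
ENV = folio-pre
data.import.storage.type = LOCAL_STORAGE
data.import.storage.path = /storage/upload
"""

if __name__ == "__main__":
    current = parse_settings(TAMU_SETTINGS)
    # Hypothetical second environment, for illustration only.
    other = dict(current, ENV="folio-test")
    print(diff_settings(current, other))
```

      Only keys whose values differ are reported, so a long settings list reduces to the handful of lines worth discussing.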

      My suspicion is that there is possibly some cache or shared memory I’m not taking into account on first write, or that Kafka needs some tweaking (buffer chunk size?). I did set up kafka and mod-inv as recommended in the data import guide.

      Adam Cottle (Skidmore):
      I have encountered several Data Import jobs in which my first attempt fails but a second one succeeds. Same imported MARC file, same job profile both times. I have not noticed, however, that it occurred on my library's first import of the day.

      Ann-Marie Breaux:
      It seems pretty annoying to me to have to be mindful of doing a very small import each morning, and then watching it to see if you'll have to redo it. We should be able to do better than that. @Jason Root MTSU does not have a test env, but I think A&M does, right? If we needed to do some investigation and maybe pull some logs, could I have one of the devs contact you?

      Jason Root:
      Oh sure - we have all the environments to try it with

      Jason Root:
      I tried to do some diag’ing on this issue during our July side-by-side testing with Voyager, but I could not track it down.

      Ann-Marie Breaux:
      Good to know. And I'll be interested to know @Linda Turney @Jason Root @Adam Cottle if you see any difference in this behavior between Iris and Juniper. Lots of infrastructure cleanup in Juniper, and more coming in Kiwi, to decrease jobs getting stuck.

      Jason Root:
      As soon as Juniper drops publicly, I plan on getting it into our infra here at TAMU.

      Ann-Marie Breaux:
      I started a bug, but need to add more info to it. I'm going to leave it draft until I get confirmation that it's still happening after y'all upgrade to Juniper. https://issues.folio.org/browse/MODDATAIMP-524

      Ann-Marie Breaux:
      @Jason Root When you mentioned kafka and mod-inv settings, are you referring to this? These are the latest recommendations for Iris Hotfix 3 and Juniper. https://wiki.folio.org/display/FOLIOtips/0-Recommended+Maximum+File+Sizes+and+Configuration

      Jason Root:
      Ah, I’ve not seen that one - there’s another one I used: https://wiki.folio.org/display/FOLIJET/Checklist+for+Data-Import+application+setup+on+a+new+environment%3A+Iris

      Ann-Marie Breaux:
      Ahh - good to know - I'll get with the team and we'll consolidate or make them consistent. I just added the new one yesterday, based on recommendations from PTF after their analysis and some infrastructure work that the devs did in Juniper

      Raegan Wiechert (Missouri State):
      I have not noticed this at Missouri State and for the last few days I have been the person doing the first import.

      Jason Root:
      OK, I raised the Kafka log flush interval to 60 minutes (it was set to 10), allocated 300 GB of disk space to the brokers, and added the env var kafka.consumer.max.poll.records = 10 to the mod-inv deployment. Our log retention is already 24 hours.

      Jason Root:
      Sadly, we are still seeing the failures on first import of the day for that user - with the changes made yesterday.

      Jason Root:
      We’ve done some more testing. It appears that the failure on first import is tied to the first attempt from a given client, not to the user’s FOLIO account. This leads me to believe something may be going on with the load balancer and headers. Could your devs provide any insight into how the client HTTP streams work in DI?

    People

      olamshin Olamide Kolawole
      abreaux Ann-Marie Breaux
      Votes: 0
      Watchers: 11
