Reported by 2 libraries:
It may also be that an import sometimes does not finish on the first attempt but finishes on the second (same file, same job profile).
Check whether this improves once the library is on Juniper before moving this from draft to open. Also put it into standard bug format and add a description and repro steps.
We can work with Jason Root (TAMU) to diagnose this in their test environment, since we cannot reproduce it in the hosted reference environments.
See more in thread: https://folio-project.slack.com/archives/CA39M62BZ/p1629301167142900
*From Linda Turney (Middle Tennessee State):*
Hello! My institution went live in July and I have a question for this group about data import behavior. We are on Iris Hot Fix #3. I'm checking to see if any others have similar experiences with "first load of the day" behavior. Sometimes, not always, the first load will complete with errors and the log will be empty. I run the same job immediately and it completes with no errors. Since I've experienced this many times, I usually make the first data import job very small, just in case I need to do some clean up. Sometimes I experience this later in the day when there is a long pause between jobs. It's almost like I've startled the system awake. Anyway, I appreciate any comments. Thanks!
From Jason Root (Texas A&M):
To clarify: it's not that the first import of the day is slow; it's that it completes with errors and then succeeds on the second attempt.
For an idea of our env: We’re using these mod-data-import module settings:
KAFKA_PORT = 9092
KAFKA_HOST = http://kafka-pre
JAVA_OPTIONS = -XX:MaxRAMPercentage=66.0
file.processing.buffer.chunk.size = 50
ENV = folio-pre
data.import.storage.type = LOCAL_STORAGE
data.import.storage.path = /storage/upload
An NFS volume is mounted inside the module at /usr/verticles/storage/upload to provide shared storage between the two instances of the running deployment.
I suspect there may be some cache or shared memory I'm not accounting for on the first write, or that Kafka needs some tuning (buffer chunk size?). I did set up Kafka and mod-inv as recommended in the data import guide.
From Adam Cottle (Skidmore):
I have encountered several Data Import jobs in which my first attempt fails but a second one succeeds. Same imported MARC file, same job profile both times. I have not noticed, however, that it occurred on my library's first import of the day.
It seems pretty annoying to me to have to be mindful of doing a very small import each morning, and then watching it to see if you'll have to redo it. We should be able to do better than that. @Jason Root MTSU does not have a test env, but I think A&M does, right? If we needed to do some investigation and maybe pull some logs, could I have one of the devs contact you?
Oh sure - we have all the environments to try it with
I tried to diagnose this issue during our July side-by-side testing with Voyager, but I could not track it down.
Good to know. And I'll be interested to know @Linda Turney @Jason Root @Adam Cottle if you see any difference in this behavior between Iris and Juniper. Lots of infrastructure cleanup in Juniper, and more coming in Kiwi, to decrease jobs getting stuck.
As soon as Juniper is publicly released, I plan to get it into our infrastructure here at TAMU.
I started a bug, but need to add more info to it. I'm going to leave it draft until I get confirmation that it's still happening after y'all upgrade to Juniper. https://issues.folio.org/browse/MODDATAIMP-524
@Jason Root When you mentioned kafka and mod-inv settings, are you referring to this? These are the latest recommendations for Iris Hotfix 3 and Juniper. https://wiki.folio.org/display/FOLIOtips/0-Recommended+Maximum+File+Sizes+and+Configuration
Ah, I hadn't seen that one. There's another one I used: https://wiki.folio.org/display/FOLIJET/Checklist+for+Data-Import+application+setup+on+a+new+environment%3A+Iris
Ah, good to know. I'll get with the team and we'll consolidate them or make them consistent. I just added the new one yesterday, based on recommendations from PTF after their analysis and some infrastructure work that the devs did in Juniper.
From Raegan Wiechert (Missouri State):
I have not noticed this at Missouri State and for the last few days I have been the person doing the first import.
OK, I raised the Kafka log flush interval to 60 minutes (it was set to 10), set 300 GB of disk space for the brokers, and added the environment variable kafka.consumer.max.poll.records = 10 to the mod-inv deployment. Our log retention is already set to 24 hours.
Sadly, even with the changes made yesterday, we are still seeing failures on that user's first import of the day.
We've done some more testing. It appears that the failure on first import is tied to the first attempt from a given client, not to the user's FOLIO account. This leads me to suspect something is going on with the load balancer and headers. Could your devs provide any insight into how the client HTTP streams work in DI?