KS: Investigate the possibility of removing the Kafka cache. Modules that do not make persistent changes will sometimes (on duplicate reads) make unnecessary calls. This can be optimized further by adding a distributed in-memory cache (e.g., Hazelcast).
When running consecutive imports and/or a long import, 10 io.kcache.KafkaCache instances appear to accumulate memory along the way, eventually causing the container to crash. In the better outcome, these 10 objects release the memory on their own, but only after a very long or unpredictable time. Each instance contains a KafkaTopicReader object, which is a major contributor to the overall size of the KafkaCache. Investigate how to make these objects eligible for garbage collection in a timely manner so that they do not consume all available memory.
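One mitigation worth investigating is closing and dereferencing the cache instances explicitly when an import job completes, rather than waiting for the GC to reclaim them at an unpredictable time. Below is a minimal sketch of that pattern, assuming the caches expose a close() method (io.kcache.KafkaCache implements java.io.Closeable); KafkaCacheStandIn and runImportJob are hypothetical stand-ins for illustration, not mod-inventory code:

```java
import java.util.ArrayList;
import java.util.List;

public class CacheLifecycleSketch {
    // Hypothetical stand-in for io.kcache.KafkaCache; the real class holds a
    // KafkaTopicReader, which dominates its retained size.
    static class KafkaCacheStandIn implements AutoCloseable {
        private byte[] buffer = new byte[1024]; // simulates retained memory

        public boolean isClosed() {
            return buffer == null;
        }

        @Override
        public void close() {
            buffer = null; // drop the reference so the GC can reclaim it
        }
    }

    // Close every cache when the import job finishes, instead of relying on
    // the GC to collect the instances on its own schedule.
    public static boolean runImportJob(int cacheCount) {
        List<KafkaCacheStandIn> caches = new ArrayList<>();
        try {
            for (int i = 0; i < cacheCount; i++) {
                caches.add(new KafkaCacheStandIn());
            }
            // ... import work would happen here ...
        } finally {
            for (KafkaCacheStandIn cache : caches) {
                cache.close();
            }
            caches.clear(); // drop the list's references as well
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(runImportJob(10));
    }
}
```

The key point is the finally block: whether the import succeeds or fails, the caches are closed and the references dropped, so their memory becomes collectible immediately rather than lingering for hours.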
Note that data import behaves unpredictably when the container crashes: sometimes the job gets stuck, sometimes it continues as if nothing happened, and sometimes anywhere from a few to a hundred records are not created. It is therefore essential that this problem be resolved.
Steps to Reproduce:
Run repeated data-import jobs of 5K MARC records or more and watch mod-inventory's memory consumption increase.
Expected Results:
Memory is consumed and released in a timely manner, such that the container does not run out of memory within a 24-hour time span.
Actual Results:
As described above: objects retained memory for a long time before releasing it.
In this diagram, mod-inventory's container crashed twice during the import runs. The crash events are indicated by the letter "C". At 8:00, however, mod-inventory released the memory on its own, indicated by the letter "R".
Contact Roman Fedynyshyn for heap dumps