Details
- Type: Bug
- Status: Closed
- Priority: P2
- Resolution: Done
- Environment: Lotus-hotfix-1, mod-source-record-manager-3.3.8, Memory - 3600MB/3240MB hard/soft memory limit
- Story Points: 5
- Development Team: Folijet Support
- Fix Version: Morning Glory (R2 2022) Bug Fix
- Affected Institution: MI State University/Library of Michigan
- Data related (ex. can be detected with a large dataset only)
Description
Overview:
mod-source-record-manager is failing with an Out Of Memory error while idle, when no Data Import job is running.
(Screenshot: last week of memory usage graph)
(Screenshot: container memory before the crash)
After investigating further and taking a heap dump, I found no memory leaks, but the Thread Stack shows the module running 200+ threads. That is a lot of threads, especially while Data Import is idle: I checked the last 4 hours before taking the heap dump and no Data Import jobs were running. A few Data Import jobs had run earlier, before the module went idle; I think that is when those 200+ threads were created, and it looks like they never terminated.
Please see the attached list of threads running while idle: List-of-threads-running-idle.txt. This Thread Stack comes from the JXRay report; the complete report is mod-srm-lmch-memory-dump-2.html.
Question: are all 201 running threads doing useful work?
Please see the attached thread dump report: Thread_Dump_HPROF_snapshot_mod-srm-lmch_hprof.html. It shows multiple copies of a single thread running. Is that expected?
For example:
Thread kafka-coordinator-heartbeat-thread | DI_INVENTORY_AUTHORITY_UPDATED.mod-source-record-manager-3.3.8_DataImportJournalConsumersVerticle:
  at jdk.internal.misc.Unsafe.park(boolean, long)
  at java.util.concurrent.locks.LockSupport.park(java.lang.Object) (line: 194)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() (line: 885)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node, int) (line: 917)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(int) (line: 1240)
  at java.util.concurrent.locks.ReentrantLock.lock() (line: 267)
  at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(org.apache.kafka.common.utils.Timer, org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$PollCondition, boolean) (line: 249)
  at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.pollNoWakeup() (line: 306)
  at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatThread.run() (line: 1386)

Thread kafka-coordinator-heartbeat-thread | DI_INVENTORY_AUTHORITY_UPDATED.mod-source-record-manager-3.3.8_DataImportJournalConsumersVerticle:
  at java.lang.Object.wait(long)
  at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatThread.run() (line: 1418)

Thread kafka-coordinator-heartbeat-thread | DI_INVENTORY_AUTHORITY_UPDATED.mod-source-record-manager-3.3.8_DataImportJournalConsumersVerticle:
  at java.lang.Object.wait(long)
  at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatThread.run() (line: 1418)
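Each of these coordinator heartbeat threads belongs to one KafkaConsumer instance and only exits when that consumer is closed. The following is a minimal, hypothetical sketch (plain Apache Kafka client, not the module's actual code; it assumes a broker at localhost:9092 and uses a demo group id and topic name) that illustrates the lifecycle seen in the dump:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class HeartbeatThreadDemo {
    // Count live "kafka-coordinator-heartbeat-thread | <group>" threads in this JVM.
    private static long heartbeatThreads() {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().startsWith("kafka-coordinator-heartbeat-thread"))
                .count();
    }

    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");        // assumption: local broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo.DataImportJournalConsumers"); // hypothetical group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(List.of("DI_INVENTORY_AUTHORITY_UPDATED"));                // hypothetical topic name
        consumer.poll(Duration.ofSeconds(5)); // joining the consumer group starts the heartbeat thread
        System.out.println("heartbeat threads after poll:  " + heartbeatThreads());

        consumer.close();                     // the heartbeat thread exits only once the consumer is closed
        Thread.sleep(1000);                   // give the daemon thread a moment to finish
        System.out.println("heartbeat threads after close: " + heartbeatThreads());
    }
}

If consumers created for earlier Data Import jobs are never closed, each one keeps its heartbeat thread (and associated network buffers) alive indefinitely, which would explain the duplicates above.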
The report hints at another problem: DirectByteBuffers account for a large share of memory, and I think that is because all of those running threads allocate buffers for I/O.
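To check that theory, direct buffer usage can be read from the JDK's BufferPoolMXBean (also exposed over JMX under java.nio:type=BufferPool). A small stand-alone diagnostic sketch, not part of the module:

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

public class DirectBufferUsage {
    public static void main(String[] args) {
        // "direct" covers DirectByteBuffers used for NIO/socket I/O; "mapped" covers memory-mapped files.
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.printf("%-8s buffers=%d used=%d MB capacity=%d MB%n",
                    pool.getName(),
                    pool.getCount(),
                    pool.getMemoryUsed() / (1024 * 1024),
                    pool.getTotalCapacity() / (1024 * 1024));
        }
    }
}

If the "direct" pool grows roughly in step with the thread count, that would support the idea that the leaked threads, rather than the heap itself, push the container past its memory limit.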
The complete heap dump is around 1GB and can be found here: mod-srm-memory-dump-1.hprof
Steps to Reproduce:
This problem can be reproduced on Lotus hotfix-1: after multiple Data Import runs, once the apps go into an idle state, the threads are still running and consuming resources.
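One way to confirm the leak is to compare the live thread count right after deployment with the count after several Data Import runs once the module has been idle for a while (successive jstack outputs show the same thing). A rough, hypothetical in-JVM sketch that groups threads by name prefix so repeated copies show up as a single row with a count:

import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class IdleThreadReport {
    public static void main(String[] args) {
        Set<Thread> live = Thread.getAllStackTraces().keySet();
        // Group threads by the part of the name before the first digit or '|', so all
        // "kafka-coordinator-heartbeat-thread | <group>" copies collapse into one row.
        Map<String, Integer> byPrefix = new TreeMap<>();
        for (Thread t : live) {
            String prefix = t.getName().split("[|0-9]", 2)[0].trim();
            byPrefix.merge(prefix.isEmpty() ? t.getName() : prefix, 1, Integer::sum);
        }
        byPrefix.forEach((name, count) -> System.out.printf("%4d  %s%n", count, name));
        System.out.println("Total live threads: " + live.size());
    }
}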
Expected Results:
mod-source-record-manager should not crash with an Out Of Memory error.
Actual Results:
mod-source-record-manager is crashing with an Out Of Memory error.
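For reference, the kind of cleanup that avoids this: a Kafka consumer created inside a Vert.x verticle has to be closed when the verticle stops (or when the job it serves completes), which is what lets its coordinator heartbeat thread and buffers go away. A minimal, hypothetical sketch using the Vert.x Kafka client, not the module's actual code:

import io.vertx.core.AbstractVerticle;
import io.vertx.core.Promise;
import io.vertx.kafka.client.consumer.KafkaConsumer;
import java.util.Map;

public class JournalConsumerVerticleSketch extends AbstractVerticle {
    private KafkaConsumer<String, String> consumer;

    @Override
    public void start() {
        Map<String, String> config = Map.of(
                "bootstrap.servers", "localhost:9092",                 // assumption: local broker
                "group.id", "demo.DataImportJournalConsumers",         // hypothetical group id
                "key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer",
                "value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumer = KafkaConsumer.create(vertx, config);
        consumer.handler(rec -> { /* process Data Import journal event */ });
        consumer.subscribe("DI_INVENTORY_AUTHORITY_UPDATED");          // hypothetical topic name
    }

    @Override
    public void stop(Promise<Void> stopPromise) {
        // Closing the consumer is what ends its "kafka-coordinator-heartbeat-thread | <group>" thread;
        // if undeploy/cleanup skips this step, the threads linger exactly as seen in the dump.
        consumer.close().onComplete(stopPromise);
    }
}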
Attachments
Issue Links
- defines: UXPROD-3464 NFR: Data Import R2 2022 Morning Glory Support Bug work (Closed)