mod-source-record-manager / MODSOURMAN-851

mod-source-record-manager failing with OOM on Lotus-hotfix-1 (MG Bugfix)


Details

    • 5
    • Folijet Support
    • Morning Glory R2 2022 Bug Fix
    • MI State University/Library of Michigan
    • Data related (ex. Can be detected with large dataset only)

    Description

      Overview:

      mod-source-record-manager fails with an Out Of Memory error while idle, when no Data Import job is running.

      Last week of memory usage graph

      Container memory before the crash

       

      After investigating further and taking a heap dump, I found no memory leaks, but the thread stack shows 200+ threads running. That is a lot of threads, especially while Data Import is idle. I checked the 4 hours before the heap dump was taken, and no Data Import jobs were running. A few Data Import jobs had run before the module went idle; I think those 200+ threads were created then, and it looks like they never terminated.
      Please see the attached list of threads running while idle: List-of-threads-running-idle.txt. This thread stack is from the JXRay report. The complete report is mod-srm-lmch-memory-dump-2.html.
      Question: Are all 201 running threads doing useful work?
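
To help answer that question on a live instance, the thread count could be broken down by name group from inside the JVM. The following is a minimal sketch, not code from the module: the class name `LiveThreadSummary` and the grouping rule (strip a trailing numeric suffix) are assumptions made for illustration, and the code only sees threads of the JVM it runs in, so it would have to be invoked from within the module or a test harness.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of mod-source-record-manager).
final class LiveThreadSummary {

    /** Strips a trailing numeric suffix so pooled threads group together, e.g. "worker-12" -> "worker-". */
    static String groupKey(String threadName) {
        return threadName.replaceAll("\\d+$", "");
    }

    /** Counts the live threads of the current JVM per name group. */
    static Map<String, Integer> countByGroup() {
        Map<String, Integer> counts = new TreeMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(groupKey(t.getName()), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Overall numbers first, then one line per name group.
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        System.out.printf("live=%d peak=%d daemon=%d%n",
                mx.getThreadCount(), mx.getPeakThreadCount(), mx.getDaemonThreadCount());
        countByGroup().forEach((group, n) -> System.out.printf("%4d  %s%n", n, group));
    }
}
```

Comparing this output shortly after a Data Import run and again after several idle hours would show whether any group shrinks back or stays at its peak.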

       
      Please see the attached thread dump report: Thread_Dump_HPROF_snapshot_mod-srm-lmch_hprof.html. According to the thread dump report, multiple copies of the same thread are running. Is that expected?
      For example:

      Thread kafka-coordinator-heartbeat-thread | DI_INVENTORY_AUTHORITY_UPDATED.mod-source-record-manager-3.3.8_DataImportJournalConsumersVerticle:
        at jdk.internal.misc.Unsafe.park(boolean, long)
        at java.util.concurrent.locks.LockSupport.park(java.lang.Object) (line: 194)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() (line: 885)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node, int) (line: 917)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(int) (line: 1240)
        at java.util.concurrent.locks.ReentrantLock.lock() (line: 267)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(org.apache.kafka.common.utils.Timer, org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$PollCondition, boolean) (line: 249)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.pollNoWakeup() (line: 306)
        at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatThread.run() (line: 1386)
      
      Thread kafka-coordinator-heartbeat-thread | DI_INVENTORY_AUTHORITY_UPDATED.mod-source-record-manager-3.3.8_DataImportJournalConsumersVerticle:
        at java.lang.Object.wait(long)
        at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatThread.run() (line: 1418)
      
      Thread kafka-coordinator-heartbeat-thread | DI_INVENTORY_AUTHORITY_UPDATED.mod-source-record-manager-3.3.8_DataImportJournalConsumersVerticle:
        at java.lang.Object.wait(long)
        at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatThread.run() (line: 1418)
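
Repeated thread names like the ones above can be counted directly from a text thread dump. The sketch below assumes the dump uses the same layout as the excerpt (a header line `Thread <name>:` followed by `at ...` frames); the class name `DuplicateThreadNames` and the file argument are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for a text thread dump in the "Thread <name>:" layout shown above.
final class DuplicateThreadNames {

    /** Counts "Thread <name>:" header lines per thread name; stack frame lines are ignored. */
    static Map<String, Integer> countNames(List<String> dumpLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : dumpLines) {
            String s = line.trim();
            if (s.startsWith("Thread ") && s.endsWith(":")) {
                String name = s.substring("Thread ".length(), s.length() - 1);
                counts.merge(name, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to the thread dump text file.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        countNames(lines).forEach((name, n) -> {
            if (n > 1) {
                System.out.println(n + " x " + name);
            }
        });
    }
}
```

Only names that occur more than once are printed, which makes it easy to see which consumer groups have several heartbeat threads alive at the same time.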
      

      The report hints at another problem: DirectByteBuffers consume a notable amount of memory, and I think that is because all those running threads use memory for I/O.
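
Direct buffer usage can also be read at runtime through the standard `BufferPoolMXBean`, without taking a heap dump. This is a minimal sketch under assumptions: the class name `DirectBufferUsage` is hypothetical, the pool name `direct` is the one reported by HotSpot JVMs, and memory that a library allocates outside `ByteBuffer.allocateDirect` may not be counted in this pool.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper (not part of mod-source-record-manager).
final class DirectBufferUsage {

    /** Returns the bytes used by the JVM's "direct" buffer pool, or -1 if no such pool is reported. */
    static long directBytesUsed() {
        List<BufferPoolMXBean> pools = ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Print every buffer pool the JVM reports (typically "direct" and "mapped").
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.printf("%s: count=%d used=%d bytes capacity=%d bytes%n",
                    pool.getName(), pool.getCount(), pool.getMemoryUsed(), pool.getTotalCapacity());
        }
    }
}
```

Logging these values alongside the thread count over time would show whether direct memory grows with the number of lingering threads, as suspected above.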

      The complete heap dump is around 1 GB and is available here: mod-srm-memory-dump-1.hprof

      Steps to Reproduce:
      This problem can be reproduced on Lotus hotfix-1. After multiple Data Import runs, when the apps go idle, the threads keep running and consuming resources.

      Expected Results:
      mod-source-record-manager should not crash with an Out Of Memory error.

      Actual Results:
      mod-source-record-manager crashes with an Out Of Memory error.

              People

                Assignee: Unassigned
                Reporter: Varun Javalkar (varunjavalkar)