Details
- Type: Bug
- Status: Closed
- Priority: P2
- Resolution: Duplicate
- Development Team: Folijet Support
- Release: Lotus R1 2022
- Affected Institution: Cornell
- RCA Group: Not a bug
Description
Overview:
Data Import's performance, and mod-inventory's, appears to be affected by CPU spikes in mod-inventory. The spikes occurred during the import and continued well after it completed, and the import ran more slowly as a result. Observations:
- mod-inventory's CPU spiked during the import and for several hours afterward.
The data import took place between the blue lines, between 22:00 and 00:48; spikes after that point are abnormal.
- Most of the spikes correspond to the events_cache topic receiving incoming messages at a rate of 1K or more per second (a rough way to measure this rate is sketched below, after the goal statement).
The graph corresponds to the same import noted above. Note that in this instance there were no events_cache spikes after 00:48.
- One of the Kafka brokers' CPU also spiked at the same time.
The Kafka brokers' CPU graph corresponds to the import noted above. Note that the broker's CPU spikes line up with mod-inventory's spikes.
- Prometheus's data appears to have gaps.
The two graphs show gaps in data collection, and the events_cache spikes happened toward the end of those gaps. Was the broker's performance so degraded that data collection had gaps? Interestingly, in the Error graph, the spikes in errors happened at the same time; according to AWS, this is the metric for when the brokers "can't write" (it is not clear to where).
- Note the lulls in all DI topics before the spikes (the area between the blue drawn lines). This is when mod-inventory's threads are being blocked (see the Vert.x configuration sketch after this list for where this warning and its limit come from).
Messages were logged during the 'lull' in activity before the spikes in mod-inventory and events_cache. A full message is:
23:39:23 [] [] [] [] WARN ? Thread Thread[vert.x-worker-thread-8,5,main] has been blocked for 823487 ms, time limit is 60000 ms
This graph shows the brokers' CPU at the time of the lull; notably, broker 2 spiked at that time.
- Attached are broker 2's logs: 0629-2325-2340-broker2-lull.csv
- Attached are a couple of exceptions logged after the lull, during and around the peak: ThreadsBlockedExceptions.txt
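For context on the warning quoted above: the "time limit is 60000 ms" text and the blocked-thread messages come from Vert.x's blocked-thread checker, which warns when a worker thread does not return to the pool within the configured maximum execute time. The following is a minimal, generic Vert.x sketch (not mod-inventory's actual configuration) showing which options control that limit and how to get stack traces included in the warnings:
{code:java}
import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;

import java.util.concurrent.TimeUnit;

public class BlockedThreadCheckerSketch {
    public static void main(String[] args) {
        // Generic Vert.x example (not mod-inventory's real settings).
        // The blocked-thread checker emits warnings like:
        //   "Thread[vert.x-worker-thread-8,...] has been blocked for ... ms, time limit is 60000 ms"
        // when a worker thread exceeds maxWorkerExecuteTime (60 s by default).
        VertxOptions options = new VertxOptions()
                // How often the checker inspects event-loop and worker threads.
                .setBlockedThreadCheckInterval(1)
                .setBlockedThreadCheckIntervalUnit(TimeUnit.SECONDS)
                // The worker-thread limit behind the "time limit is 60000 ms" text.
                .setMaxWorkerExecuteTime(60)
                .setMaxWorkerExecuteTimeUnit(TimeUnit.SECONDS)
                // Once a thread has been stuck this long, the warning also includes a
                // stack trace, showing where the worker thread is blocked.
                .setWarningExceptionTime(5)
                .setWarningExceptionTimeUnit(TimeUnit.MINUTES);

        Vertx vertx = Vertx.vertx(options);
        // Deploy verticles as usual; blocked worker threads are then reported
        // with stack traces once the warning-exception time is exceeded.
    }
}
{code}
Lowering the warning-exception time (or taking thread dumps during the lull) should reveal what the blocked vert.x-worker threads are actually waiting on.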
The goal of this JIRA is to understand why the CPU spikes occur and how they affect DI's performance, and ultimately to come up with a fix.
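Since the spikes appear to track the events_cache topic's incoming message rate, one way to double-check the ~1K messages/second figure independently of the dashboards is to sample the topic's log-end offsets twice and divide the delta by the sampling interval. This is only a diagnostic sketch; the broker address and the literal topic name "events_cache" are placeholders and would need to match the environment's actual Kafka configuration and topic naming:
{code:java}
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;

public class TopicRateProbe {
    public static void main(String[] args) throws InterruptedException {
        // Placeholders: adjust the broker address and topic name to the environment under test.
        String bootstrapServers = "localhost:9092";
        String topic = "events_cache";

        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.deserializer", ByteArrayDeserializer.class.getName());
        props.put("value.deserializer", ByteArrayDeserializer.class.getName());

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            List<TopicPartition> partitions = consumer.partitionsFor(topic).stream()
                    .map(p -> new TopicPartition(p.topic(), p.partition()))
                    .collect(Collectors.toList());

            // Sample the summed log-end offsets twice; the delta divided by the interval
            // approximates the topic's incoming message rate.
            long before = sumEndOffsets(consumer, partitions);
            long intervalMs = 10_000;
            Thread.sleep(intervalMs);
            long after = sumEndOffsets(consumer, partitions);

            double perSecond = (after - before) * 1000.0 / intervalMs;
            System.out.printf("%s: ~%.0f incoming messages/second%n", topic, perSecond);
        }
    }

    private static long sumEndOffsets(KafkaConsumer<byte[], byte[]> consumer,
                                      List<TopicPartition> partitions) {
        return consumer.endOffsets(partitions).values().stream()
                .mapToLong(Long::longValue)
                .sum();
    }
}
{code}
Running this during and after an import would show whether the topic is still receiving messages after the job reports completion, which is what the post-00:48 spikes suggest.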
Steps to Reproduce:
This happens randomly. Sometimes there are no spikes at all during an import, sometimes there are multiple spikes, and often several spikes occur periodically after the import has completed.
Additional Information:
mod-inventory's logs can be provided.
Interested parties: abreaux
Attachments
- 0629-2325-2340-broker2-lull.csv
- ThreadsBlockedExceptions.txt
Issue Links
- defines
  - UXPROD-3261 NFR: R1 2022 Lotus Data import performance work (Closed)
- duplicates
  - MODDATAIMP-588 Spike: Investigate possibility of removing Kafka cache (Closed)
- relates to
  - MODDATAIMP-474 SPIKE: Review PTF job that created more Inventory records than were in the file & fix (Closed)
  - MODDATAIMP-541 SPIKE: investigate causes of inconsistent imports on PTF Juniper (Closed)
  - MODINV-401 Spike: KafkaCache Memory not released (after a long time) (Closed)