Data Import (DI) performance, and that of mod-inventory itself, seems to be affected by CPU spikes in mod-inventory. The spikes occurred during the import and continued well after it completed, and the import ran slower as a result. Here are the observations:
- mod-inventory's CPU spiked during the import and for several hours after it.
A data import took place between the blue lines, between 22:00 and 00:48. Spikes after that are abnormal.
- Most of the spikes coincide with events_cache's incoming message rate reaching 1K or more per second.
The graph corresponds to the same import noted above. Note that in this instance there were no events_cache spikes after 00:48.
- One of the Kafka brokers' CPU also spiked at the same time.
The Kafka brokers' CPU graph corresponds to the import noted above. Note that the broker's CPU spikes line up with mod-inventory's spikes.
- Prometheus's data appears to have gaps. The two graphs show gaps in data collection, and the events_cache spikes happened toward the end of those gaps. Was the broker's performance so bad that data collection had gaps? Interestingly, the spikes in the Error graph happened at the same time. According to AWS, this is the metric for when the brokers "can't write" (not sure to where).
- Note the lulls in all DI topics before the spikes (the area between the drawn blue lines). This is when mod-inventory's threads were being blocked.
- Messages were logged during the lull in activity that preceded the spikes in mod-inventory and events_cache. A full message is:
23:39:23     WARN ? Thread Thread[vert.x-worker-thread-8,5,main] has been blocked for 823487 ms, time limit is 60000 ms
- This graph shows the brokers' CPU at the time of the lull; notably, broker 2 spiked at that time.
- Attached are broker 2's logs: 0629-2325-2340-broker2-lull.csv
- Attached are a couple of exceptions that were logged after the lull, during and around the peak: ThreadsBlockedExceptions.txt
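The blocked-thread warning quoted above carries enough information to estimate when the worker thread first became blocked: subtract the reported blocked duration from the log timestamp. The Python sketch below does that for lines in the format shown above. The function name `parse_blocked_warning` and the regular expression are illustrative assumptions, not part of mod-inventory or Vert.x. Because the log line has a time but no date, the result is only a time of day.

```python
import re
from datetime import datetime, timedelta

# Hypothetical pattern, written against the single sample line in this ticket.
BLOCKED_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2})\s+WARN\s+.*?"
    r"Thread\[(?P<thread>[^,\]]+),.*?\] has been blocked for (?P<blocked_ms>\d+) ms, "
    r"time limit is (?P<limit_ms>\d+) ms"
)


def parse_blocked_warning(line):
    """Return details of a blocked-thread warning, or None if the line does not match."""
    match = BLOCKED_RE.search(line)
    if match is None:
        return None
    # The log has no date, so strptime fills in a placeholder date; only the time is kept.
    logged_at = datetime.strptime(match["time"], "%H:%M:%S")
    blocked_ms = int(match["blocked_ms"])
    blocked_since = logged_at - timedelta(milliseconds=blocked_ms)
    return {
        "thread": match["thread"],
        "logged_at": logged_at.time(),
        "blocked_since": blocked_since.time(),  # estimated time of day the block began
        "blocked_ms": blocked_ms,
        "limit_ms": int(match["limit_ms"]),
    }


if __name__ == "__main__":
    sample = (
        "23:39:23     WARN ? Thread Thread[vert.x-worker-thread-8,5,main] "
        "has been blocked for 823487 ms, time limit is 60000 ms"
    )
    info = parse_blocked_warning(sample)
    print(info["thread"], "blocked since about", info["blocked_since"])
```

For the sample line, 823487 ms is about 13 minutes 43 seconds, so the estimated start is roughly 23:25:39 for vert.x-worker-thread-8. That falls within the 23:25 to 23:40 window suggested by the name of the attached broker 2 log file.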
The goal of this JIRA is to understand why the CPU spikes occur and how they affect DI's performance, and ultimately to come up with a fix.
Steps to Reproduce:
This happens randomly. During some imports there are no spikes at all; during others there are multiple spikes. Often, several spikes occur periodically after the import has completed.
mod-inventory's logs can be provided.
Interested parties: abreaux