Details
- Story
- Status: Closed
- P2
- Resolution: Done
- None
- CP: sprint 92, CP: sprint 94, CP: sprint 95
- 8
- Core: Platform
Description
To understand how Okapi performs at a detailed level, it would be useful to add the following custom metrics:
- Incoming request rate (in reqs/sec)
- Internal end-to-end processing time (from right after receiving the request to right before sending the response)
- Each outgoing API call needs its own set of metrics, which include:
  - Rate of calls to API x or, for certain operations suspected of being highly expensive, the count of calls made to API x.
  - Response time: the time measured from right before sending the request until the last bit of the response has been received.
  For example, if Okapi calls mod-authtoken, it would be great to know the rate of calls, the number of calls, and the time Okapi waits to get the response back.
- Count of errors
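The per-API metrics listed above (call count, error count, response time, and rate) can be illustrated with a minimal, stdlib-only sketch. The class name OutgoingCallStats and its methods are hypothetical and are not part of Okapi or Micrometer; a real implementation would more likely rely on a metrics library such as the one proposed in the notes.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical stdlib-only recorder for calls to one outgoing API (not the Micrometer API). */
final class OutgoingCallStats {
    private final LongAdder calls = new LongAdder();
    private final LongAdder errors = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();

    /** Record one finished call: its duration and whether it failed. */
    void record(long durationNanos, boolean failed) {
        calls.increment();
        totalNanos.add(durationNanos);
        if (failed) {
            errors.increment();
        }
    }

    long count() {
        return calls.sum();
    }

    long errorCount() {
        return errors.sum();
    }

    /** Mean response time in milliseconds, or 0 if nothing was recorded. */
    double meanMillis() {
        long n = calls.sum();
        return n == 0 ? 0.0 : totalNanos.sum() / 1_000_000.0 / n;
    }

    /** Average call rate (calls/sec) over an observation window of the given length. */
    double ratePerSecond(double windowSeconds) {
        return calls.sum() / windowSeconds;
    }
}
```

A caller would take System.nanoTime() right before sending the request, take it again once the response has been fully received, and pass the difference to record.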
Notes
The metrics reported for incoming (proxied) calls and outgoing calls should include enough metadata to allow categorizing and grouping them, including:
- moduleId, e.g. mod-authtoken-1.2.3, unless it makes sense to tag with moduleName and moduleVersion separately
- uri: the "path" of the HTTP call
- queryString: to capture the parameters
- whether it is an incoming (proxy) call or an outgoing (system) call that Okapi makes
- phase for the "filter", e.g. 'auth'
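If tagging with moduleName and moduleVersion separately turns out to be preferable, the split could look roughly like the sketch below. The helper ModuleIdTags is hypothetical, and it assumes that the version starts at the first hyphen followed by a digit; that assumption holds for the ids quoted in this ticket but is not confirmed as a general rule.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: splits a module id into separate name and version tag values. */
final class ModuleIdTags {
    // Assumption: the version starts at the first hyphen that is followed by a digit.
    private static final Pattern MODULE_ID = Pattern.compile("^(.+?)-(\\d.*)$");

    /** Returns {moduleName, moduleVersion}; the version is empty if none is found. */
    static String[] split(String moduleId) {
        Matcher m = MODULE_ID.matcher(moduleId);
        if (m.matches()) {
            return new String[] {m.group(1), m.group(2)};
        }
        return new String[] {moduleId, ""};
    }
}
```

With this rule, mod-authtoken-1.2.3 yields the name mod-authtoken and the version 1.2.3.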
Metrics reported should be easy to integrate with InfluxDB. The proposal is to use https://vertx.io/docs/vertx-micrometer-metrics/java/
Micrometer concepts: https://micrometer.io/docs/concepts
RED monitoring method: https://www.weave.works/blog/the-red-method-key-metrics-for-microservices-architecture/
Design and implementation
The following meters (and tags) are defined and implemented for Okapi HTTP proxy calls. They follow the same naming convention as the Vert.x core tools metrics. Note that the dots in the names are automatically converted to _ in InfluxDB. No error meter is defined for the server side because errors are covered by the server processingTime meter, whose code tag records the HTTP response code.
- org.folio.okapi.http.server.processingTime: the metric type is Micrometer Timer. Tags:
  - host: Okapi instance. Example value: ip-172-31-19-200.ec2.internal/172.31.19.200
  - tenant: FOLIO tenant id. Example value: diku
  - code: HTTP response code. Example value: 200
  - method: HTTP request method. Example value: GET
  - module: FOLIO module id. Example value: mod-authtoken-2.6.0-SNAPSHOT.73
  - url: HTTP request URL as defined in the module descriptor. Example value: /users/{id}
- org.folio.okapi.http.client.responseTime: the metric type is Micrometer Timer. Tags:
  - all tags defined above, plus
  - phase: FOLIO proxy phase. Example value: auth. The default value is handler
- org.folio.okapi.http.client.errors: the metric type is Micrometer Counter. Tags:
  - host, tenant, method, and url
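The meter names and tag sets above, together with the dot-to-underscore renaming mentioned for InfluxDB, can be summarized in a small stdlib-only sketch. The class OkapiMeterNames and the method toInfluxName are hypothetical; in practice the renaming happens automatically on the InfluxDB side of the metrics pipeline, and this code only mirrors its visible effect.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical summary of the meters described above; not taken from the Okapi source. */
final class OkapiMeterNames {
    /** Meter name mapped to the tag keys listed for it in this ticket. */
    static final Map<String, List<String>> METER_TAGS = Map.of(
        "org.folio.okapi.http.server.processingTime",
        List.of("host", "tenant", "code", "method", "module", "url"),
        "org.folio.okapi.http.client.responseTime",
        List.of("host", "tenant", "code", "method", "module", "url", "phase"),
        "org.folio.okapi.http.client.errors",
        List.of("host", "tenant", "method", "url"));

    /** Mirrors the renaming described above: dots in the meter name become underscores. */
    static String toInfluxName(String meterName) {
        return meterName.replace('.', '_');
    }
}
```

For example, org.folio.okapi.http.client.errors would appear in InfluxDB as org_folio_okapi_http_client_errors.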
Issue Links
- relates to
  - OKAPI-864 Remove dropwizard metrics (Closed)
  - OKAPI-867 Add DB-related metrics (Closed)
  - OKAPI-868 SPIKE: identify additional blocks of code for instrumentation (Closed)
  - PERF-115 Set up Grafana board to display Okapi metrics (Closed)
  - RMB-655 Add default metrics to RMB: Outgoing API calls (Closed)
  - RMB-668 Add default metrics to RMB: DB calls (Closed)
  - RMB-669 Add default metrics to RMB: incoming API calls (Closed)
  - MODAT-80 SPIKE: investigate authentication performance optimizations (Closed)
  - MODAT-83 SPIKE: investigate authentication performance optimizations continued. (Closed)