Uploaded image for project: 'Okapi'
  1. Okapi
  2. OKAPI-984

Retry when slow module startup causes "connection refused"

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed (View Workflow)
    • Priority: TBD
    • Resolution: Done
    • Affects Version/s: 4.4.2, 4.6.3
    • Fix Version/s: 4.6.4, 4.7.0
    • Labels:
      None
    • Template:
      Standard Bug Write-Up Format
    • Sprint:
      CP: sprint 107
    • Story Points:
      2
    • Development Team:
      Core: Platform
    • Affected Institution:
      GBV

      Description

      Upgrading Okapi fails, for example "Tenant diku moving from okapi-4.3.3 to okapi-4.4.2" or "Tenant diku moving from okapi-4.6.2 to okapi-4.6.3".

      How to reproduce

      Install FOLIO Honeysuckle platform-complete as single server: https://github.com/folio-org/folio-install/tree/master/runbooks/single-server with Okapi 4.6.2-1 on an Ubuntu machine.

      After the installation has completed and runs successfully:

      Upgrade Okapi from 4.6.2-1 to 4.6.3-1 using apt or aptitude.

      Restart Okapi:

      sudo systemctl daemon-reload
      sudo systemctl restart okapi

      I have a single server installation with 100 modules on hardware with only a few cpu cores.
      Okapi starts mod-permissions but it takes 7 minutes until mod-permissions can serve requests:

      exec java -XX:MaxRAMPercentage=66.0 -XX:+ExitOnOutOfMemoryError -cp . -jar /usr/verticles/mod-permissions-fat.jar
      starting rest verticle service..........
      SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
      SLF4J: Defaulting to no-operation (NOP) logger implementation
      SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
      10 Feb 2021 16:34:52:533 INFO  Version [] HV000001: Hibernate Validator null
      10 Feb 2021 16:35:23:750 INFO  Messages [] Loading messages from /infra-messages/APIMessages_en.properties ................................
      10 Feb 2021 16:35:23:897 INFO  Messages [] Loading messages from /infra-messages/APIMessages_de.properties ................................
      10 Feb 2021 16:35:25:293 INFO  RestVerticle [] git: https://github.com/folio-org/raml-module-builder.git 8b8b856d91d9016612f41765bbdef5bf25de1f7d
      10 Feb 2021 16:35:26:817 INFO  PomReader [] Reading from jar
      10 Feb 2021 16:35:31:683 INFO  PomReader [] module name: mod_permissions, version: 5.12.2
      10 Feb 2021 16:35:32:319 INFO  LogUtil [] org.folio.rest.RestVerticle start metrics enabled: false
      10 Feb 2021 16:36:11:499 INFO  RestVerticle [] ^/_/tenantpermissions
      10 Feb 2021 16:36:12:426 INFO  RestVerticle [] ^/_/jsonSchemas
      10 Feb 2021 16:36:12:608 INFO  RestVerticle [] ^/_/tenant
      10 Feb 2021 16:36:12:666 INFO  RestVerticle [] ^/rmbtests
      10 Feb 2021 16:36:12:692 INFO  RestVerticle [] ^/perms
      10 Feb 2021 16:36:13:208 INFO  RestVerticle [] ^/_/ramls
      10 Feb 2021 16:36:13:251 INFO  RestVerticle [] ^/admin
      10 Feb 2021 16:36:14:713 INFO  RestVerticle [] 1 verticles deployed
      10 Feb 2021 16:36:33:961 INFO  RestVerticle [] init succeeded.......
      10 Feb 2021 16:36:43:687 INFO  LogUtil [] org.folio.rest.RestVerticle runPeriodicHook no periodic implementation found, continuing with deployment
      10 Feb 2021 16:37:07:157 INFO  LogUtil [] org.folio.rest.RestVerticle runPostDeployHook no Post Deploy Hook implementation found, continuing with deployment
      10 Feb 2021 16:37:07:372 INFO  LogUtil [] org.folio.rest.RestVerticle start http server for apis and docs started on port 8081.
      10 Feb 2021 16:37:07:427 INFO  LogUtil [] org.folio.rest.RestVerticle start Documentation available at: http://localhost:8081/apidocs/
      10 Feb 2021 16:37:07:641 INFO  VertxIsolatedDeployer [] Succeeded in deploying verticle
      

      When Okapi calls /_/tenantpermissions it gets "Connection refused" and shuts down:

      2021-02-10T17:32:25,974 INFO  TenantManager        Tenant diku moving from okapi-4.3.3 to okapi-4.4.2
      2021-02-10T17:32:26,070 INFO  OkapiClient          564636/tenantpermissions REQ okapiClient diku POST http://172.17.0.1:9177/_/tenantpermissions
      2021-02-10T17:32:26,160 WARN  ProxyService         POST request for mod-permissions-5.12.2 /_/tenantpermissions failed with finishConnect(..) failed: Connection refused: /172.17.0.1:9177
      2021-02-10T17:32:26,162 ERROR MainVerticle         POST request for mod-permissions-5.12.2 /_/tenantpermissions failed with finishConnect(..) failed: Connection refused: /172.17.0.1:9177
      2021-02-10T17:32:26,284 ERROR MainCluster          POST request for mod-permissions-5.12.2 /_/tenantpermissions failed with finishConnect(..) failed: Connection refused: /172.17.0.1:9177
      org.folio.okapi.util.OkapiError: POST request for mod-permissions-5.12.2 /_/tenantpermissions failed with finishConnect(..) failed: Connection refused: /172.17.0.1:9177
              at org.folio.okapi.managers.ProxyService.lambda$doCallSystemInterface$44(ProxyService.java:1231) ~[okapi-core-fat.jar:?]
              at org.folio.okapi.common.OkapiClient.lambda$request$1(OkapiClient.java:198) ~[okapi-core-fat.jar:?]
              at org.folio.okapi.common.OkapiClient.lambda$request1$2(OkapiClient.java:227) ~[okapi-core-fat.jar:?]
      2021-02-10T17:32:26,291 INFO  DeploymentManager    shutdown
      

      10 Feb 2021 16:3x = 2021-02-10T17:3x because CET time zone.

      I called systemctl start okapi at 10 Feb 2021 16:30 = 2021-02-10T17:30.

      The same for Okapi 4.6.3:

      2021-02-10T19:18:20,278 INFO  TenantManager        Tenant diku moving from okapi-4.3.3 to okapi-4.6.3
      2021-02-10T19:18:20,408 INFO  OkapiClient          121016/tenantpermissions REQ okapiClient diku POST http://172.17.0.1:9177/_/tenantpermissions
      2021-02-10T19:18:20,496 WARN  ProxyService         POST request for mod-permissions-5.12.2 /_/tenantpermissions failed with finishConnect(..) failed: Connection refused: /172.17.0.1:9177
      2021-02-10T19:18:20,500 ERROR MainVerticle         POST request for mod-permissions-5.12.2 /_/tenantpermissions failed with finishConnect(..) failed: Connection refused: /172.17.0.1:9177
      2021-02-10T19:18:20,729 ERROR MainCluster          POST request for mod-permissions-5.12.2 /_/tenantpermissions failed with finishConnect(..) failed: Connection refused: /172.17.0.1:9177
      org.folio.okapi.util.OkapiError: POST request for mod-permissions-5.12.2 /_/tenantpermissions failed with finishConnect(..) failed: Connection refused: /172.17.0.1:9177
              at org.folio.okapi.managers.ProxyService.lambda$doCallSystemInterface2$44(ProxyService.java:1220) ~[okapi-core-fat.jar:?]
              at org.folio.okapi.common.OkapiClient.lambda$request$1(OkapiClient.java:198) ~[okapi-core-fat.jar:?]
              at org.folio.okapi.common.OkapiClient.lambda$request1$2(OkapiClient.java:227) ~[okapi-core-fat.jar:?]
      2021-02-10T19:18:20,780 INFO  DeploymentManager    shutdown
      

      When Okapi gets "connection refused" from a module it should retry until some timeout is reached. A possible timeout might be 1000 seconds = 16.6 minutes.
      On my installation it takes 7 minutes to start the 100 modules. 7 minutes = 7*60 seconds = 420 seconds. This is 4.2 seconds on average per module (assuming a single cpu).

        TestRail: Results

          Attachments

            Issue Links

              Activity

                People

                Assignee:
                adam Adam Dickmeiss
                Reporter:
                julianladisch Julian Ladisch
                Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                  Dates

                  Created:
                  Updated:
                  Resolved:

                    TestRail: Runs

                      TestRail: Cases