  1. mod-circulation
  2. CIRC-1783

Very slow check out behavior observed in Nolana Hotfix #1

Details

    • Bug
    • Status: Closed
    • P1
    • Resolution: Done
    • None
    • 23.3.5, 24.0.0
    • None
    • EPAM-Veg Sprint 165, EPAM-Veg Sprint 166, EPAM-Veg Sprint 167, EPAM-Veg Sprint 168
    • 5
    • Vega
    • Nolana (R3 2022) Service Patch #2
    • Yes
      1. Stanford University will be blocked from going live on Nolana HF1 if this isn't addressed.
      2. Stanford University
      3. There is no workaround available.
      4. Checkin will be impacted
      5. First, we’re going to wrap the circ rules parsing code in a Vert.x method that will spin it off into its own thread, effectively making it non-blocking. This should remove the thread blocking errors that appear in the logs and speed up processing time for the overall circ operation. Second, we’ll increase the amount of time that elapses before the circ rules cache has to be reloaded, so the rules are parsed less frequently. Currently, the timeout for rebuilding the cache and re-parsing the rules file is five seconds. Our sense is that circ rules are updated fairly infrequently, so the duration can be increased without noticeable disruption. We’ll also need to adjust the automated tests for the circulation rules caching functionality to take the new time period into account. This should take three to five days to implement and test. (5 story points)
      6. Because this issue is observable in our current production code, we plan to test the fix by loading a lengthy circulation rules file into one of the development environments (vega scratch), then running circulation operations that require the circ rules (checkin, checkout, requests, etc.). We can easily force the circ rules to reload on every transaction by making slight modifications to the file between transactions. We can then observe the delay this causes and check the logs to see whether the thread blocking errors appear. If circulation operations are noticeably faster with the code in place than without it, and the thread blocking errors are not present, we’ve succeeded.
      7. Rollback is tricky, because defining “failure” in this area is somewhat subjective. Whether circulation operations experience a delay because of parsing the rules file is dependent on a number of factors outside of our control, the most salient of which are the length of the circ rules file and the computing resources available to the parser on a given system. Also, circ operations consist of a lot more operations than just parsing the rules file. This fix should produce some noticeable improvement for implementers who are currently experiencing delays due to this specific issue, and will definitely remove thread blocking problems.
      The only reason we could see for rolling this back is if it causes some kind of serious issue, which seems very unlikely.
    • OTHER
    • Data related (ex. Can be detected with large dataset only)
    • Nolana (R3 2022)
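
A minimal sketch of the two-part approach described in item 5 above, under stated assumptions. The ticket does not name the Vert.x method used, so a plain `java.util.concurrent` executor stands in for the Vert.x worker thread here. The class `RulesCacheSketch`, its `get` method, the toy line-counting parser, and the 60-second maximum age are all hypothetical; the ticket only says the five-second timeout will be increased. This is not the actual `CirculationRulesCache` code.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical sketch only; this is not the real CirculationRulesCache.
final class RulesCacheSketch<R> {
  private final Function<String, R> parser;  // the expensive parse step
  private final Executor worker;             // stand-in for a Vert.x worker thread
  private final long maxAgeMillis;           // how long a parsed result is reused
  private final LongSupplier clock;          // injectable so tests can control time

  private CompletableFuture<R> cached;       // last parse, possibly still running
  private long loadedAtMillis;

  RulesCacheSketch(Function<String, R> parser, Executor worker,
                   Duration maxAge, LongSupplier clock) {
    this.parser = parser;
    this.worker = worker;
    this.maxAgeMillis = maxAge.toMillis();
    this.clock = clock;
  }

  // Reuses the cached parse while it is younger than maxAge; otherwise
  // hands a new parse to the worker executor and returns its future.
  synchronized CompletableFuture<R> get(String rulesText) {
    long now = clock.getAsLong();
    boolean fresh = cached != null && now - loadedAtMillis < maxAgeMillis;
    if (!fresh) {
      loadedAtMillis = now;
      cached = CompletableFuture.supplyAsync(() -> parser.apply(rulesText), worker);
    }
    return cached;
  }

  public static void main(String[] args) {
    ExecutorService worker = Executors.newSingleThreadExecutor();
    try {
      RulesCacheSketch<Integer> cache = new RulesCacheSketch<>(
          text -> text.split("\n").length,   // toy "parser": counts lines
          worker, Duration.ofSeconds(60), System::currentTimeMillis);
      System.out.println("rule lines: " + cache.get("priority\nfallback").join());
    } finally {
      worker.shutdown();
    }
  }
}
```

The trade-off is the one the ticket raises: a longer maximum age means fewer parses, but edits to the rules take up to that long to be picked up. In this sketch a failed parse also stays cached until it expires, which a real implementation would probably want to handle differently.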

    Description

      Overview:

      Stanford is running Nolana Hotfix #1 with a circulation rules file of several hundred lines and is seeing very slow checkouts (20+ seconds). The checkouts do complete.

      They are seeing thread blocked errors in mod-circulation like the following:

      mod-circulation-69d9fcd94c-rc54v mod-circulation at org.folio.circulation.rules.cache.CirculationRulesCache$$Lambda$881/0x0000000100488440.apply(Unknown Source) ~[?:?]

      As a troubleshooting step, they reduced their circulation rules file to just the two required lines, priority and fallback. That fixed the behavior. Putting their full rules back in place caused the slow checkouts to resume.

      Additional info
      Discussed in the sys-ops Slack channel: https://folio-project.slack.com/archives/C9BBWRCNB/p1682971587066759

        Attachments

          1. circ_rules.json (78 kB)
          2. fixed_due_date_sched.json (1 kB)
          3. loan_policies.json (24 kB)
          4. lost_item_fees.json (17 kB)
          5. mod-circulation-vega-scratch.log (524 kB)
          6. nolana_circ_log (2.85 MB)
          7. overdue_fines.json (1 kB)
          8. patron_notice_policies.json (4 kB)
          9. patron_notice_templates.json (35 kB)
          10. request_cancellation_reasons.json (1.0 kB)
          11. request_policies.json (0.4 kB)
          12. Terminal Saved Output.txt (1.23 MB)
          13. Terminal Saved Output v2.txt (1.46 MB)

              People

                felkerk Kyle Felker
                enettifee Erin Nettifee