  mod-circulation / CIRC-1835

Very slow check out behavior: Orchid CSP 3

Details

    • Bug
    • Status: Closed
    • P1
    • Resolution: Done
    • None
    • 23.5.5
    • None
    • EPAM-Veg Sprint 169
    • 5
    • Vega
    • Orchid (R1 2023) Service Patch #3
    • Yes
      1. Stanford University will be blocked from going live on Nolana HF1 if this isn't addressed.
      2. Stanford University
      3. There is no workaround available.
      4. Checkin will be impacted
      5. To address this, the team decided to decouple building and parsing the circulation rules from circulation operations as much as possible. Instead of checking the age of the cache on every circ operation, we added a timer endpoint, similar to the ones used for operations such as aged-to-lost. The timer rebuilds the rules every three minutes by default, and the interval can be customized to run as often or as seldom as desired. Because a separate process now rebuilds the rules automatically, the rules cache code no longer needs to track how old the cache is or when it was last rebuilt: it uses whatever is cached if the rules have been built, and builds them if they are not present. In addition, we now purge and rebuild the rules cache every time the circ rules are modified. This is necessary for the module's automated tests to work correctly, and it also makes sense for deployed code, since modified rules should take effect promptly. This should make circ operations faster, because they now rebuild the rules only when they are missing from the cache, which should happen only after the system has been re-initialized and before the timer has had a chance to run. At all other times, operations simply use whatever is in the cache.
      6. Because this issue is observable in our current production code, we plan to test the fix by loading a lengthy circulation rules file into one of the development environments (vega scratch) and then running circulation operations that depend on the circ rules (checkin, checkout, requests, etc.). We can easily force the circ rules to reload on every transaction by making slight modifications to the file between transactions. We can then observe the delay this adds to each operation and check the logs for thread-blocking errors. If circulation operations are noticeably faster with the fix in place than without it, and the thread-blocking errors are gone, we've succeeded.
      7. Rollback is tricky, because defining “failure” in this area is somewhat subjective. Whether circulation operations are delayed by parsing the rules file depends on several factors outside our control, most notably the length of the circ rules file and the computing resources available to the parser on a given system. Also, a circ operation involves much more work than just parsing the rules file. This fix should noticeably improve performance for implementers currently experiencing delays due to this specific issue, and will definitely remove the thread-blocking problems.
      The only reason we could see for rolling this back is if it causes some kind of serious issue, which seems very unlikely.
    • OTHER
    • Data related (ex. Can be detected with large dataset only)
    • Nolana (R3 2022)
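The rules-cache design described in item 5 of the request details above can be sketched as follows. This is a minimal, hypothetical illustration in Java, not the module's actual code: the class and method names (`RulesCacheSketch`, `getRules`, `refresh`, `onRulesUpdated`, `startRefreshTimer`) are assumptions, and a local `ScheduledExecutorService` stands in for the timer endpoint, which in FOLIO would typically be invoked periodically by Okapi. Circulation operations never check the cache's age; they build the rules only when nothing is cached.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch of the described design; names are illustrative, not mod-circulation's API.
public class RulesCacheSketch {
    static final long DEFAULT_REFRESH_MINUTES = 3; // default interval stated in the ticket

    /** Stand-in for a tenant's parsed ("built") circulation rules. */
    record BuiltRules(String compiled) {}

    private final Map<String, BuiltRules> cache = new ConcurrentHashMap<>();
    private final Function<String, String> parseRulesForTenant; // expensive for long rule files

    RulesCacheSketch(Function<String, String> parseRulesForTenant) {
        this.parseRulesForTenant = parseRulesForTenant;
    }

    private BuiltRules build(String tenant) {
        return new BuiltRules(parseRulesForTenant.apply(tenant));
    }

    /** Circulation operations: no age check; build only if nothing is cached yet. */
    BuiltRules getRules(String tenant) {
        return cache.computeIfAbsent(tenant, this::build);
    }

    /** Timer tick: rebuild unconditionally, off the circulation request path. */
    void refresh(String tenant) {
        cache.put(tenant, build(tenant)); // if build throws, the old entry is kept
    }

    /** Rules were modified: purge the entry, then rebuild it right away. */
    void onRulesUpdated(String tenant) {
        cache.remove(tenant);
        getRules(tenant);
    }

    /** Local stand-in for the module's timer endpoint. */
    ScheduledExecutorService startRefreshTimer(String tenant, long period, TimeUnit unit) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(() -> {
            try {
                refresh(tenant);
            } catch (RuntimeException e) {
                // An uncaught exception would suppress all future runs of this task.
                System.err.println("rules refresh failed for " + tenant + ": " + e);
            }
        }, period, period, unit);
        return timer;
    }

    public static void main(String[] args) {
        AtomicInteger builds = new AtomicInteger();
        RulesCacheSketch rules = new RulesCacheSketch(tenant -> {
            builds.incrementAndGet(); // count the expensive parses
            return "built rules for " + tenant;
        });
        rules.getRules("tenant-a");       // cold cache: builds once
        rules.getRules("tenant-a");       // warm cache: reuses the cached rules
        rules.onRulesUpdated("tenant-a"); // modification: purge + rebuild
        System.out.println("builds = " + builds.get()); // prints "builds = 2"

        ScheduledExecutorService timer =
                rules.startRefreshTimer("tenant-a", DEFAULT_REFRESH_MINUTES, TimeUnit.MINUTES);
        timer.shutdownNow(); // demo only: stop before the first three-minute tick
    }
}
```

Keeping the age check out of `getRules` is the point of this design: the expensive parse moves to the timer and to rule updates, so a checkout pays for it only when the cache is cold. The `record` requires Java 16 or later.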

    Description

      Overview:

      Stanford is running Nolana Hotfix #1 with a circulation rules file of several hundred lines and is seeing very slow checkouts (over 20 seconds). The checkouts do eventually complete.

      They are seeing blocked-thread errors in mod-circulation, such as the following:

      mod-circulation-69d9fcd94c-rc54v mod-circulation at org.folio.circulation.rules.cache.CirculationRulesCache$$Lambda$881/0x0000000100488440.apply(Unknown Source) ~[?:?]

      As a troubleshooting step, they replaced their circulation rules file with just the two required lines (priority and fallback), which fixed the behavior. Restoring their rules caused the slowness to return.
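mod-circulation runs on Vert.x, which periodically checks its event-loop threads and logs a blocked-thread warning, with a stack trace, when a single handler holds a thread longer than a limit (2 seconds by default for event loops). A `CirculationRulesCache` frame in such a trace is consistent with the rules being built on the request path. The JDK-only sketch below is a simplified, hypothetical analogue of that checker, not Vert.x's implementation; `BlockedThreadSketch` and its parameters are illustrative assumptions. A long handler (standing in for parsing a large rules file inline) trips the warning, while a quick handler (standing in for a warm-cache lookup) does not.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified analogue of a blocked-thread checker (not Vert.x code).
public class BlockedThreadSketch {
    static final long DEFAULT_LIMIT_MS = 2000;          // mirrors Vert.x's 2 s event-loop default
    static final long DEFAULT_CHECK_INTERVAL_MS = 1000;

    private final ExecutorService loop = Executors.newSingleThreadExecutor(); // stand-in event loop
    private final ScheduledExecutorService checker = Executors.newSingleThreadScheduledExecutor();
    private final AtomicReference<Long> taskStartNanos = new AtomicReference<>(); // null = idle
    final AtomicLong warnings = new AtomicLong();

    BlockedThreadSketch(long checkIntervalMs, long limitMs) {
        // Periodically check how long the current handler has occupied the loop thread.
        checker.scheduleAtFixedRate(() -> {
            Long start = taskStartNanos.get();
            if (start == null) {
                return; // loop thread is idle
            }
            long blockedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            if (blockedMs > limitMs) {
                warnings.incrementAndGet();
                System.out.println("loop thread blocked for " + blockedMs
                        + " ms, time limit is " + limitMs + " ms");
            }
        }, checkIntervalMs, checkIntervalMs, TimeUnit.MILLISECONDS);
    }

    /** Runs a handler on the single loop thread, recording when it started. */
    void submit(Runnable handler) {
        loop.execute(() -> {
            taskStartNanos.set(System.nanoTime());
            try {
                handler.run();
            } finally {
                taskStartNanos.set(null);
            }
        });
    }

    /** Waits for queued handlers to finish, then stops the checker. */
    void close() {
        loop.shutdown();
        try {
            loop.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        checker.shutdownNow();
    }

    static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        BlockedThreadSketch demo =
                new BlockedThreadSketch(DEFAULT_CHECK_INTERVAL_MS, DEFAULT_LIMIT_MS);
        demo.submit(() -> sleep(3500)); // stand-in for rebuilding a long rules file inline
        demo.submit(() -> { });         // stand-in for a lookup that hits a warm cache
        demo.close();
        System.out.println("blocked-thread warnings: " + demo.warnings.get());
    }
}
```

Moving the rules build off the request path, as the approach in item 5 of the request details does, keeps each circulation handler short enough to stay under such a limit.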

      Additional info
      Discussed in the sys-ops Slack channel: https://folio-project.slack.com/archives/C9BBWRCNB/p1682971587066759


        Attachments

          1. circ_rules.json
            78 kB
          2. fixed_due_date_sched.json
            1 kB
          3. loan_policies.json
            24 kB
          4. lost_item_fees.json
            17 kB
          5. mod-circulation-vega-scratch.log
            524 kB
          6. nolana_circ_log
            2.85 MB
          7. overdue_fines.json
            1 kB
          8. patron_notice_policies.json
            4 kB
          9. patron_notice_templates.json
            35 kB
          10. request_cancellation_reasons.json
            1.0 kB
          11. request_policies.json
            0.4 kB
          12. Terminal Saved Output.txt
            1.23 MB
          13. Terminal Saved Output v2.txt
            1.46 MB

        People

          Kyle Felker (felkerk)
          Stephanie Buck (stephaniesbuck)
          Votes: 0
          Watchers: 15
