UX Product / UXPROD-3193

NFR: R3 2021 Kiwi Data import Stability/Reliability work



    • Low
    • Jumbo: > 45 days
    • Folijet
    • 117
    • R1
    • R1


      Team estimation - 90 days

      UXPROD-3135 was split into UXPROD-3193 (stability and reliability) and UXPROD-3191 (performance). abreaux will close UXPROD-3135 once all issues have been moved from it to the new features.

      Current situation or problem:
      1. High CPU/memory consumption on modules
      2. Duplicates created upon import
      3. SRS can fail when processing a message during import
      4. On an infrastructure issue (DB not available, module being restarted, or network failure), DI_ERROR is sent instead of retrying
      5. Status messages for the progress bar need de-duplication

      Investigation required for:

      6. Race condition on start (Kafka consumers start working before the DB is configured) OR periodic DB shutdown after SRS restart. Jobs get stuck if they cannot update their status in the DB (messages are ACKed even if they could not be processed)
      7. Kafka consumers eventually stop reading messages, breaking job progress until the module is restarted
      8. mod-data-import stores the input file in memory, which limits the size of the uploaded file and can lead to OOM
      9. Consumer gets disconnected from the Kafka cluster

      In scope

      Out of scope

      Use case(s)

      Proposed solution/stories
      1. Significantly decrease the size of the payload:

         1. Remove immutable parts. Instead, fetch them on demand and cache them locally for reuse.
         2. Change the message handling mechanism (currently relies on pt1 - profile) (optional)
         3. Move archiving to Kafka instead of the module level

      2. Make consumers idempotent. Add a pass-through identifier to de-duplicate messages.
      3. Generate "INSTANCE CREATED" from mod-inventory. Consume it in SRS to update the HRID in the BIB and in INVENTORY to continue processing.

      4. Do not ACK messages in Kafka if the failure is an infrastructure error/exception rather than a logic error. Split failed processing results into 2 categories:

         1. IO errors - do not ACK; retry until fixed
         2. Business logic errors - send DI_ERROR and ACK the current message

      Remove unnecessary topics (* ready for post processing and hrid set)

      5. De-duplicate status messages per record while tracking progress
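
      Stories 2 and 4 above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the modules' actual code: the `eventId` field, the `BusinessLogicError` class, the `Outcome` type and the in-memory `seen_ids` set are all hypothetical names, and a real consumer would need a persistent store for processed identifiers.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Set


class BusinessLogicError(Exception):
    """Hypothetical marker for failures caused by the record or profile itself."""


# Assumption: infrastructure failures (DB down, network failure, module
# restarting) surface as OSError subclasses such as ConnectionError.
INFRASTRUCTURE_ERRORS = (ConnectionError, TimeoutError, OSError)


@dataclass
class Outcome:
    ack: bool              # whether the message should be ACKed in Kafka
    event: Optional[str]   # follow-up event to publish, if any


@dataclass
class IdempotentHandler:
    process: Callable[[dict], None]
    # Assumption: in-memory set for illustration; not durable across restarts.
    seen_ids: Set[str] = field(default_factory=set)

    def handle(self, message: dict) -> Outcome:
        event_id = message["eventId"]  # hypothetical pass-through identifier
        if event_id in self.seen_ids:
            # Duplicate delivery: skip processing, but ACK so it is not redelivered.
            return Outcome(ack=True, event=None)
        try:
            self.process(message)
        except BusinessLogicError:
            # Business logic failure: report DI_ERROR and ACK the current message.
            self.seen_ids.add(event_id)
            return Outcome(ack=True, event="DI_ERROR")
        except INFRASTRUCTURE_ERRORS:
            # IO failure: do not ACK, so the message is retried until fixed.
            return Outcome(ack=False, event=None)
        self.seen_ids.add(event_id)
        return Outcome(ack=True, event=None)
```

      The identifier is recorded only after processing succeeds or fails for a business reason, so a message left un-ACKed after an IO error is processed again on retry rather than being dropped as a duplicate.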

      Problems 6, 7, 8 and 9 require investigation
      Possible solution for problem 8: split the file into chunks, put them in the database, and work with database/temp storage. Partially done (to be investigated)
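
      The chunking idea for problem 8 might look something like the sketch below, which spools an uploaded stream to temporary files in fixed-size pieces so the whole file is never held in memory. The function name `spool_in_chunks`, the chunk file naming and the 1 MiB default are assumptions made for illustration. The sketch splits on byte counts only; a real implementation would presumably need to split on record boundaries and might write to the database instead of the file system.

```python
import io
import tempfile
from pathlib import Path
from typing import BinaryIO, List


def spool_in_chunks(stream: BinaryIO, target_dir: Path,
                    chunk_size: int = 1024 * 1024) -> List[Path]:
    """Write the stream to numbered chunk files and return their paths in order."""
    paths: List[Path] = []
    index = 0
    while True:
        chunk = stream.read(chunk_size)  # at most chunk_size bytes in memory
        if not chunk:
            break
        path = target_dir / f"chunk-{index:06d}.bin"  # hypothetical naming scheme
        path.write_bytes(chunk)
        paths.append(path)
        index += 1
    return paths


# Usage: spool a small in-memory payload into 4-byte chunks.
with tempfile.TemporaryDirectory() as tmp:
    chunk_paths = spool_in_chunks(io.BytesIO(b"abcdefghij"), Path(tmp), chunk_size=4)
    print([p.name for p in chunk_paths])
```

      Downstream processing could then read one chunk at a time, which keeps memory use bounded by the chunk size rather than by the size of the uploaded file.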

      Links to additional info:
      Data Import Stabilization plan - Vladimir Shalaev - FOLIO Wiki





                abreaux Ann-Marie Breaux
                Taisiya Trunova Taisiya Trunova (Inactive)