  1. UX Product
  2. UXPROD-3193

NFR: R3 2021 Kiwi Data import Stability/Reliability work


Details

    • Low
    • Jumbo: > 45 days
    • Folijet
    • 117
    • R1
    • R1

    Description

      Team estimation - 90 days

      UXPROD-3135 was split into UXPROD-3193 for stability and reliability and UXPROD-3191 for performance; abreaux to close UXPROD-3135 once all of its issues have been moved to the new features

      Current situation or problem:
      1. High CPU/memory consumption on modules
      2. Duplicates are created upon import
      3. SRS can fail when processing a message during import
      4. If there is an infrastructure issue (such as the DB not being available, a module being restarted, or a network failure), we send DI_ERROR instead of retrying
      5. Status messages for the progress bar need de-duplication

      Investigation required for:

      6. Race condition on start (Kafka consumers start working before the DB is configured) OR periodic DB shutdown after SRS restart. Jobs get stuck if they cannot update their status in the DB (messages are ACKed even if we could not process them)
      7. Kafka consumers eventually stop reading messages, breaking job progress until the module is restarted
      8. mod-data-import stores the input file in memory, limiting the size of the uploaded file and risking out-of-memory (OOM) errors
      9. The consumer gets disconnected from the Kafka cluster

      In scope

      Out of scope

      Use case(s)

      Proposed solution/stories
      1. Significantly decrease the size of the payload:

         1. Remove immutable parts. Instead, fetch them on demand and cache them locally for reuse.
         2. Change message handling mechanism (currently relies on pt1 - profile) (optional)
         3. Move archiving to Kafka instead of module level
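
      As an illustration of the "fetch on demand and cache locally" idea in item 1 above, here is a minimal sketch. It assumes messages carry only a profile ID and that some lookup function can return the immutable profile for that ID. The names ProfileCache, fetch and fetch_count are hypothetical and do not come from any FOLIO module.

```python
from typing import Callable, Dict


class ProfileCache:
    """Caches immutable profile data by ID (hypothetical helper, not FOLIO code).

    Messages can then carry only the ID instead of the full profile.
    """

    def __init__(self, fetch: Callable[[str], dict]) -> None:
        self._fetch = fetch              # assumed lookup, e.g. an HTTP call to the owning module
        self._entries: Dict[str, dict] = {}
        self.fetch_count = 0             # counts real lookups, for demonstration only

    def get(self, profile_id: str) -> dict:
        # Fetch only on the first request; the data is immutable, so no invalidation is needed.
        if profile_id not in self._entries:
            self.fetch_count += 1
            self._entries[profile_id] = self._fetch(profile_id)
        return self._entries[profile_id]


def fake_fetch(profile_id: str) -> dict:
    """Stand-in for the real lookup; returns a tiny placeholder profile."""
    return {"id": profile_id}


cache = ProfileCache(fake_fetch)
first = cache.get("profile-1")
second = cache.get("profile-1")   # served from the local cache, no second fetch
```

      Because the cached parts are immutable, the sketch never evicts or refreshes entries; a bounded cache may be preferable if the number of distinct profiles is large.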

      2. Make consumers idempotent. Add a pass-through identifier to de-duplicate messages.
      3. Generate "INSTANCE CREATED" from mod-inventory. Consume it in SRS to update the HRID in the BIB, and in INVENTORY to continue processing.

      4. Do not ACK messages in Kafka when the failure is an infrastructure error/exception rather than a logic error. Split failed processing results into 2 categories:

         1. IO errors: do not ACK; retry until fixed
         2. Business logic errors: send DI_ERROR and ACK the current message
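
      The two failure categories in item 4 can be sketched roughly as follows. This is only an illustration under stated assumptions: infrastructure failures are assumed to surface as OSError subclasses (ConnectionError, TimeoutError), BusinessLogicError is a hypothetical marker exception, and ack and send_di_error are hypothetical callbacks rather than a real Kafka client API.

```python
import time
from typing import Any, Callable


class BusinessLogicError(Exception):
    """Hypothetical marker for failures caused by the record or profile itself."""


# Assumption: infrastructure problems surface as OSError subclasses
# (ConnectionError and TimeoutError both derive from OSError).
INFRASTRUCTURE_ERRORS = (OSError,)


def consume(
    message: Any,
    handler: Callable[[Any], None],
    ack: Callable[[Any], None],
    send_di_error: Callable[[Any, str], None],
    sleep: Callable[[float], None] = time.sleep,
    delay_seconds: float = 1.0,
    max_delay_seconds: float = 30.0,
) -> None:
    """Process one message, retrying on infrastructure errors without ACKing."""
    delay = delay_seconds
    while True:
        try:
            handler(message)
        except INFRASTRUCTURE_ERRORS:
            # IO error: leave the message un-ACKed and retry the same message after a pause.
            sleep(delay)
            delay = min(delay * 2, max_delay_seconds)
            continue
        except BusinessLogicError as error:
            # Business logic error: report DI_ERROR, then fall through to ACK.
            send_di_error(message, str(error))
        ack(message)
        return
```

      Any other exception type propagates to the caller without an ACK; which category unknown exceptions belong to is a design decision this sketch leaves open.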

      Remove unnecessary topics (* ready for post processing and hrid set)

      5. De-duplicate status messages per record while tracking progress
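
      The de-duplication described in items 2 and 5 above might look like the following sketch. It assumes each message carries a pass-through identifier under a hypothetical "id" key and that progress is tracked per record ID. State is kept in memory purely for illustration; modules running several instances would likely need a shared or persistent store instead.

```python
from typing import Callable, Dict, Set


class IdempotentConsumer:
    """Skips messages whose pass-through identifier was already handled."""

    def __init__(self, handler: Callable[[dict], None]) -> None:
        self._handler = handler
        self._seen: Set[str] = set()

    def on_message(self, message: dict) -> bool:
        message_id = message["id"]   # hypothetical pass-through identifier field
        if message_id in self._seen:
            return False             # duplicate delivery: do nothing
        self._handler(message)
        # Mark as seen only after the handler succeeded, so a failed attempt can be retried.
        self._seen.add(message_id)
        return True


class JobProgress:
    """Counts each record at most once, even if its status message is repeated."""

    def __init__(self, total_records: int) -> None:
        self.total = total_records
        self._status_by_record: Dict[str, str] = {}

    def record_status(self, record_id: str, status: str) -> bool:
        if record_id in self._status_by_record:
            return False             # repeated status message: progress is unchanged
        self._status_by_record[record_id] = status
        return True

    @property
    def processed(self) -> int:
        return len(self._status_by_record)
```

      With this shape, a redelivered message neither creates a duplicate entity nor moves the progress bar twice.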

      Problems 6, 7, 8 and 9 require investigation
      Possible solution for problem 8: split the file into chunks, put them in the database, and work with the database/temp storage. Partially done (to be investigated)
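
      One way the chunking idea for problem 8 could work is sketched below, using SQLite from the Python standard library as a stand-in for the real database. The table name upload_chunks, the function names and the chunk size are all assumptions made for illustration. Chunks here are cut at plain byte offsets; real chunking would presumably need to respect record boundaries.

```python
import io
import sqlite3
from typing import BinaryIO, Iterator


def iter_chunks(stream: BinaryIO, chunk_size: int) -> Iterator[bytes]:
    """Yield the stream in pieces of at most chunk_size bytes."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


def store_upload(conn: sqlite3.Connection, upload_id: str,
                 stream: BinaryIO, chunk_size: int = 64 * 1024) -> int:
    """Write the upload to the database chunk by chunk; returns the chunk count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS upload_chunks ("
        "upload_id TEXT, seq INTEGER, data BLOB, PRIMARY KEY (upload_id, seq))"
    )
    count = 0
    for seq, chunk in enumerate(iter_chunks(stream, chunk_size)):
        # Only one chunk is held in memory at a time.
        conn.execute(
            "INSERT INTO upload_chunks (upload_id, seq, data) VALUES (?, ?, ?)",
            (upload_id, seq, chunk),
        )
        count += 1
    conn.commit()
    return count


def read_upload(conn: sqlite3.Connection, upload_id: str) -> Iterator[bytes]:
    """Stream the stored chunks back in their original order."""
    cursor = conn.execute(
        "SELECT data FROM upload_chunks WHERE upload_id = ? ORDER BY seq",
        (upload_id,),
    )
    for (data,) in cursor:
        yield data
```

      Memory use is then bounded by the chunk size rather than the file size, which is the property problem 8 is after.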

      Links to additional info:
      Data Import Stabilization plan - Vladimir Shalaev - FOLIO Wiki

      Questions


              People

                abreaux Ann-Marie Breaux
                Taisiya Trunova (Inactive)