mod-source-record-storage / MODSOURCE-182

Use SQL JOIN in /records query method


    Details

    • Type: Task
    • Status: Closed
    • Priority: P3
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Story Points: 8
    • Development Team: Folijet

      Description

      I'm able to export 8.7 million instances in 17 minutes, but it takes 3.5 hours to export the corresponding SRS records. The following log excerpt shows why:

      2020-09-02 15:54:39.479 CDT [608878] diku_mod_source_record_storage@folio LOG:  execute <unnamed>: select "records_lb"."id", "records_lb"."snapshot_id", "records_lb"."matched_id", "records_lb"."generation", "records_lb"."record_type", "records_lb"."instance_id", "records_lb"."state", "records_lb"."leader_record_status", "records_lb"."order", "records_lb"."suppress_discovery", "records_lb"."created_by_user_id", "records_lb"."created_date", "records_lb"."updated_by_user_id", "records_lb"."updated_date" from "records_lb" limit $1
      2020-09-02 15:54:39.479 CDT [608878] diku_mod_source_record_storage@folio DETAIL:  parameters: $1 = '5000'
      2020-09-02 15:54:39.491 CDT [608878] diku_mod_source_record_storage@folio LOG:  execute <unnamed>: select count(*) from "records_lb"
      2020-09-02 15:54:45.150 CDT [608878] diku_mod_source_record_storage@folio LOG:  execute <unnamed>: select "raw_records_lb"."id", "raw_records_lb"."content" from "raw_records_lb" where "raw_records_lb"."id" = cast($1 as uuid)
      2020-09-02 15:54:45.150 CDT [608878] diku_mod_source_record_storage@folio DETAIL:  parameters: $1 = '0096d136-d6f5-5410-ad50-58fc6d2ded73'
      2020-09-02 15:54:45.153 CDT [608878] diku_mod_source_record_storage@folio LOG:  execute <unnamed>: select "id", "content" from "marc_records_lb" where "id" = cast($1 as uuid)
      2020-09-02 15:54:45.153 CDT [608878] diku_mod_source_record_storage@folio DETAIL:  parameters: $1 = '0096d136-d6f5-5410-ad50-58fc6d2ded73'
      2020-09-02 15:54:45.158 CDT [608878] diku_mod_source_record_storage@folio LOG:  execute <unnamed>: select "error_records_lb"."id", "error_records_lb"."content", "error_records_lb"."description" from "error_records_lb" where "error_records_lb"."id" = cast($1 as uuid)
      2020-09-02 15:54:45.158 CDT [608878] diku_mod_source_record_storage@folio DETAIL:  parameters: $1 = '0096d136-d6f5-5410-ad50-58fc6d2ded73'
      2020-09-02 15:54:45.158 CDT [608878] diku_mod_source_record_storage@folio LOG:  execute <unnamed>: select "raw_records_lb"."id", "raw_records_lb"."content" from "raw_records_lb" where "raw_records_lb"."id" = cast($1 as uuid)
      2020-09-02 15:54:45.158 CDT [608878] diku_mod_source_record_storage@folio DETAIL:  parameters: $1 = '7afa10f6-7766-5192-bf20-37f784909b33'
      2020-09-02 15:54:45.160 CDT [608878] diku_mod_source_record_storage@folio LOG:  execute <unnamed>: select "id", "content" from "marc_records_lb" where "id" = cast($1 as uuid)
      2020-09-02 15:54:45.160 CDT [608878] diku_mod_source_record_storage@folio DETAIL:  parameters: $1 = '7afa10f6-7766-5192-bf20-37f784909b33'
      2020-09-02 15:54:45.161 CDT [608878] diku_mod_source_record_storage@folio LOG:  execute <unnamed>: select "error_records_lb"."id", "error_records_lb"."content", "error_records_lb"."description" from "error_records_lb" where "error_records_lb"."id" = cast($1 as uuid)
      2020-09-02 15:54:45.161 CDT [608878] diku_mod_source_record_storage@folio DETAIL:  parameters: $1 = '7afa10f6-7766-5192-bf20-37f784909b33'
      2020-09-02 15:54:45.161 CDT [608878] diku_mod_source_record_storage@folio LOG:  execute <unnamed>: select "raw_records_lb"."id", "raw_records_lb"."content" from "raw_records_lb" where "raw_records_lb"."id" = cast($1 as uuid)
      2020-09-02 15:54:45.161 CDT [608878] diku_mod_source_record_storage@folio DETAIL:  parameters: $1 = '7c09c30e-648e-5136-aad0-4c73bcb7155d'
      

      It's doing N+1 queries looking up the raw records, error records, and MARC records rather than doing a JOIN on the tables. Can you switch to a JOIN? I think you will find a dramatic increase in performance.
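      A single query that joins the content tables to records_lb would avoid the per-record round trips. Below is a minimal sketch of such a query, based only on the table and column names visible in the log above, and assuming raw_records_lb, marc_records_lb, and error_records_lb use the record id as their primary key (which the identical UUIDs in the log suggest); the module's actual schema and query builder mapping may differ.

      -- One query per page instead of N+1 point lookups: join the three content tables.
      -- LEFT JOINs keep records that have no raw/parsed/error counterpart.
      select r.*,
             raw.content     as raw_record_content,
             marc.content    as parsed_record_content,
             err.content     as error_record_content,
             err.description as error_record_description
        from records_lb r
        left join raw_records_lb   raw  on raw.id  = r.id
        left join marc_records_lb  marc on marc.id = r.id
        left join error_records_lb err  on err.id  = r.id
       order by r.id
       limit 5000;

      Paging could then be done with limit/offset or a keyset on r.id, still as a single round trip per page.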

                People

                Assignee: wwelling (William Welling)
                Reporter: jemiller (Jon Miller)
                Votes: 0
                Watchers: 3
