Details
-
Bug
-
Status: Closed (View Workflow)
-
P3
-
Resolution: Done
-
None
-
ACQ Sprint 116
-
2
-
Thunderjet
Description
Overview:
uuid_generate_v4() creates a different UUID in each node of a replicated PostgreSQL environment.
This has been reported for the similar function gen_random_uuid() with Pgpool-II replication in #sys-ops on Slack. From the Pgpool-II manual:
https://www.pgpool.net/docs/latest/en/html/restrictions.html
There is no guarantee that any data provided using a context-dependent mechanism (e.g. random number, transaction ID, OID, SERIAL, sequence), will be replicated correctly on multiple backends.
RMB documentation mentions it in the upgrading guide for RMB 25 and in connection with "generateId": https://github.com/folio-org/raml-module-builder/#the-post-tenant-api
Tasks:
- Remove uuid_generate_v4() and "CREATE EXTENSION uuid-ossp" from
- budget_encumbrances_rollover.sql and
- pending_payment_cross_module.ftl
Note:
For predefined records use a hard-coded UUID.
For calculated records re-use the UUID of the primary source record.
Or use the md5sum trick: If a string (for example a JSONB::text) is unique calculate md5sum(s)::uuid to get a unique UUID.
Analysis
It is not simple to replace uuid_generate_v4() in these scripts. They create new records based on existing ones, and they need new ids which have to be different from existing ones and unique.
- Using a hard-coded id is not an option because many records are created in the same statement and ids needs to be different.
- Reusing existing ids is not an option when new records are added to a table based on other records in the same table, otherwise we would have the same uuid for different records in the same table.
- Using md5() only works if we find text that is garanteed to be unique. What if 2 rows in a new table are identical ?
I looked at using row_number() to generate new ids, but this can be non-deterministic, so that is not an option in the context of pgpool.
Creating new records only from Java would resolve the issue, but this could cause major performance issues in some cases.
For something like
INSERT INTO ... SELECT uuid_generate_v4(), ... FROM transaction tr
we can use
md5('A1'||tr.id)::uuid
where the string 'A' is a unique identifier for the script and '1' is a number for the statement in the script. tr.id should be unique for this statement, so the generated uuid should be unique (although it is not as good as a random number generated with uuid_generate_v4())
It gets more complicated when the generated records are created from joins. For instance:
INSERT INTO ... SELECT uuid_generate_v4(), ... FROM transaction tr LEFT JOIN fund fund ON fund.id = tr.fromFundId
In general we can't simply use tr.id because there might be several rows returned with the same tr.id. tr.id||fund.id can't be used either because fund.id could be null. A general solution for this case is:
md5('A2'||tr.id||coalesce(fund.id,''))::uuid
This can be simplified to
md5(concat('A2', tr.id, fund.id))::uuid
because concat ignores null values.
There is a problem with md5()::uuid: the generated uuids do not match the regular expression used to verify them everywhere:
^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}$
A solution is to use uuid_generate_v5 from uuid-ossp instead. As a bonus, it uses sha1 instead of md5. It does require a namespace, however. A nil uuid works for that. The previous expression becomes
public.uuid_generate_v5(public.uuid_nil(), concat('A2', tr.id, fund.id))
This means we have to keep the uuid-ossp extension.
TestRail: Results
Attachments
Issue Links
- blocks
-
MODFISTO-226 MODFISTO (mod-finance-storage) release
-
- Closed
-
- defines
-
UXPROD-2630 Thunderjet - R2 Enhancements/Bugfixes
-
- Closed
-
- relates to
-
FOLIO-2978 Remove uuid_generate_v4(), it fails in pgpool native replication
-
- Open
-