I did a load of our 116,521 users using the mod-users CRUD web API. It took an hour and 33 minutes. The same load using SQL took a minute and 17 seconds.
I enabled PostgreSQL SQL logging to see what it was doing and found that it is doing the following for each INSERT of a user.
To improve performance I would recommend making the following changes.
RMB-718: For web API query methods, don't include a count by default. Add a parameter that optionally includes it. It may make sense to include it for a web app grid control, but, there are plenty of other times where you don't want it and it is just unnecessarily slowing things down. That would eliminate the calls to count_estimate() above.
MODUSERS-3: Add a web API that accepts an array of objects like the ones that already exist for instances, holdings, and items. As you can see above an SQL COMMIT is happening for each user inserted. For decent performance, you need to have the ability to do a commit with a larger batch size.
3. Eliminate the custom_fields query. An idea is to store the custom field info as part of the object's JSON schema. You could do something like read the JSON schema file when the module is loaded, and dynamically add the custom field info to it. Then, only when the custom fields are modified reload the schema.
4. Eliminate the groups query. The patronGroup JSON property gets copied to the patrongroup column which has a foreign key constraint on it. Therefore, if the groups query is happening to validate the group id, it isn't needed as the foreign key constraint will take care of that automatically.
5. MODUSERS-204: Eliminate the INSERT into permissions_users. The only time a user should have to have a permissions_user row is if the user actually has permissions. At our site, there will likely be less than 800 users that have permissions, out of the close to 120,000 total users. It's a waste of time inserting permissions_users for users that have no permissions. It is probably better that users and permissions modules not have that tight of coupling anyway. This woudl also eliminate the query to see whether the permissions_user exists.
6. Remove the addresses array from the users table JSON schema and store it in a separate user_addresses table. If you do this, then, you can add a foreign key constraint to the addressTypeId field and can eliminate the query to the addresstypes table.
In summary, to improve bulk insert performance of users, you need to remove the N+1 queries that are happening on each INSERT of a user. You also need to be able to control the batch size and have the ability to do something like only do a SQL COMMIT every 1,000 rows. If you do all of the above, I think you will see a dramatic improvement in performance.
I haven't looked at UPDATE. I'm assuming it probably works similarly.