There have been frequent requests for batch APIs for various types of records in FOLIO.
Some modules have started to implement batching, either in response to these requests or to performance concerns.
These implementations all differ, and I think it could be valuable to try to decide on a common pattern for them.
Jon Miller has suggested that we may want to use this to load batches of typically 100 to 2,000 records.
Should the server wait to respond until the batch processing has finished, or should it respond promptly (perhaps after some initial validation) with the ability to monitor the status of the operation? (A sketch of the asynchronous option follows these questions.)
How does this affect the client?
Is this decision affected by the batch sizes we allow, since batch size is likely a primary component of latency?
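As one illustration of the "respond promptly" option, here is a minimal sketch, not tied to any particular FOLIO framework, of tracking an asynchronous batch job that the client can poll. All class, method, and path names are hypothetical.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical job tracker for the "respond promptly" option: the POST
// handler submits the batch work, immediately returns 202 Accepted with a
// job ID, and a GET status endpoint reports progress from this map.
public class BatchJobTracker {
    public enum Status { PENDING, IN_PROGRESS, COMPLETED, FAILED }

    private final Map<String, Status> jobs = new ConcurrentHashMap<>();
    private final ExecutorService executor = Executors.newFixedThreadPool(4);

    /** Accept a batch: record it as pending and run it in the background. */
    public String submit(Runnable batchWork) {
        String jobId = UUID.randomUUID().toString();
        jobs.put(jobId, Status.PENDING);
        executor.submit(() -> {
            jobs.put(jobId, Status.IN_PROGRESS);
            try {
                batchWork.run();
                jobs.put(jobId, Status.COMPLETED);
            } catch (RuntimeException e) {
                jobs.put(jobId, Status.FAILED);
            }
        });
        return jobId; // the client polls e.g. GET /batch-jobs/{jobId} with this
    }

    /** Report the current status of a previously submitted batch. */
    public Status status(String jobId) {
        return jobs.get(jobId);
    }
}
```

The synchronous alternative is simpler for clients (one request, one definitive answer) but ties up the HTTP connection for the full duration of the batch, which matters more as the allowed batch size grows.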
Should a response include a complete representation of all of the records created, references to them, or no information at all (except failures, depending upon the question below)?
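For illustration, a minimal response body for the "references only" option might look like the following sketch; all names are hypothetical.

```java
import java.util.List;

// Hypothetical minimal response for the "references only" option: the server
// returns the locations of the created records rather than their full
// representations, keeping the response small for large batches.
public class BatchCreateResponse {
    public int totalRecords;                // number of records in the submitted batch
    public List<String> createdRecordUrls;  // e.g. one "/instances/{id}" per created record
}
```

Returning full representations is friendlier to clients that need server-populated fields (generated IDs, timestamps), but it can make responses very large for batches of thousands of records.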
Should a batch only succeed if all records are valid, or should it be acceptable for some records to be invalid?
What should happen if all of the records are valid but persistence of some of them fails (this is likely related to the transactions topic below)?
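If partial success is allowed, the response needs to identify which records failed and why, so the client can correct or retry only those. A hypothetical result type, as a sketch:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical result type for the partial-success option: valid records are
// persisted, and each failure (validation or persistence) is reported
// individually, keyed by the record's position in the submitted batch.
public class BatchResult {
    public static class RecordError {
        public final int index;      // position of the record in the submitted batch
        public final String message; // validation or persistence error detail

        public RecordError(int index, String message) {
            this.index = index;
            this.message = message;
        }
    }

    public final List<String> createdIds = new ArrayList<>();
    public final List<RecordError> errors = new ArrayList<>();

    public boolean isCompleteSuccess() {
        return errors.isEmpty();
    }
}
```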
Should all of the records in a batch be created in a single transaction?
How could this decision affect the handling of partial success or failure, if we decide we also want that?
How does this affect resource usage? For example, a connection has to be used exclusively for each batch operation, which could lead to connection contention within the module.
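A sketch of the single-transaction (all-or-nothing) option using plain JDBC, assuming a Postgres table with a jsonb column; the table and column names are made up. Note that the connection is held for the entire operation, which is exactly the contention concern above.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;
import javax.sql.DataSource;

// All-or-nothing batch insert: every record is inserted inside one
// transaction, and any failure rolls the whole batch back.
public class SingleTransactionBatchInsert {
    public void insertAll(DataSource dataSource, List<String> recordsJson) throws SQLException {
        try (Connection conn = dataSource.getConnection()) {
            conn.setAutoCommit(false); // begin the single transaction
            try (PreparedStatement stmt =
                     conn.prepareStatement("INSERT INTO instance (jsonb) VALUES (?::jsonb)")) {
                for (String json : recordsJson) {
                    stmt.setString(1, json);
                    stmt.addBatch();
                }
                stmt.executeBatch();
                conn.commit();   // all records persisted, or...
            } catch (SQLException e) {
                conn.rollback(); // ...none of them are
                throw e;
            }
        }
    }
}
```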
To constrain memory usage during batch operations, should the set of records be processed as a stream of single records (or small chunks)?
How does this affect validation, restrictions on batch size, or database transaction semantics?
For example, if we wanted to validate all records prior to any persistence, we might need to be able to process the stream more than once.
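A sketch of the streaming option using Jackson's streaming parser, assuming a request body of the form `{"records": [ {...}, {...} ]}` (the field name is an assumption). A plain InputStream can only be consumed once, which is exactly the single-pass limitation just mentioned: "validate everything first, then persist" would require buffering or re-reading the source.

```java
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Consumer;

// Streams the request body so only one record is held in memory at a time,
// instead of materialising the whole batch before processing.
public class BatchStreamProcessor {
    private final ObjectMapper mapper = new ObjectMapper();

    public void process(InputStream body, Consumer<JsonNode> handleRecord) throws IOException {
        try (JsonParser parser = mapper.getFactory().createParser(body)) {
            // advance to the start of the records array
            while (parser.nextToken() != JsonToken.START_ARRAY) {
                if (parser.currentToken() == null) {
                    throw new IOException("no records array found");
                }
            }
            // read one record at a time and hand it off
            while (parser.nextToken() == JsonToken.START_OBJECT) {
                JsonNode record = mapper.readTree(parser);
                handleRecord.accept(record);
            }
        }
    }
}
```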
Other validation-related topics:
- Optional ID
- JSON schema validation
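For the "Optional ID" item, one common approach is for the server to generate an ID when the client omits one. A minimal sketch using Jackson; the `id` field name is an assumption, and JSON schema validation would typically be applied by a schema-validation library before this step.

```java
import com.fasterxml.jackson.databind.node.ObjectNode;
import java.util.UUID;

// Assigns a server-generated UUID when the submitted record has no "id"
// property, and returns the effective ID so it can be reported to the client.
public class OptionalIdAssigner {
    public String ensureId(ObjectNode record) {
        if (!record.hasNonNull("id")) {
            record.put("id", UUID.randomUUID().toString());
        }
        return record.get("id").asText();
    }
}
```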