Status: Closed
Affects Version/s: None
Fix Version/s: None
When harvesting a collection of 400k records, the harvest completes after only several thousand records have been harvested. Investigation of the mod-oai-pmh.instances table shows that not all the records are streamed from inventory.
Steps to Reproduce:
Start initial harvest
Expected result: All records are harvested.
Actual result: The harvest finishes after only a portion of the records has been harvested. The resumptionToken in the last response is <resumptionToken cursor="cursorvalue"></resumptionToken>. The next record ID and request IDs are missing.
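The check a harvesting client can make on the last response might be sketched as follows. In OAI-PMH, an empty resumptionToken element marks the final page of a list, so a client cannot tell a truncated harvest from a complete one unless it compares its own record count against an expected total. This is only a minimal sketch: the function name, the sample response, the identifiers, and the numeric cursor value are hypothetical, and it assumes the server includes the optional completeListSize attribute, which this ticket does not confirm.

```python
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def _to_int(value):
    """Return value as an int, or None if it is missing or not numeric."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def inspect_final_page(response_xml):
    """Summarise the resumptionToken of one ListRecords response.

    Hypothetical helper: flags a page that claims to be the last one
    (empty or absent token) while fewer records than completeListSize
    have been delivered.
    """
    root = ET.fromstring(response_xml)
    records = root.findall(f".//{OAI_NS}record")
    token = root.find(f".//{OAI_NS}resumptionToken")

    token_value = (token.text or "").strip() if token is not None else ""
    cursor = _to_int(token.get("cursor")) if token is not None else None
    total = _to_int(token.get("completeListSize")) if token is not None else None

    # cursor counts the records delivered before this page.
    harvested = cursor + len(records) if cursor is not None else None
    is_final = token_value == ""
    premature = bool(
        is_final and total is not None
        and harvested is not None and harvested < total
    )
    return {
        "records_on_page": len(records),
        "is_final_page": is_final,
        "harvested_so_far": harvested,
        "complete_list_size": total,
        "premature_end": premature,
    }


# Hypothetical response modelled on the one described in this ticket.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example:1</identifier></header></record>
    <record><header><identifier>oai:example:2</identifier></header></record>
    <resumptionToken cursor="12000" completeListSize="400000"></resumptionToken>
  </ListRecords>
</OAI-PMH>"""

summary = inspect_final_page(SAMPLE_RESPONSE)
```

If completeListSize is not present, premature_end stays False and the only reliable comparison is against a record count obtained independently, for example from inventory.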
So far I haven't been able to recreate it in our oaipmh testing environment using the same data set, but the issue manifests itself on multiple production sites. Here is additional information from the harvesting:
The number of entries loaded into mod-oai-pmh.instances varies (it's been as low as 12K, but I've gotten 300K too), and I see the same pattern when watching the system in real time, namely:
- Entries start appearing quickly in mod-oai-pmh.instances
- As soon as they stop appearing, no more will
- I'm able to harvest as many entries as appear in the table
- Empty resumption token is returned
We cannot recreate this issue in our test environments.