Automating Data Export

This guide is for Goldenrod. The endpoints will change in Honeysuckle, and the status messages in Step 5 may change as well.

I start with a couple of assumptions. First, I assume that you already have a CSV file of UUIDs. I'm not going into detail about how to automate that, but in my case, since I'm using this to populate EDS, I will script a query to find records that have been updated since a particular date, then step through the results, pull out the UUIDs, and save them to a file. For example:

https://folio-testing-okapi.dev.folio.org/instance-bulk/ids?limit=2147483647&query=%28keyword%20all%20%22nod%22%29%20sortby%20title

My second assumption is that you are using Postman. I always start with Postman to get my endpoints/queries lined up and then move the results to a script. If you are using something other than Postman, you may need to adjust my instructions to fit your preferred app or method.

Step 1: Begin by getting the id of the job profile you will be using by doing a GET to this endpoint:

/data-export/jobProfiles

You will get something like this:

{
  "jobProfiles": [
    {
      "id": "6f7f3cd7-9f24-42eb-ae91-91af1cd54d0a",
      "name": "Default job profile",
      "destination": "fileSystem",
      "description": "Default job profile",
      "userInfo": {
        "firstName": "System",
        "lastName": "Process",
        "userName": "system_process"
      },
      "mappingProfileId": "25d81cbe-9686-11ea-bb37-0242ac130002",
      "metadata": {
        "createdDate": "2020-07-28T00:00:00.000+0000",
        "createdByUserId": "00000000-0000-0000-0000-000000000000",
        "createdByUsername": "system_process",
        "updatedDate": "2020-07-28T00:00:00.000+0000",
        "updatedByUserId": "00000000-0000-0000-0000-000000000000",
        "updatedByUsername": "system_process"
      }
    }
  ],
  "totalRecords": 1
}

You will need the profile id (id above) from your preferred job in a later step.

Step 2: We need to tell the server about the file of UUIDs we are going to upload.
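When you move Step 1 from Postman to a script, it can be sketched like this in Python (standard library only). The Okapi URL, tenant, and token here are placeholder assumptions; substitute your own values.

```python
import json
import urllib.request

# Placeholder connection details -- replace with your own instance,
# tenant, and a token obtained from /authn/login.
OKAPI_URL = "https://folio-snapshot-okapi.dev.folio.org"
HEADERS = {
    "x-okapi-tenant": "diku",
    "x-okapi-token": "<token>",
    "Accept": "application/json",
}

def okapi_get(path):
    """GET an Okapi endpoint and parse the JSON response."""
    req = urllib.request.Request(f"{OKAPI_URL}{path}", headers=HEADERS)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def pick_profile_id(profiles, name="Default job profile"):
    """Pull the id of the job profile whose name matches."""
    for profile in profiles["jobProfiles"]:
        if profile["name"] == name:
            return profile["id"]
    raise ValueError(f"no job profile named {name!r}")

# Usage (hits the network, so commented out here):
# job_profile_id = pick_profile_id(okapi_get("/data-export/jobProfiles"))
```

`pick_profile_id` just walks the jobProfiles list shown above; if you always use the default profile you could hard-code its id instead.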
We do this by POSTing to the following endpoint:

/data-export/fileDefinitions

with a body that looks like this:

{
  "fileName": "test.csv"
}

This will return a message that looks like this:

{
  "id": "d0d6b4dc-2207-42ec-ac6b-3c30300e4307",
  "fileName": "test.csv",
  "status": "NEW",
  "metadata": {
    "createdDate": "2020-10-07T13:17:02.263+0000",
    "createdByUserId": "873bbfa4-7f40-4bd0-aa27-027494a4b5e9",
    "updatedDate": "2020-10-07T13:17:02.263+0000",
    "updatedByUserId": "873bbfa4-7f40-4bd0-aa27-027494a4b5e9"
  }
}

You will need the file definition id for the next step.

Step 3: Now you actually upload the file by POSTing to this endpoint:

/data-export/fileDefinitions/<fileDefinitionId>/upload

You will need to change the header Content-Type to 'application/octet-stream', and the body of the request will need to be changed to 'binary'. In Postman this will ask you to find the file you want to upload. The key here is that you are sending the actual file to the server, not just its filename. The result will look something like this:

{
  "id": "d0d6b4dc-2207-42ec-ac6b-3c30300e4307",
  "fileName": "test.csv",
  "jobExecutionId": "22091d4a-0c70-4a9e-8843-5dcf96c7cdf5",
  "sourcePath": "./storage/files/d0d6b4dc-2207-42ec-ac6b-3c30300e4307/test.csv",
  "status": "COMPLETED",
  "metadata": {
    "createdDate": "2020-10-07T13:17:02.263+0000",
    "createdByUserId": "873bbfa4-7f40-4bd0-aa27-027494a4b5e9",
    "updatedDate": "2020-10-07T13:17:02.263+0000",
    "updatedByUserId": "873bbfa4-7f40-4bd0-aa27-027494a4b5e9"
  }
}

You will need the jobExecutionId for Step 5.

Step 4: Perform the export by POSTing to this endpoint:

/data-export/export

Content-Type will be 'application/json' and the body of the request will look like this:

{
  "fileDefinitionId": "<fileDefinitionId>",
  "jobProfileId": "<jobProfileId>"
}

If everything goes well the server will return a 204 No Content message.
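Steps 2 through 4 chain together naturally in a script: create the file definition, upload the bytes, then trigger the export. A minimal sketch, again assuming placeholder Okapi URL, tenant, and token values:

```python
import json
import urllib.request

OKAPI_URL = "https://folio-snapshot-okapi.dev.folio.org"  # placeholder
BASE_HEADERS = {"x-okapi-tenant": "diku", "x-okapi-token": "<token>"}  # placeholders

def okapi_post(path, body, content_type="application/json"):
    """POST either a dict (sent as JSON) or raw bytes to Okapi."""
    data = body if isinstance(body, bytes) else json.dumps(body).encode()
    headers = {**BASE_HEADERS, "Content-Type": content_type}
    req = urllib.request.Request(f"{OKAPI_URL}{path}", data=data,
                                 headers=headers, method="POST")
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
        return json.loads(raw) if raw else None  # /export answers 204 No Content

def export_body(file_definition_id, job_profile_id):
    """The Step 4 request body."""
    return {"fileDefinitionId": file_definition_id,
            "jobProfileId": job_profile_id}

def run_export(csv_path, file_name, job_profile_id):
    # Step 2: register the file name; the response carries the definition id.
    file_def = okapi_post("/data-export/fileDefinitions", {"fileName": file_name})
    # Step 3: upload the actual file bytes as an octet stream.
    with open(csv_path, "rb") as f:
        uploaded = okapi_post(
            f"/data-export/fileDefinitions/{file_def['id']}/upload",
            f.read(), content_type="application/octet-stream")
    # Step 4: start the export (204 No Content on success).
    okapi_post("/data-export/export",
               export_body(file_def["id"], job_profile_id))
    return uploaded["jobExecutionId"]  # needed for the polling loop in Step 5
```

`run_export` returns the jobExecutionId so the polling loop in Step 5 has what it needs.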
Step 5: This step is asynchronous, so you need to create a polling loop by doing a GET to:

/data-export/jobExecutions?query=id=="<jobExecutionId>"

You will get something like this:

{
  "id": "22091d4a-0c70-4a9e-8843-5dcf96c7cdf5",
  "hrId": "112",
  "exportedFiles": [
    {
      "fileId": "b975b199-fa3c-4cc8-8257-4e996e64addb",
      "fileName": "test-112.mrc"
    }
  ],
  "jobProfileId": "6f7f3cd7-9f24-42eb-ae91-91af1cd54d0a",
  "jobProfileName": "Default job profile",
  "progress": {
    "exported": 25,
    "failed": 0,
    "total": "25"
  },
  "completedDate": "2020-10-07T13:18:20.364+0000",
  "startedDate": "2020-10-07T13:18:19.852+0000",
  "runBy": {
    "firstName": "admin",
    "lastName": "admin"
  },
  "status": "SUCCESS"
}

Stop the polling when the status is either SUCCESS or FAIL. You might also want to add a timing component that stops the polling after a specified period, so the script cannot enter an endless loop if the status never changes. Once the polling loop terminates, you will need the job id (id above) and the file id (fileId).

Step 6: Finally, do a GET to this endpoint:

/data-export/jobExecutions/<jobId>/download/<fileId>

The server will return a JSON object with two fields: fileId and link. From here your script would need to use the link to download the file and then do whatever work you need to do with it. I haven't worked through the details of this for our particular needs yet.
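The polling loop with a timeout, plus the Step 6 download request, can be sketched like this. The connection values are placeholders as before, and note one assumption: the example response above shows the bare job object, but if your instance wraps results in a jobExecutions list, the sketch unwraps it.

```python
import json
import time
import urllib.parse
import urllib.request

OKAPI_URL = "https://folio-snapshot-okapi.dev.folio.org"  # placeholder
HEADERS = {"x-okapi-tenant": "diku", "x-okapi-token": "<token>",  # placeholders
           "Accept": "application/json"}

def okapi_get(path):
    req = urllib.request.Request(f"{OKAPI_URL}{path}", headers=HEADERS)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def is_finished(status):
    """SUCCESS or FAIL ends the polling loop."""
    return status in ("SUCCESS", "FAIL")

def poll_job(job_execution_id, timeout=300, interval=5):
    """Step 5: poll until the job finishes, or give up after `timeout` seconds."""
    query = urllib.parse.quote(f'id=="{job_execution_id}"')
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        body = okapi_get(f"/data-export/jobExecutions?query={query}")
        # Unwrap a collection response if present; otherwise use the object as-is.
        job = body["jobExecutions"][0] if "jobExecutions" in body else body
        if is_finished(job["status"]):
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {job_execution_id} did not finish in {timeout}s")

def download_info(job_id, file_id):
    """Step 6: returns the JSON object with the fileId and link fields."""
    return okapi_get(f"/data-export/jobExecutions/{job_id}/download/{file_id}")
```

After `poll_job` returns, the job id is `job["id"]` and the file id is `job["exportedFiles"][0]["fileId"]`; feed both to `download_info` and fetch the resulting link however suits your workflow.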