Add notes for 2023-08-22

2023-08-22 17:28:49 +03:00
parent f38ecfb75e
commit d2ad21bde1
32 changed files with 88 additions and 37 deletions

@ -152,4 +152,25 @@ $ ./run.sh -s http://localhost:8081/solr/statistics -a import -o /tmp/statistics
- Export CGSpace to check for missing Initiative collection mappings
## 2023-08-19
- Start a harvest on AReS
## 2023-08-21
- Experiment with the DSpace 7 REST API
- I wrote a Python script to benchmark harvesting all 100,000+ items using the `/api/discover/search/objects` endpoint, 100 items at a time
- I was able to harvest the entire 106,000 items in fifty-two minutes, which seems slow, but that's about ten times faster than with the legacy REST API...
- Still, I need to benchmark a bit more, as the item response doesn't include collection mappings or thumbnails
- Reading the [API docs](https://github.com/DSpace/RestContract/blob/main/README.md#etags--conditional-headers) it seems that we should be able to use the standard `If-Modified-Since` header for some endpoints
- I tried it on the `/api/discover/search/objects` and `/api/core/items` endpoints, but neither appears to support this header: I don't see a `Last-Modified` header in either response
- According to the docs, a missing `Last-Modified` header means that the endpoint indeed doesn't support conditional requests...
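
A minimal sketch of the two experiments above, not the actual benchmark script. It pages through `/api/discover/search/objects` 100 items at a time and reports whether the first response carries a `Last-Modified` header. The `BASE_URL`, the `dsoType`/`page`/`size` query parameters, the JSON paths, and all function names are assumptions for illustration and should be checked against the REST contract:

```python
import json
import urllib.parse
import urllib.request

# Assumption: placeholder base URL of a DSpace 7 REST API, not a real server.
BASE_URL = "https://dspace.example.org/server"


def search_url(page, size=100):
    """Build a paged URL for the discovery search endpoint (assumed parameters)."""
    query = urllib.parse.urlencode({"dsoType": "item", "page": page, "size": size})
    return f"{BASE_URL}/api/discover/search/objects?{query}"


def has_last_modified(header_names):
    """Return True if Last-Modified is among the header names (case-insensitive)."""
    return any(name.lower() == "last-modified" for name in header_names)


def harvest(max_pages=None):
    """Yield search result objects page by page until the last page is reached."""
    page = 0
    while max_pages is None or page < max_pages:
        with urllib.request.urlopen(search_url(page)) as response:
            if page == 0:
                # Without this header, If-Modified-Since cannot be used.
                print("Last-Modified present:", has_last_modified(response.headers.keys()))
            # Assumed response layout: _embedded.searchResult holds the page.
            result = json.load(response)["_embedded"]["searchResult"]
        yield from result.get("_embedded", {}).get("objects", [])
        page += 1
        if page >= result["page"]["totalPages"]:
            break
```

Calling `sum(1 for _ in harvest(max_pages=5))` against a real server would count the objects in the first five pages, which is enough for a quick timing test before committing to a full harvest.
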
## 2023-08-22
- I was experimenting with the DSpace 7 REST API again
- This time looking at the thumbnail responses in item endpoints
- According to [the documentation](https://github.com/DSpace/RestContract/blob/main/items.md#main-thumbnail) the API will respond with HTTP 200 if there is a thumbnail, and HTTP 204 (No Content) if there is none
- That means we need to make the request before we can even find out!
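
A rough sketch of that per-item check, assuming the thumbnail lives at `/api/core/items/<uuid>/thumbnail` as the linked documentation describes. The `BASE_URL` and the function names are hypothetical, and I have not run this against a live server:

```python
import urllib.request

# Assumption: placeholder base URL of a DSpace 7 REST API, not a real server.
BASE_URL = "https://dspace.example.org/server"


def thumbnail_url(item_uuid):
    """Build the (assumed) main thumbnail URL for an item."""
    return f"{BASE_URL}/api/core/items/{item_uuid}/thumbnail"


def interpret_status(status):
    """Map the documented status codes to a boolean: 200 = thumbnail, 204 = none."""
    if status == 200:
        return True
    if status == 204:
        return False
    raise ValueError(f"Unexpected HTTP status: {status}")


def has_thumbnail(item_uuid):
    """Make one request per item; urlopen raises HTTPError on 4xx/5xx responses."""
    with urllib.request.urlopen(thumbnail_url(item_uuid)) as response:
        return interpret_status(response.status)
```

This costs one extra request per item on top of the search request, so it would need to be included in any harvesting benchmark.
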
<!-- vim: set sw=2 ts=2: -->