---
title: "April, 2024"
date: 2024-04-04T10:23:00+03:00
author: "Alan Orth"
categories: ["Notes"]
---

## 2024-04-04

- Work on CGSpace duplicate DOIs more

## 2024-04-08

- Start working on IFPRI's 2022 batch import
- I ran the duplicate checker against CGSpace and started downloading all linked PDFs

## 2024-04-09

- Continue working on IFPRI's 2022 batch import
- I started validating the potential duplicates in OpenRefine

## 2024-04-12

- Finish working on the 650 IFPRI 2022 records that were not already on CGSpace, then uploaded them
- I need to merge the metadata for the remaining 212 that are already on CGSpace
- Spend some time looking at duplicate DOIs again...

## 2024-04-13

- Spend some time looking at duplicate DOIs again...

## 2024-04-14

- Spend some time looking at duplicate DOIs again...

## 2024-04-15

- Spend some time looking at duplicate DOIs again...
- Delete ~260 duplicate metadata values using the elaborate SQL and sort method I documented here: https://github.com/DSpace/DSpace/issues/8253#issuecomment-1331756418
- Tony noticed that the DSpace 7 REST API is very slow with the embeds so I profiled a bit:

```
$ time curl -s -o /dev/null 'https://cgspace.cgiar.org/server/api/discover/search/objects?query=cg.identifier.project%3AIFPRI*&scope=8f1e9650-fe87-4e6e-889a-1cacfb747408&page=0&size=100&embed=thumbnail,bundles/bitstreams&sort=dcterms.issued,desc'
curl -s -o /dev/null 0.01s user 0.01s system 0% cpu 47.515 total
$ time curl -s -o /dev/null 'https://cgspace.cgiar.org/server/api/discover/search/objects?query=cg.identifier.project%3AIFPRI*&scope=8f1e9650-fe87-4e6e-889a-1cacfb747408&page=0&size=100&sort=dcterms.issued,desc'
curl -s -o /dev/null 0.01s user 0.01s system 0% cpu 4.764 total
```

- Finalize processing the remaining 206 items from the IFPRI 2022 batch set that already existed on CGSpace
- I merged metadata with the existing items
- There are still six remaining items that I identified as being duplicates (3x2) in the IFPRI set itself

## 2024-04-16

- Spend some time looking at duplicate DOIs again...
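
- For reference, a minimal sketch of how duplicate DOIs could be grouped from an export. This is not the duplicate checker or the SQL method mentioned in these notes. The function names, the `(item_id, doi)` input shape, and the sample values are all hypothetical. It assumes DOIs may appear with or without a resolver prefix and compares them case-insensitively:

```python
from collections import defaultdict

# Resolver and scheme prefixes that may precede a bare DOI (assumed list, not exhaustive)
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(value: str) -> str:
    """Reduce a DOI string to a bare, lowercase form for comparison."""
    doi = value.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi


def find_duplicate_dois(records):
    """Group item identifiers by normalized DOI; keep DOIs used by more than one item.

    `records` is an iterable of (item_id, doi) pairs (hypothetical input shape).
    """
    groups = defaultdict(list)
    for item_id, doi in records:
        # Skip items with no DOI at all
        if doi and doi.strip():
            groups[normalize_doi(doi)].append(item_id)
    return {doi: ids for doi, ids in groups.items() if len(ids) > 1}


if __name__ == "__main__":
    # Made-up sample data for illustration only
    sample = [
        ("item-1", "https://doi.org/10.1234/ABC"),
        ("item-2", "doi:10.1234/abc"),
        ("item-3", "10.1234/xyz"),
    ]
    for doi, item_ids in find_duplicate_dois(sample).items():
        print(doi, item_ids)
```

- Grouping on the normalized form means `https://doi.org/10.1234/ABC` and `doi:10.1234/abc` count as the same DOI, which a plain string comparison would miss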