- Spend some time looking at duplicate DOIs again...
## 2024-04-13
- Spend some time looking at duplicate DOIs again...
## 2024-04-14
- Spend some time looking at duplicate DOIs again...
## 2024-04-15
- Spend some time looking at duplicate DOIs again...
- Delete ~260 duplicate metadata values using the elaborate SQL and sort method I documented here: https://github.com/DSpace/DSpace/issues/8253#issuecomment-1331756418
- Tony noticed that the DSpace 7 REST API is very slow with the embeds so I profiled a bit:
```
$ time curl -s -o /dev/null 'https://cgspace.cgiar.org/server/api/discover/search/objects?query=cg.identifier.project%3AIFPRI*&scope=8f1e9650-fe87-4e6e-889a-1cacfb747408&page=0&size=100&embed=thumbnail,bundles/bitstreams&sort=dcterms.issued,desc'
curl -s -o /dev/null   0.01s user 0.01s system 0% cpu 47.515 total
```
- Assist Deborah with an advanced query on CGSpace for biodiversity and health:
```
dcterms.issued:[2010 TO 2024] AND dcterms.type:"Journal Article" AND (dc.title:"biodiversity" OR dcterms.subject:"biodiversity" OR dc.title:"health" OR dcterms.subject:"health")
```
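- The same query could presumably be sent to the REST API discovery endpoint from the `curl` example above. A minimal sketch, assuming the endpoint accepts this syntax unchanged in its `query` parameter (I have not verified that); `build_search_url` is a hypothetical helper, not part of DSpace:

```python
from urllib.parse import urlencode

# Base URL taken from the curl example earlier in these notes
BASE = "https://cgspace.cgiar.org/server/api/discover/search/objects"


def build_search_url(query, page=0, size=100):
    """Hypothetical helper: URL-encode a Discovery query for the REST API."""
    params = {"query": query, "page": page, "size": size}
    return BASE + "?" + urlencode(params)


query = (
    'dcterms.issued:[2010 TO 2024] AND dcterms.type:"Journal Article" '
    'AND (dc.title:"biodiversity" OR dcterms.subject:"biodiversity" '
    'OR dc.title:"health" OR dcterms.subject:"health")'
)
print(build_search_url(query))
```

Letting `urlencode` do the escaping avoids hand-encoding the colons, quotes, and brackets in the query.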
- Remove CIMMYT URLs and citations from 277 journal articles on CGSpace since it is a bit tacky
- I used this Jython expression in OpenRefine with [Crossref's content negotiation](https://citation.crosscite.org/docs.html) to get citations for all DOIs:
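```python
import urllib2
doi = cells['cg.identifier.doi[en_US]'].value
# Build the Crossref transform URL for a formatted bibliography entry
url = "https://api.crossref.org/works/" + doi + "/transform/text/x-bibliography"
# Assumed completion (the original snippet is cut off here): fetch and return the citation
return urllib2.urlopen(url).read().decode('utf-8')
```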
- Write a SQL query to build the IFPRI CONTENTdm redirects to Handles:
```sql
SELECT m.text_value, h.handle FROM metadatavalue m JOIN handle h ON m.dspace_object_id = h.resource_id WHERE m.metadata_field_id=28 AND m.text_value LIKE 'Original URL%' AND h.resource_type_id=2;
```
- Similarly, I need a SQL query to get the redirects for duplicate Handles, querying for `dcterms.replaces`:
```sql
SELECT m.text_value AS handle_from, h.handle AS handle_to FROM metadatavalue m JOIN handle h ON m.dspace_object_id = h.resource_id WHERE m.metadata_field_id=181 AND h.resource_type_id=2;
```
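- The rows from this query probably need some cleanup first. A minimal sketch, assuming the output was exported as two-column CSV and that `dcterms.replaces` values may be either bare Handles or resolver URLs (both are assumptions, I have not checked the actual values); `normalize_handle` and `redirect_pairs` are hypothetical helpers and the sample Handles are made up:

```python
import csv
import io
import re

# A Handle such as 10568/12345 at the end of the value,
# optionally preceded by a resolver URL (assumed formats)
HANDLE_RE = re.compile(r"(\d+(?:\.\d+)*/[^\s/]+)$")


def normalize_handle(value):
    """Return the bare prefix/suffix Handle, or None if none is found."""
    match = HANDLE_RE.search(value.strip())
    return match.group(1) if match else None


def redirect_pairs(csv_text):
    """Yield unique (handle_from, handle_to) pairs from two-column CSV text."""
    seen = set()
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 2:
            continue
        source, target = normalize_handle(row[0]), normalize_handle(row[1])
        # Skip unparseable rows, self-redirects, and repeated sources
        if not source or not target or source == target or source in seen:
            continue
        seen.add(source)
        yield source, target


# Made-up sample rows for illustration only
sample = "https://hdl.handle.net/10568/111,10568/222\n10568/111,10568/222\n"
print(list(redirect_pairs(sample)))
```

Keeping only the first target per source Handle matters because a map with the same key listed twice would be ambiguous.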
- Then I can work that list into an nginx map with redirect, for example: