---
title: "March, 2024"
date: 2024-03-01T09:55:00+03:00
author: "Alan Orth"
categories: ["Notes"]
---
## 2024-03-01

- Last week Bizu reported an issue with the "browse by issue date" drop-down
- I verified it, and suspect it could be due to missing issue dates...
- It might be this issue: https://github.com/DSpace/dspace-angular/issues/2808

<!--more-->

- I spent some time trying to reproduce the bug affecting `onebox` fields that are configured to use external vocabularies and are not repeatable
- I filed an issue: https://github.com/DSpace/dspace-angular/issues/2846
## 2024-03-03

- I did some cleanups on abstracts, licenses, and dates from Crossref
- I also did some minor cleanups to affiliations because I saw some incorrect and duplicate ones in our list

## 2024-03-05

- I tried a new technique to get some affiliations from Crossref using OpenRefine
- First I split them and clustered them, resolving a few hundred clusters out of 1500 (!)
- Then I used a custom text facet with a few dozen CGIAR and other large affiliations to reduce the work
- Then I joined them with our affiliations, paying no attention to duplicates
- Then I deduped them using the Jython technique I learned in 2023-02
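The final deduplication step can be illustrated in plain Python. This is a generic sketch, not necessarily the Jython technique from 2023-02: it assumes the joined affiliations sit in one cell separated by `||` (the multi-value separator DSpace uses in CSV exports) and keeps the first occurrence of each exact value. The function name and sample values are hypothetical.

```python
def dedupe_multivalue(value, separator="||"):
    """Remove duplicate values from a multi-value cell, keeping the original order.

    Assumes values are joined with `separator` (DSpace CSV uses "||").
    """
    seen = set()
    unique = []
    for item in value.split(separator):
        # trim whitespace so "CGIAR" and " CGIAR" count as the same value
        item = item.strip()
        # skip empty strings left by stray separators, and repeated values
        if item and item not in seen:
            seen.add(item)
            unique.append(item)
    return separator.join(unique)


if __name__ == "__main__":
    cell = "International Livestock Research Institute||CGIAR||International Livestock Research Institute"
    print(dedupe_multivalue(cell))
```

This only removes exact duplicates (after trimming), so near-duplicates still need OpenRefine's clustering. Inside OpenRefine's Jython, the same loop should work on `value` directly, ending with `return "||".join(unique)`.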
## 2024-03-06

- Peter sent me some more corrections for the authors that I had sent him in 2023-12

## 2024-03-08

- IFPRI sent me their 2023 records from CONTENTdm so I started working on those
- I found a way to match their ORCID identifiers in our list using Jython in OpenRefine:
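```python
import re

# OpenRefine wraps Jython expressions in a function, so `value` is the
# current cell and a bare `return` is allowed here
with open(r"/tmp/cg-creator-identifier.txt", "r") as f:
    # lines in our list are expected to look like "Name: 0000-0000-0000-0000"
    orcid_ids = [orcid_id.strip() for orcid_id in f]

matched = False

for orcid_id in orcid_ids:
    # anchor the match at the end of the line so that a truncated
    # identifier in the cell cannot match a longer one in the list
    if re.search(r".+: {}$".format(value), orcid_id):
        matched = True
        break

if matched:
    # replace the bare identifier with the "Name: ORCID" line from our list
    return orcid_id
else:
    # leave the cell unchanged when there is no match
    return value
```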
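For testing outside OpenRefine, the same lookup can be written as ordinary Python. A minimal sketch, assuming each line of the list has the form `Name: ORCID` (which is what the regular expression above expects); it builds a dictionary keyed on the bare identifier once instead of scanning every line for every cell, and keeps the first line for a given identifier to mirror the loop's `break`. The function names and sample data are hypothetical.

```python
def load_orcid_lookup(lines):
    """Map each bare ORCID identifier to its full "Name: ORCID" line.

    Assumes lines look like "Name: 0000-0000-0000-0000"; lines without
    the ": " separator are ignored.
    """
    lookup = {}
    for line in lines:
        line = line.strip()
        # split on the last ": " so names containing colons still work
        name, sep, orcid = line.rpartition(": ")
        if sep and name:
            # keep the first line for an identifier, like the loop's break
            lookup.setdefault(orcid, line)
    return lookup


def match_orcid(value, lookup):
    """Return the matching "Name: ORCID" line, or the original value if unknown."""
    return lookup.get(value.strip(), value)


if __name__ == "__main__":
    # hypothetical sample lines; the real list would be read from a file
    sample = [
        "Jane Doe: 0000-0001-2345-6789\n",
        "John Smith: 0000-0002-9876-5432\n",
    ]
    lookup = load_orcid_lookup(sample)
    print(match_orcid("0000-0001-2345-6789", lookup))
    print(match_orcid("0000-0003-0000-0000", lookup))
```

A dictionary makes each lookup a constant-time exact match on the whole identifier, which helps when both the list and the column run to thousands of entries.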
- I realized that [UNICEF was renamed to its current name in 1953](https://www.unicef.org/about-unicef/frequently-asked-questions#3) so I replaced all other variations in our vocabularies and metadata:

```sql
UPDATE metadatavalue SET text_value='United Nations Children''s Fund' WHERE dspace_object_id IN (SELECT uuid FROM item) AND text_value IN ('United Nations International Children''s Emergency Fund', 'United Nations International Children''s Emergency Fund', 'UNICEF');
```

- Note the use of two single quotes to escape the one in the name
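The quote-doubling rule can be captured in a tiny helper when generating statements like this from Python. With PostgreSQL's default `standard_conforming_strings = on`, doubling the single quote is the only escaping a plain string literal needs. A minimal sketch: `sql_literal` is a hypothetical helper name, and when a script actually runs the queries, parameterized queries through a database driver are safer than building SQL strings by hand.

```python
def sql_literal(text):
    """Quote a string as a standard SQL literal by doubling single quotes."""
    return "'" + text.replace("'", "''") + "'"


# variations to replace (sample values for illustration)
names = [
    "United Nations International Children's Emergency Fund",
    "UNICEF",
]

statement = (
    "UPDATE metadatavalue SET text_value={} "
    "WHERE dspace_object_id IN (SELECT uuid FROM item) "
    "AND text_value IN ({});"
).format(
    sql_literal("United Nations Children's Fund"),
    ", ".join(sql_literal(name) for name in names),
)

print(statement)
```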
## 2024-03-11

- Experimenting with moving some of my Python scripts to the DSpace 7 REST API
- I need a way to get UUIDs for Handles...
- It seems that I can use a Discovery query like: https://dspace7test.ilri.org/server/api/discover/search/objects?dsoType=item&query=handle:10568/130864
- Then just take the first result...?
- I spent some time working on the script to get abstracts from CGSpace, and found a bug in my logic
- I also noticed that one item had two abstracts, but the first one was blank!
- Looking deeper, I found 113 blank metadata values so I deleted those:
```sql
BEGIN;
DELETE FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND text_value='';
COMMIT;
```
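To guard against blank values in the abstracts script itself, a helper can skip empty strings instead of taking the first value blindly. A sketch assuming the DSpace 7 REST API item representation, where `metadata` maps a field name such as `dc.description.abstract` to a list of objects with `value` and `place` keys; the function name and the sample item are hypothetical.

```python
def first_nonblank_value(item, field):
    """Return the first non-blank value of `field` in a DSpace 7 item, or None.

    Assumes `item` is the decoded JSON of an item from the REST API, where
    item["metadata"][field] is a list of {"value": ..., "place": ...} dicts.
    """
    values = item.get("metadata", {}).get(field, [])
    # sort by "place" so the result follows the order DSpace displays
    for entry in sorted(values, key=lambda e: e.get("place", 0)):
        # treat missing, empty, and whitespace-only values as blank
        text = (entry.get("value") or "").strip()
        if text:
            return text
    return None


if __name__ == "__main__":
    # hypothetical item with a blank first abstract
    item = {
        "metadata": {
            "dc.description.abstract": [
                {"value": "", "place": 0},
                {"value": "Actual abstract text.", "place": 1},
            ]
        }
    }
    print(first_nonblank_value(item, "dc.description.abstract"))
```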
- I also found a few dozen items with "N/A" for their citation, so I deleted those too:

```sql
BEGIN;
-- metadata_field_id 146 is the citation field
DELETE FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND text_value='N/A' AND metadata_field_id=146;
COMMIT;
```
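The Handle-to-UUID lookup via Discovery described above could be wrapped in a small Python helper. A sketch assuming the search response nests results under `_embedded.searchResult._embedded.objects`, each with an `_embedded.indexableObject` that has `handle` and `uuid` keys, as in the DSpace 7 REST contract. Rather than taking the first result blindly, it checks that the Handle matches. The function names are hypothetical and only the standard library is used.

```python
import json
import urllib.parse
import urllib.request


def discovery_url(base_url, handle):
    """Build a Discovery search URL for a single Handle."""
    params = urllib.parse.urlencode({"dsoType": "item", "query": f"handle:{handle}"})
    return f"{base_url}/api/discover/search/objects?{params}"


def uuid_from_search(response, handle):
    """Return the UUID of the first search result whose Handle matches, or None.

    Assumes the DSpace 7 search response shape:
    _embedded.searchResult._embedded.objects[]._embedded.indexableObject
    """
    objects = (
        response.get("_embedded", {})
        .get("searchResult", {})
        .get("_embedded", {})
        .get("objects", [])
    )
    for obj in objects:
        dso = obj.get("_embedded", {}).get("indexableObject", {})
        # double-check the Handle instead of trusting the first hit
        if dso.get("handle") == handle:
            return dso.get("uuid")
    return None


def handle_to_uuid(base_url, handle):
    """Query the REST API (needs network access) and return the item's UUID."""
    with urllib.request.urlopen(discovery_url(base_url, handle)) as resp:
        return uuid_from_search(json.load(resp), handle)
```

For example, `handle_to_uuid("https://dspace7test.ilri.org/server", "10568/130864")` would query the same endpoint as the URL above and return `None` if no result carries that exact Handle.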
<!-- vim: set sw=2 ts=2: -->