mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2024-03-08
@@ -21,4 +21,48 @@ categories: ["Notes"]
- I did some cleanups on abstracts, licenses, and dates from Crossref
- I also did some minor cleanups to affiliations because I saw some incorrect and duplicate ones in our list

## 2024-03-05

- I tried a new technique to get some affiliations from Crossref using OpenRefine
  - First I split them and clustered, resolving a few hundred clusters out of 1500 (!)
  - Then I used a custom text facet with a few dozen CGIAR and other large affiliations to reduce the work
  - Then I joined them with our affiliations, paying no attention to duplicates
  - Then I deduped them using the Jython technique I learned in 2023-02
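
The final dedupe step can be sketched in plain Python roughly as follows. This is not the exact expression from the 2023-02 notes: it assumes the joined values are separated by `||` and that only exact string matches count as duplicates. The function name `dedupe_values` and the sample affiliations are illustrative only. In OpenRefine's Jython editor the same logic would read the cell from `value` and `return` the result directly.

```python
def dedupe_values(cell, separator="||"):
    """Remove exact duplicates from a joined multi-value cell, keeping first-seen order."""
    seen = set()
    unique = []
    for part in cell.split(separator):
        part = part.strip()
        # Skip empty fragments and values that were already seen
        if part and part not in seen:
            seen.add(part)
            unique.append(part)
    return separator.join(unique)


# Illustrative input: the first affiliation appears twice
print(dedupe_values("CGIAR||International Food Policy Research Institute||CGIAR"))
```

Keeping first-seen order means the existing affiliations stay in front of the newly joined ones when they come first in the cell.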

## 2024-03-06

- Peter sent me some more corrections for the authors that I had sent him in 2023-12

## 2024-03-08

- IFPRI sent me their 2023 records from CONTENTdm so I started working on those
- I found a way to match their ORCID identifiers in our list using Jython in OpenRefine:

```python
import re

# OpenRefine wraps a Jython expression in a function: `value` is the current
# cell and the expression must `return` the new cell value.
with open(r"/tmp/cg-creator-identifier.txt", "r") as f:
    orcid_ids = [orcid_id.strip() for orcid_id in f]

matched = False

for orcid_id in orcid_ids:
    # Match lines of the form "<name>: <ORCID>"; escape the cell value first
    if re.search(r".+: {}".format(re.escape(value)), orcid_id):
        matched = True
        break

if matched:
    return orcid_id
else:
    return value
```
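
Re-reading the file and running a regex per line for every cell may get slow on a large column. One possible alternative, sketched here as standalone Python rather than as an OpenRefine expression, builds a lookup table once and matches exactly. It assumes each line of the file has the form `Name: 0000-0000-0000-0000`, which is what the regex in the expression implies. The helper names `build_orcid_lookup` and `match_orcid` are hypothetical, and the sample entry is ORCID's fictitious test researcher rather than data from our list.

```python
def build_orcid_lookup(lines):
    """Map each bare ORCID iD to its full 'Name: iD' line."""
    lookup = {}
    for line in lines:
        line = line.strip()
        # Split on the last ': ' so names containing colons still work
        name, sep, orcid = line.rpartition(": ")
        if sep:
            # Keep the first line seen for an iD, like the early `break`
            lookup.setdefault(orcid, line)
    return lookup


def match_orcid(value, lookup):
    """Return the matching 'Name: iD' line, or the value unchanged."""
    return lookup.get(value.strip(), value)


# Sample data for illustration only
sample = ["Josiah Carberry: 0000-0002-1825-0097"]
lookup = build_orcid_lookup(sample)
print(match_orcid("0000-0002-1825-0097", lookup))
print(match_orcid("0000-0000-0000-0000", lookup))
```

Unlike the regex version, this only matches when the cell is exactly the identifier, so partial identifiers are left untouched.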

- I realized that [UNICEF was renamed to its current name in 1953](https://www.unicef.org/about-unicef/frequently-asked-questions#3) so I replaced all other variations in our vocabularies and metadata:

```sql
UPDATE metadatavalue SET text_value='United Nations Children''s Fund' WHERE dspace_object_id IN (SELECT uuid FROM item) AND text_value IN ('United Nations International Children''s Emergency Fund', 'United Nations International Children''s Emergency Fund', 'UNICEF');
```

- Note the use of two single quotes to escape the one in the name
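
The quoting rule can be checked without touching the real database, and bound parameters avoid manual escaping altogether. The sketch below uses Python's built-in `sqlite3` module with a throwaway in-memory table. The table `mv` is a hypothetical stand-in and not the DSpace schema; doubling a single quote inside a string literal is standard SQL, so the same rule applies in PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in table, not the real DSpace metadatavalue schema
conn.execute("CREATE TABLE mv (text_value TEXT)")
conn.executemany(
    "INSERT INTO mv VALUES (?)",
    [("UNICEF",), ("United Nations International Children's Emergency Fund",)],
)

# Literal form: '' inside a SQL string stands for one single quote
literal = conn.execute("SELECT 'United Nations Children''s Fund'").fetchone()[0]

# Bound parameters: the driver handles quoting, so no doubling is needed
new_name = "United Nations Children's Fund"
old_names = ["UNICEF", "United Nations International Children's Emergency Fund"]
placeholders = ", ".join("?" for _ in old_names)
conn.execute(
    "UPDATE mv SET text_value = ? WHERE text_value IN ({})".format(placeholders),
    [new_name] + old_names,
)
values = [row[0] for row in conn.execute("SELECT text_value FROM mv")]
print(literal == new_name, values)
```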

<!-- vim: set sw=2 ts=2: -->