mirror of
https://github.com/alanorth/cgspace-notes.git
title | date | author | categories
---|---|---|---
March, 2024 | 2024-03-01T09:55:00+03:00 | Alan Orth |
2024-03-01
- Last week Bizu reported an issue with the "browse by issue date" drop down
- I verified it, and suspect it could be due to missing issue dates...
- It might be this issue: https://github.com/DSpace/dspace-angular/issues/2808
- I spent some time trying to reproduce the bug affecting `onebox` fields that are configured to use external vocabularies and are not repeatable
- I filed an issue: https://github.com/DSpace/dspace-angular/issues/2846
2024-03-03
- I did some cleanups on abstracts, licenses, and dates from Crossref
- I also did some minor cleanups to affiliations because I saw some incorrect and duplicate ones in our list
2024-03-05
- I tried a new technique to get some affiliations from Crossref using OpenRefine
- First I split them and clustered, resolving a few hundred clusters out of 1500 (!)
- Then I used a custom text facet with a few dozen CGIAR and other large affiliations to reduce the work
- Then I joined them with our affiliations, paying no attention to duplicates
- Then I deduped them using the Jython technique I learned in 2023-02
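The deduplication step can be sketched in plain Python. This is a minimal sketch, not necessarily the exact expression used in OpenRefine: it assumes the joined affiliations sit in one cell separated by `||`, and the function name `dedupe_multivalue` is hypothetical.

```python
def dedupe_multivalue(value, separator="||"):
    """Drop repeated entries from a separator-joined cell, keeping first-seen order."""
    seen = set()
    unique = []
    for part in value.split(separator):
        part = part.strip()
        # Skip empty fragments and anything already kept
        if part and part not in seen:
            seen.add(part)
            unique.append(part)
    return separator.join(unique)


# Sample cell with one repeated affiliation
print(dedupe_multivalue("CGIAR||IFPRI||CGIAR"))  # CGIAR||IFPRI
```

In OpenRefine's Jython editor the same logic would run on the cell's `value` and `return` the joined string instead of printing it.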
2024-03-06
- Peter sent me some more corrections for the authors that I had sent him in 2023-12
2024-03-08
- IFPRI sent me their 2023 records from CONTENTdm so I started working on those
- I found a way to match their ORCID identifiers in our list using Jython in OpenRefine:
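```python
import re

# Runs as an OpenRefine Jython expression, where `value` is the current cell.
# Entries in the file are expected to look like "<name>: <ORCID identifier>".
with open(r"/tmp/cg-creator-identifier.txt", 'r') as f:
    orcid_ids = [orcid_id.strip() for orcid_id in f]

for orcid_id in orcid_ids:
    # Return the first entry whose identifier part contains the cell value
    if re.search(r'.+: {}'.format(re.escape(value)), orcid_id):
        return orcid_id

# No match, so keep the original value
return value
```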
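The same lookup can be tried outside OpenRefine with a dictionary instead of a regular expression. This is a sketch under assumptions: the helper names `build_orcid_index` and `match_orcid` are hypothetical, the sample names and identifiers are made up, and unlike the regex version it only accepts an exact match on the identifier.

```python
def build_orcid_index(entries):
    """Map each identifier to its full "<name>: <identifier>" entry."""
    index = {}
    for entry in entries:
        # Split on the last ": " so names containing colons still work
        name, sep, orcid = entry.strip().rpartition(": ")
        if sep:
            # Keep the first entry seen for a given identifier
            index.setdefault(orcid, entry.strip())
    return index


def match_orcid(value, index):
    """Return the full entry for value, or value itself when nothing matches."""
    return index.get(value, value)


# Made-up sample data standing in for /tmp/cg-creator-identifier.txt
entries = [
    "Jane Doe: 0000-0001-2345-6789",
    "John Roe: 0000-0002-9876-5432",
]
index = build_orcid_index(entries)
print(match_orcid("0000-0002-9876-5432", index))  # John Roe: 0000-0002-9876-5432
print(match_orcid("0000-0003-0000-0000", index))  # 0000-0003-0000-0000
```

Building the index once avoids rescanning the whole list for every cell, which may matter on larger projects.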
- I realized that UNICEF was renamed to its current name in 1953 so I replaced all other variations in our vocabularies and metadata:
UPDATE metadatavalue SET text_value='United Nations Children''s Fund' WHERE dspace_object_id IN (SELECT uuid FROM item) AND text_value IN ('United Nations International Children''s Emergency Fund', 'United Nations International Children''s Emergency Fund', 'UNICEF');
- Note the use of two single quotes to escape the one in the name
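The quote-doubling rule can be checked without touching the real database. The sketch below uses Python's `sqlite3` module as a stand-in for PostgreSQL and a simplified, single-column `metadatavalue` table; both are assumptions for illustration, and the `dspace_object_id` filter from the real statement is omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in table; the real one has more columns
conn.execute("CREATE TABLE metadatavalue (text_value TEXT)")
conn.executemany(
    "INSERT INTO metadatavalue VALUES (?)",
    [
        ("UNICEF",),
        ("United Nations International Children's Emergency Fund",),
        ("IFPRI",),
    ],
)

# Inside a SQL string literal, '' stands for one literal single quote
conn.execute(
    "UPDATE metadatavalue SET text_value='United Nations Children''s Fund' "
    "WHERE text_value IN "
    "('United Nations International Children''s Emergency Fund', 'UNICEF')"
)

rows = [r[0] for r in conn.execute("SELECT text_value FROM metadatavalue ORDER BY rowid")]
print(rows)
```

When the statement is issued from a program rather than typed by hand, passing the name as a bound parameter (the `?` placeholders above) sidesteps the escaping entirely.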