mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-12-18 11:12:18 +01:00
| title | date | author | categories |
|---|---|---|---|
| February, 2024 | 2024-02-05T11:10:00+03:00 | Alan Orth | |
2024-02-05
- Delete duplicate metadata as described in my DSpace issue from last year: https://github.com/DSpace/DSpace/issues/8253
- Lower case all the AGROVOC subjects on CGSpace
dspace=# BEGIN;
BEGIN
dspace=*# UPDATE metadatavalue SET text_value=LOWER(text_value) WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=187 AND text_value ~ '[[:upper:]]';
UPDATE 180
dspace=*# COMMIT;
COMMIT
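Before running an UPDATE like the one above, it can help to preview which values would change. The following is only a minimal sketch of that selection logic in Python: the sample subjects are made up, `preview_lowercase` is a hypothetical helper, and the ASCII-only pattern is an assumption standing in for the POSIX class `[[:upper:]]`, whose behaviour in PostgreSQL depends on the database locale.

```python
import re

# Rough stand-in for the POSIX class [[:upper:]] used in the SQL above.
# Assumption: ASCII-only matching is enough for illustration.
HAS_UPPER = re.compile(r"[A-Z]")


def preview_lowercase(values):
    """Return (old, new) pairs for the values that contain an upper-case letter."""
    return [(value, value.lower()) for value in values if HAS_UPPER.search(value)]


# Hypothetical sample subjects, not taken from CGSpace
sample = ["Climate Change", "maize", "FOOD SECURITY", "soil fertility"]
changes = preview_lowercase(sample)
for old, new in changes:
    print(f"{old!r} -> {new!r}")
```

Values that are already lower case are skipped, which mirrors why the WHERE clause filters on the regex instead of rewriting every row.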
2024-02-06
- Discuss IWMI using the CGSpace REST API for their new website
- Export the IWMI community to extract their ORCID identifiers:
$ dspace metadata-export -i 10568/16814 -f /tmp/iwmi.csv
$ csvcut -c 'cg.creator.identifier,cg.creator.identifier[en_US]' ~/Downloads/2024-02-06-iwmi.csv \
| grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' \
| sort -u \
| tee /tmp/iwmi-orcids.txt \
| wc -l
353
$ ./ilri/resolve_orcids.py -i /tmp/iwmi-orcids.txt -o /tmp/iwmi-orcids-names.csv -d
- I noticed some similar-looking names in our list, so I clustered them in OpenRefine and manually checked a dozen or so to update our list
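The csvcut, grep and sort steps above can also be approximated in a few lines of Python. This is a sketch, not the script used here: `extract_orcids` is a hypothetical helper, the sample rows are invented, and the `Name: identifier` layout with `||` separators is an assumption about how the values look. The regular expression is the same one passed to grep.

```python
import csv
import io
import re

# Same shape as the grep pattern above: four dash-separated groups of
# four characters from [A-Z0-9].
ORCID_RE = re.compile(r"[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}")


def extract_orcids(csv_text, columns):
    """Return the sorted unique ORCID-like identifiers found in the given columns."""
    found = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column in columns:
            found.update(ORCID_RE.findall(row.get(column) or ""))
    return sorted(found)


# Hypothetical two-row export. 0000-0002-1825-0097 is ORCID's documented
# example iD; the second identifier is made up for illustration.
sample_csv = (
    "id,cg.creator.identifier,cg.creator.identifier[en_US]\n"
    '1,"Josiah Carberry: 0000-0002-1825-0097",\n'
    '2,,"Josiah Carberry: 0000-0002-1825-0097||Jane Doe: 0000-0001-2345-678X"\n'
)
orcids = extract_orcids(
    sample_csv, ["cg.creator.identifier", "cg.creator.identifier[en_US]"]
)
print(len(orcids))
```

Collecting into a set plays the role of `sort -u`, and the final `len()` corresponds to the `wc -l` count.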
2024-02-07
- Maria asked me about the "missing" item from last week again
- I can see it when I use the Admin search, but not in her workflow
- It was submitted by TIP so I checked that user's workspace and found it there
- After depositing, it went into the workflow so Maria should be able to see it now
2024-02-09
- Minor edits to CGSpace submission form
- Upload 55 ISNAR book chapters to CGSpace from Peter
2024-02-19
- Looking into the collection mapping issue on CGSpace
- It seems to be by design in DSpace 7: https://github.com/DSpace/dspace-angular/issues/1203
- This is a massive setback for us...
2024-02-20
- Minor work on OpenRXV to fix a bug in the ng-select drop downs
- Minor work on the DSpace 7 nginx configuration to allow requesting robots.txt and sitemaps without hitting rate limits
2024-02-21
- Minor updates on OpenRXV, including one bug fix for missing mapped collections
- Salem had to re-work the harvester for DSpace 7 since the mapped collections and parent collection list are separate!
2024-02-22
- Discuss tagging of datasets and re-work the submission form to encourage use of the DOI field for any item that has a DOI, and the normal URL field if not
- The "cg.identifier.dataurl" field will be used for "related" datasets
- I still have to check and move some metadata for existing datasets
2024-02-23
- This morning Tomcat died due to an OOM kill from the kernel:
kernel: Out of memory: Killed process 698 (java) total-vm:14151300kB, anon-rss:9665812kB, file-rss:320kB, shmem-rss:0kB, UID:997 pgtables:20436kB oom_score_adj:0
- I don't see any abnormal pattern in my Grafana graphs, for JVM or system load... very weird
- I updated the submission form on CGSpace to include the new changes to URLs for datasets
- I also updated about 80 datasets to move the URLs to the correct field
2024-02-25
- This morning Tomcat died while I was doing a CSV export, with an OOM kill from the kernel:
kernel: Out of memory: Killed process 720768 (java) total-vm:14079976kB, anon-rss:9301684kB, file-rss:152kB, shmem-rss:0kB, UID:997 pgtables:19488kB oom_score_adj:0
- I don't know why this is happening so often recently...
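To compare the OOM kills from the 23rd and the 25th, it may be easier to read the kernel's kB figures as GiB. The sketch below parses one such line; `parse_oom_line` is a hypothetical helper and the regular expressions only assume the field layout visible in the two messages above. By this arithmetic the Java process had roughly 9.2 GiB of anonymous resident memory at the first kill and roughly 8.9 GiB at the second.

```python
import re

# Field names as they appear in the kernel messages above
FIELD_RE = re.compile(r"(total-vm|anon-rss|file-rss|shmem-rss|pgtables):(\d+)kB")
PROC_RE = re.compile(r"Killed process (\d+) \((\S+)\)")


def parse_oom_line(line):
    """Return pid, process name and memory fields (in GiB) from an OOM kill line."""
    proc = PROC_RE.search(line)
    if proc is None:
        raise ValueError("not an OOM kill line")
    # The kernel reports these values in units of 1024 bytes
    fields = {name: int(kb) / 1024**2 for name, kb in FIELD_RE.findall(line)}
    return {"pid": int(proc.group(1)), "name": proc.group(2), "gib": fields}


line = (
    "kernel: Out of memory: Killed process 698 (java) total-vm:14151300kB, "
    "anon-rss:9665812kB, file-rss:320kB, shmem-rss:0kB, UID:997 "
    "pgtables:20436kB oom_score_adj:0"
)
info = parse_oom_line(line)
print(info["pid"], info["name"], round(info["gib"]["anon-rss"], 2))
```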
2024-02-27
- IFPRI sent me a list of authors to add to our list for now, until we can find a better way of doing it
- I extracted the existing authors from our controlled vocabulary and combined them with IFPRI's:
$ xmllint --xpath '//node/isComposedBy/node()' dspace/config/controlled-vocabularies/dc-contributor-author.xml \
| grep -oE 'label=".*"' \
| sed -e 's/label="//' -e 's/"$//' > /tmp/authors
$ cat /tmp/authors /tmp/ifpri-authors | sort -u > /tmp/new-authors
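The same merge can be sketched in Python by reading the `label` attributes directly with an XML parser rather than grep and sed. This is an illustration under assumptions: `merge_authors` is a hypothetical helper, the miniature vocabulary below is invented, and Python's default string ordering may differ from what `sort -u` produces under a given locale.

```python
import xml.etree.ElementTree as ET


def merge_authors(vocabulary_xml, extra_names):
    """Return the sorted union of vocabulary labels and extra author names."""
    root = ET.fromstring(vocabulary_xml.strip())
    # Only the <node> children of <isComposedBy>, like the XPath above
    labels = {
        node.get("label")
        for node in root.findall(".//isComposedBy/node")
        if node.get("label")
    }
    extras = {name.strip() for name in extra_names if name.strip()}
    return sorted(labels | extras)


# Hypothetical miniature vocabulary in the controlled-vocabulary layout
sample_xml = """
<node id="dc-contributor-author" label="dc-contributor-author">
  <isComposedBy>
    <node id="Orth, Alan" label="Orth, Alan"/>
    <node id="Doe, Jane" label="Doe, Jane"/>
  </isComposedBy>
</node>
"""
merged = merge_authors(sample_xml, ["Doe, Jane", "Smith, John\n", ""])
print(merged)
```

Using a set for both inputs removes duplicates across the two lists, and blank lines in the extra list are dropped before merging.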
2024-02-28
- I figured out a way to add a new Angular component to handle all our relation fields
2024-02-29
- Clean up a bunch of metadata on CGSpace