--- title: "February, 2024" date: 2024-02-05T11:10:00+03:00 author: "Alan Orth" categories: ["Notes"] --- ## 2024-02-05 - Delete duplicate metadata as described in my DSpace issue from last year: https://github.com/DSpace/DSpace/issues/8253 - Lower case all the AGROVOC subjects on CGSpace ```sql dspace=# BEGIN; BEGIN dspace=*# UPDATE metadatavalue SET text_value=LOWER(text_value) WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=187 AND text_value ~ '[[:upper:]]'; UPDATE 180 dspace=*# COMMIT; COMMIT ``` ## 2024-02-06 - Discuss IWMI using the CGSpace REST API for their new website - Export the IWMI community to extract their ORCID identifiers: ```console $ dspace metadata-export -i 10568/16814 -f /tmp/iwmi.csv $ csvcut -c 'cg.creator.identifier,cg.creator.identifier[en_US]' ~/Downloads/2024-02-06-iwmi.csv \ | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' \ | sort -u \ | tee /tmp/iwmi-orcids.txt \ | wc -l 353 $ ./ilri/resolve_orcids.py -i /tmp/iwmi-orcids.txt -o /tmp/iwmi-orcids-names.csv -d ``` - I noticed some similar looking names in our list so I clustered them in OpenRefine and manually checked a dozen or so to update our list ## 2024-02-07 - Maria asked me about the "missing" item from last week again - I can see it when I used the Admin search, but not in her workflow - It was submitted by TIP so I checked that user's workspace and found it there - After depositing, it went into the workflow so Maria should be able to see it now ## 2024-02-09 - Minor edits to CGSpace submission form - Upload 55 ISNAR book chapters to CGSpace from Peter ## 2024-02-19 - Looking into the collection mapping issue on CGSpace - It seems to be by design in DSpace 7: https://github.com/DSpace/dspace-angular/issues/1203 - This is a massive setback for us... ## 2024-02-20 - Minor work on OpenRXV to fix a bug in the ng-select drop downs - Minor work on the DSpace 7 nginx configuration to allow requesting robots.txt and sitemaps without hitting rate limits ## 2024-02-21 - Minor updates on OpenRXV, including one bug fix for missing mapped collections - Salem had to re-work the harvester for DSpace 7 since the mapped collections and parent collection list are separate! ## 2024-02-22 - Discuss tagging of datasets and re-work the submission form to encourage use of DOI field for any item that has a DOI, and the normal URL field if not - The "cg.identifier.dataurl" field will be used for "related" datasets - I still have to check and move some metadata for existing datasets ## 2024-02-23 - This morning Tomcat died due to an OOM kill from the kernel: ```console kernel: Out of memory: Killed process 698 (java) total-vm:14151300kB, anon-rss:9665812kB, file-rss:320kB, shmem-rss:0kB, UID:997 pgtables:20436kB oom_score_adj:0 ``` - I don't see any abnormal pattern in my Grafana graphs, for JVM or system load... very weird - I updated the submission form on CGSpace to include the new changes to URLs for datasets - I also updated about 80 datasets to move the URLs to the correct field ## 2024-02-25 - This morning Tomcat died while I was doing a CSV export, with an OOM kill from the kernel: ```console kernel: Out of memory: Killed process 720768 (java) total-vm:14079976kB, anon-rss:9301684kB, file-rss:152kB, shmem-rss:0kB, UID:997 pgtables:19488kB oom_score_adj:0 ``` - I don't know why this is happening so often recently... ## 2024-02-27 - IFPRI sent me a list of authors to add to our list for now, until we can find a better way of doing it - I extracted the existing authors from our controlled vocabulary and combined them with IFPRI's: ```console $ xmllint --xpath '//node/isComposedBy/node()' dspace/config/controlled-vocabularies/dc-contributor-author.xml \ | grep -oE 'label=".*"' \ | sed -e 's/label="//' -e 's/"$//' > /tmp/authors $ cat /tmp/authors /tmp/ifpri-authors | sort -u > /tmp/new-authors ``` ## 2024-02-28 - I figured out a way to add a new Angular component to handle all our relation fields ## 2024-02-29 - Clean up a bunch of metadata on CGSpace