diff --git a/content/posts/2021-02.md b/content/posts/2021-02.md
index ad3640b69..5d01c7cf9 100644
--- a/content/posts/2021-02.md
+++ b/content/posts/2021-02.md
@@ -444,7 +444,7 @@ Purging 8085 hits from 45.146.165.104 in statistics
 Total number of bot hits purged: 70731
 ```
 
-- My god, and 64.39.99.15 is from Qualys, the domain scanning security people, who are making queries trying to see if we are vulnerable or something (?)
+- My god, and 64.39.99.15 is from Qualys, the domain scanning security people, who are making queries trying to see if we are vulnerable or something (wtf?)
 - Looking in Solr I see a few different IPs with DNS like `sn003.s02.iad01.qualys.com.` so I will purge their requests too:
 
 ```console
@@ -459,4 +459,56 @@ Purging 12 hits from 64.39.99.94 in statistics
 Total number of bot hits purged: 23789
 ```
 
+## 2021-02-17
+
+- I tested Node.js 10 vs 12 on CGSpace (linode18) and DSpace Test (linode26) and the build times were surprising
+  - Node.js 10
+    - linode26: [INFO] Total time: 17:07 min
+    - linode18: [INFO] Total time: 19:26 min
+  - Node.js 12
+    - linode26: [INFO] Total time: 17:14 min
+    - linode18: [INFO] Total time: 19:43 min
+- So I guess there is no need to use Node.js 12 any time soon, unless 10 reaches end of life
+- Abenet asked me to add Tom Randolph's ORCID identifier to CGSpace
+- I also tagged all his 247 existing items on CGSpace:
+
+```console
+$ cat 2021-02-17-add-tom-orcid.csv
+dc.contributor.author,cg.creator.id
+"Randolph, Thomas F.","Thomas Fitz Randolph: 0000-0003-1849-9877"
+$ ./ilri/add-orcid-identifiers-csv.py -i 2021-02-17-add-tom-orcid.csv -db dspace -u dspace -p 'fuuu'
+```
+
+## 2021-02-20
+
+- Test the CG Core v2 migration on DSpace Test (linode26) one last time
+
+## 2021-02-21
+
+- Start the CG Core v2 migration on CGSpace (linode18)
+- After deploying the latest `6_x-prod` branch and running `migrate-fields.sh` I started a full Discovery reindex:
+
+```console
+$ time chrt -b 0 ionice -c2 -n7 nice -n19 dspace index-discovery -b
+```
+
+- Ben Hack was asking if there is a REST API query that will give him all ILRI outputs for their new SharePoint intranet
+  - I told him he can try something like this if he only needs the ILRI articles in journals collection:
+
+https://cgspace.cgiar.org/rest/collections/8ea4b611-1f59-4d4e-b78d-a9921a72cfe7/items?limit=100&offset=0
+
+- But I don't know if he wants the entire ILRI community, in which case he needs to get the collections recursively and iterate over them, or if his software can manage the iteration over the pages of item results using limit and offset
+- Help proof and upload 1095 CIFOR items to DSpace Test for Abenet
+  - There were a few dozen issues with author affiliations, but the metadata was otherwise very good quality
+  - I ran the data through the csv-metadata-quality tool nevertheless to fix some minor formatting issues
+  - I uploaded it to DSpace Test to check for duplicates
+
+```console
+$ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx1024m'
+$ dspace metadata-import -e aorth@mjanja.ch -f /tmp/cifor.csv
+```
+
+- The process took an hour or so!
+- I added colorized output to the csv-metadata-quality tool and tagged [version 0.4.4 on GitHub](https://github.com/ilri/csv-metadata-quality/releases/tag/v0.4.4)
+
diff --git a/docs/2021-02/index.html b/docs/2021-02/index.html
index ebfd3018c..d10f01b6a 100644
--- a/docs/2021-02/index.html
+++ b/docs/2021-02/index.html
@@ -32,7 +32,7 @@ $ curl -s 'http://localhost:9200/openrxv-items-temp/_count?q=*&pretty
-
+
@@ -70,9 +70,9 @@ $ curl -s 'http://localhost:9200/openrxv-items-temp/_count?q=*&pretty
   "@type": "BlogPosting",
   "headline": "February, 2021",
   "url": "https://alanorth.github.io/cgspace-notes/2021-02/",
-  "wordCount": "2725",
+  "wordCount": "3061",
   "datePublished": "2021-02-01T10:13:54+02:00",
-  "dateModified": "2021-02-16T12:56:10+02:00",
+  "dateModified": "2021-02-16T20:19:14+02:00",
   "author": {
     "@type": "Person",
     "name": "Alan Orth"
@@ -591,7 +591,7 @@ Purging 8085 hits from 45.146.165.104 in statistics
 Total number of bot hits purged: 70731