diff --git a/content/posts/2019-11.md b/content/posts/2019-11.md
index 6640a9c8a..50ce9c842 100644
--- a/content/posts/2019-11.md
+++ b/content/posts/2019-11.md
@@ -303,4 +303,22 @@ $ curl -s 'http://localhost:8081/solr/statistics/select' -d 'q=userAgent:/Postge
 - I updated the `check-spider-hits.sh` script to use the POST syntax, and I'm evaluating the feasibility of including the regex search patterns from the spider agent file, as I had been filtering them out due to differences in PCRE and Solr regex syntax and issues with shell handling
+## 2019-11-14
+
+- IWMI sent a few new ORCID identifiers for us to add to our controlled vocabulary
+- I will merge them with our existing list and then resolve their names using my `resolve-orcids.py` script:
+
+```
+$ cat ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/iwmi-orcids.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq > /tmp/2019-11-14-combined-orcids.txt
+$ ./resolve-orcids.py -i /tmp/2019-11-14-combined-orcids.txt -o /tmp/2019-11-14-combined-names.txt -d
+# sort names, copy to cg-creator-id.xml, add XML formatting, and then format with tidy (preserving accents)
+$ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-id.xml
+```
+
+- I created a [pull request](https://github.com/ilri/DSpace/pull/437) and merged them into the `5_x-prod` branch
+  - I will deploy them to CGSpace in the next few days
+- Greatly improved my `check-spider-hits.sh` script to handle regular expressions in the spider agents patterns file
+  - This allows me to detect and purge many more hits from the Solr statistics core
+  - I've tested it quite a bit on DSpace Test, but I need to do a little more before I feel comfortable running the new code on CGSpace's Solr cores
+
diff --git a/docs/2019-11/index.html b/docs/2019-11/index.html
index ec6dfded6..66e5dad60 100644
--- a/docs/2019-11/index.html
+++ b/docs/2019-11/index.html
@@ -34,7 +34,7 @@ Let’s see how many of the REST API requests were for bitstreams (because t
-
+
@@ -73,9 +73,9 @@ Let’s see how many of the REST API requests were for bitstreams (because t
   "@type": "BlogPosting",
   "headline": "November, 2019",
   "url": "https:\/\/alanorth.github.io\/cgspace-notes\/2019-11\/",
-  "wordCount": "1951",
+  "wordCount": "2115",
   "datePublished": "2019-11-04T12:20:30+02:00",
-  "dateModified": "2019-11-13T18:18:24+02:00",
+  "dateModified": "2019-11-14T09:47:49+02:00",
   "author": {
     "@type": "Person",
     "name": "Alan Orth"
@@ -462,11 +462,38 @@ $ http “http://localhost:8081/solr/statistics/select' -d ‘q=userAgent:/Postgenomic(\s|+)v2/&rows=2’
+

$ curl -s 'http://localhost:8081/solr/statistics/select' -d 'q=userAgent:/Postgenomic(\s|+)v2/&rows=2'

+ +

+- I updated the `check-spider-hits.sh` script to use the POST syntax, and I'm evaluating the feasibility of including the regex search patterns from the spider agent file, as I had been filtering them out due to differences in PCRE and Solr regex syntax and issues with shell handling
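
One way to sidestep the shell-handling issue mentioned above is to read each pattern verbatim and keep it in an array, so that backslashes and brackets are never re-interpreted. The following is only a sketch and not the actual `check-spider-hits.sh`; the patterns file, its two entries, and the variable names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical patterns file; both entries are made up for illustration
patterns_file=$(mktemp)
printf '%s\n' 'bot[0-9]\.example' 'Crawler\+v2' > "$patterns_file"

queries=()
# IFS= and -r keep whitespace and backslashes exactly as written in the file
while IFS= read -r pattern; do
    # Skip blank lines and comment lines
    [[ -z "$pattern" || "$pattern" == \#* ]] && continue
    # Solr regex queries wrap the pattern in forward slashes
    queries+=("q=userAgent:/${pattern}/")
done < "$patterns_file"
rm "$patterns_file"

printf '%s\n' "${queries[@]}"
```

Each array entry could then be passed to `curl` with `--data-urlencode`, which sends it as a POST body and percent-encodes the pattern. Whether a given PCRE-style pattern is valid in Solr's regex dialect still has to be checked separately.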
+
+## 2019-11-14
+
+- IWMI sent a few new ORCID identifiers for us to add to our controlled vocabulary
+- I will merge them with our existing list and then resolve their names using my `resolve-orcids.py` script:
+
+
+ +

$ cat ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/iwmi-orcids.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq > /tmp/2019-11-14-combined-orcids.txt
+$ ./resolve-orcids.py -i /tmp/2019-11-14-combined-orcids.txt -o /tmp/2019-11-14-combined-names.txt -d

+ +

# sort names, copy to cg-creator-id.xml, add XML formatting, and then format with tidy (preserving accents)

+ +

$ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-id.xml ```
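
The merge-and-deduplicate step above can be tried on throwaway data before touching the real vocabulary. This is a minimal sketch under stated assumptions: the file names, the XML snippet, and the identifiers are placeholders and not real ORCID records. It uses `grep -h` and `sort -u` in place of `cat` and `sort | uniq`, which should produce the same list.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample inputs; the identifiers are invented placeholders
workdir=$(mktemp -d)
cat > "$workdir/existing.xml" <<'EOF'
<node id="Example Author: 0000-0000-0000-0002" label="Example Author: 0000-0000-0000-0002"/>
<node id="Other Author: 0000-0000-0000-0001" label="Other Author: 0000-0000-0000-0001"/>
EOF
cat > "$workdir/new.txt" <<'EOF'
0000-0000-0000-0001
0000-0000-0000-000X
EOF

# Same pattern as in the notes: -o prints only the matches, -h omits file names,
# and sort -u drops the duplicates that come from the id/label pairs and the overlap
combined=$(grep -ohE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' \
    "$workdir/existing.xml" "$workdir/new.txt" | sort -u)

echo "$combined"
rm -r "$workdir"
```

The resulting list has one identifier per line, which is the input format that the `-i` option of `resolve-orcids.py` is given in the commands above.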

diff --git a/docs/sitemap.xml b/docs/sitemap.xml index f39beb64c..adb2797b2 100644 --- a/docs/sitemap.xml +++ b/docs/sitemap.xml @@ -4,27 +4,27 @@ https://alanorth.github.io/cgspace-notes/categories/ - 2019-11-13T18:18:24+02:00 + 2019-11-14T09:47:49+02:00 https://alanorth.github.io/cgspace-notes/ - 2019-11-13T18:18:24+02:00 + 2019-11-14T09:47:49+02:00 https://alanorth.github.io/cgspace-notes/categories/notes/ - 2019-11-13T18:18:24+02:00 + 2019-11-14T09:47:49+02:00 https://alanorth.github.io/cgspace-notes/2019-11/ - 2019-11-13T18:18:24+02:00 + 2019-11-14T09:47:49+02:00 https://alanorth.github.io/cgspace-notes/posts/ - 2019-11-13T18:18:24+02:00 + 2019-11-14T09:47:49+02:00