diff --git a/content/posts/2020-04.md b/content/posts/2020-04.md
index 441440130..24540184b 100644
--- a/content/posts/2020-04.md
+++ b/content/posts/2020-04.md
@@ -173,5 +173,64 @@ dspace=# UPDATE metadatavalue SET text_value='Knight-Jones, Theodore J.D.' WHERE
 - Atmire responded to some of the issues I raised earlier this week about the DSpace 6 pull request
   - They said they don't think the glyphicon encoding issue is due to their changes, but I built a new clean version of the vanilla `6_x-dev` branch from before their pull request and it *does not* have the encoding issue in the Mirage 2 header trails
   - Also, they said we need to use something called `AtomicStatisticsUpdateCLI` to do the Solr legacy integer ID to UUID conversion so I asked for more information about that workflow
-
+
+## 2020-04-20
+
+- Looking into a high rate of outgoing bandwidth from yesterday on CGSpace (linode18):
+
+```
+# cat /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "19/Apr/2020:0[6789]" | goaccess --log-format=COMBINED -
+```
+
+- One host in Russia (91.241.19.70) downloaded 23GiB over those few hours in the morning
+  - It looks like all the requests were for one single item's bitstreams:
+
+```
+# grep -c 91.241.19.70 /var/log/nginx/access.log.1
+8900
+# grep 91.241.19.70 /var/log/nginx/access.log.1 | grep -c '10568/35187'
+8900
+```
+
+- I thought the host might have been Yandex misbehaving, but its user agent is:
+
+```
+Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_3; nl-nl) AppleWebKit/527 (KHTML, like Gecko) Version/3.1.1 Safari/525.20
+```
+
+- I will purge that IP from the Solr statistics using my `check-spider-ip-hits.sh` script:
+
+```
+$ ./check-spider-ip-hits.sh -d -f /tmp/ip -p
+(DEBUG) Using spider IPs file: /tmp/ip
+(DEBUG) Checking for hits from spider IP: 91.241.19.70
+Purging 8909 hits from 91.241.19.70 in statistics
+
+Total number of bot hits purged: 8909
+```
+
+- While investigating that I noticed ORCID identifiers missing from a few authors' names, so I added them with my `add-orcid-identifiers-csv.py` script:
+
+```
+$ ./add-orcid-identifiers-csv.py -i 2020-04-20-add-orcids.csv -db dspace -u dspace -p 'fuuu' -d
+```
+
+- The contents of `2020-04-20-add-orcids.csv` were:
+
+```
+dc.contributor.author,cg.creator.id
+"Schut, Marc","Marc Schut: 0000-0002-3361-4581"
+"Schut, M.","Marc Schut: 0000-0002-3361-4581"
+"Kamau, G.","Geoffrey Kamau: 0000-0002-6995-4801"
+"Kamau, G","Geoffrey Kamau: 0000-0002-6995-4801"
+"Triomphe, Bernard","Bernard Triomphe: 0000-0001-6657-3002"
+"Waters-Bayer, Ann","Ann Waters-Bayer: 0000-0003-1887-7903"
+"Klerkx, Laurens","Laurens Klerkx: 0000-0002-1664-886X"
+```
+
+- I confirmed some of the authors' names from the report itself, then by looking at their profiles on ORCID.org
+- Add new ILRI subject "COVID19" to the `5_x-prod` branch
+- Add new CCAFS Phase II project tags to the `5_x-prod` branch
+- I will deploy these to CGSpace in the next few days
+
diff --git a/docs/2020-04/index.html b/docs/2020-04/index.html
index 2726cf8a6..5bc3992f7 100644
--- a/docs/2020-04/index.html
+++ b/docs/2020-04/index.html
@@ -25,7 +25,7 @@ On the same note, the one item Abenet pointed out last week now has a donut with
-
+
@@ -55,9 +55,9 @@ On the same note, the one item Abenet pointed out last week now has a donut with
   "@type": "BlogPosting",
   "headline": "April, 2020",
   "url": "https://alanorth.github.io/cgspace-notes/2020-04/",
-  "wordCount": "1401",
+  "wordCount": "1660",
   "datePublished": "2020-04-02T10:53:24+03:00",
-  "dateModified": "2020-04-14T20:01:06+03:00",
+  "dateModified": "2020-04-17T19:40:30+03:00",
   "author": {
     "@type": "Person",
     "name": "Alan Orth"
@@ -308,6 +308,56 @@ $ podman start artifactory
+

2020-04-20

+ +
# cat /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "19/Apr/2020:0[6789]" | goaccess --log-format=COMBINED -
+
+
# grep -c 91.241.19.70 /var/log/nginx/access.log.1
+8900
+# grep 91.241.19.70 /var/log/nginx/access.log.1 | grep -c '10568/35187'
+8900
+
+
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_3; nl-nl) AppleWebKit/527  (KHTML, like Gecko) Version/3.1.1 Safari/525.20
+
+
$ ./check-spider-ip-hits.sh -d -f /tmp/ip -p
+(DEBUG) Using spider IPs file: /tmp/ip
+(DEBUG) Checking for hits from spider IP: 91.241.19.70
+Purging 8909 hits from 91.241.19.70 in statistics
+
+Total number of bot hits purged: 8909
+
+
$ ./add-orcid-identifiers-csv.py -i 2020-04-20-add-orcids.csv -db dspace -u dspace -p 'fuuu' -d
+
+
dc.contributor.author,cg.creator.id
+"Schut, Marc","Marc Schut: 0000-0002-3361-4581"
+"Schut, M.","Marc Schut: 0000-0002-3361-4581"
+"Kamau, G.","Geoffrey Kamau: 0000-0002-6995-4801"
+"Kamau, G","Geoffrey Kamau: 0000-0002-6995-4801"
+"Triomphe, Bernard","Bernard Triomphe: 0000-0001-6657-3002"
+"Waters-Bayer, Ann","Ann Waters-Bayer: 0000-0003-1887-7903"
+"Klerkx, Laurens","Laurens Klerkx: 0000-0002-1664-886X"
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 9841a41a0..677623a88 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -4,27 +4,27 @@
 https://alanorth.github.io/cgspace-notes/2020-04/
- 2020-04-14T20:01:06+03:00
+ 2020-04-17T19:40:30+03:00
 https://alanorth.github.io/cgspace-notes/categories/
- 2020-04-14T20:01:06+03:00
+ 2020-04-17T19:40:30+03:00
 https://alanorth.github.io/cgspace-notes/
- 2020-04-14T20:01:06+03:00
+ 2020-04-17T19:40:30+03:00
 https://alanorth.github.io/cgspace-notes/categories/notes/
- 2020-04-14T20:01:06+03:00
+ 2020-04-17T19:40:30+03:00
 https://alanorth.github.io/cgspace-notes/posts/
- 2020-04-14T20:01:06+03:00
+ 2020-04-17T19:40:30+03:00