From 184a3e38a8b71a3a32b90f8a09fb35e1efb8d355 Mon Sep 17 00:00:00 2001
From: Alan Orth
Date: Tue, 12 Jan 2021 13:41:01 +0200
Subject: [PATCH] Add notes for 2021-01-12

---
 content/posts/2021-01.md                | 52 +++++++++++++++++++
 docs/2021-01/index.html                 | 67 +++++++++++++++++++++++--
 docs/categories/index.html              |  2 +-
 docs/categories/notes/index.html        |  2 +-
 docs/categories/notes/page/2/index.html |  2 +-
 docs/categories/notes/page/3/index.html |  2 +-
 docs/categories/notes/page/4/index.html |  2 +-
 docs/categories/notes/page/5/index.html |  2 +-
 docs/index.html                         |  2 +-
 docs/page/2/index.html                  |  2 +-
 docs/page/3/index.html                  |  2 +-
 docs/page/4/index.html                  |  2 +-
 docs/page/5/index.html                  |  2 +-
 docs/page/6/index.html                  |  2 +-
 docs/page/7/index.html                  |  2 +-
 docs/posts/index.html                   |  2 +-
 docs/posts/page/2/index.html            |  2 +-
 docs/posts/page/3/index.html            |  2 +-
 docs/posts/page/4/index.html            |  2 +-
 docs/posts/page/5/index.html            |  2 +-
 docs/posts/page/6/index.html            |  2 +-
 docs/posts/page/7/index.html            |  2 +-
 docs/sitemap.xml                        | 10 ++--
 23 files changed, 140 insertions(+), 29 deletions(-)

diff --git a/content/posts/2021-01.md b/content/posts/2021-01.md
index 5b147db5d..fea4ab47b 100644
--- a/content/posts/2021-01.md
+++ b/content/posts/2021-01.md
@@ -122,6 +122,7 @@ java.lang.UnsupportedOperationException
 
 - There is apparently [a bug](https://jira.lyrasis.org/browse/DS-3914) in DSpace 6.x that makes community-filiator not work
   - There is [a patch](https://github.com/DSpace/DSpace/pull/2178) for the as-of-yet unreleased DSpace 6.4 so I will try that
+  - I tested the patch on DSpace Test and it worked, so I will do the same on CGSpace tomorrow
 - Udana had asked about exporting IWMI's community on CGSpace, but we don't want to give him super admin permissions to do that
   - I suggested that he use AReS, but there are some fields missing because we don't harvest them all
   - I added a few more fields to the configuration and will start a fresh harvest.
@@ -131,6 +132,57 @@ java.lang.UnsupportedOperationException
 ```console
 $ curl -XDELETE 'http://localhost:9200/openrxv-items-temp'
 # start indexing in AReS
+... after ten hours
+$ curl -s 'http://localhost:9200/openrxv-items-temp/_count?q=*&pretty'
+{
+  "count" : 100411,
+  "_shards" : {
+    "total" : 1,
+    "successful" : 1,
+    "skipped" : 0,
+    "failed" : 0
+  }
+}
+$ curl -X PUT "localhost:9200/openrxv-items-temp/_settings?pretty" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": true}}'
+$ curl -XDELETE 'http://localhost:9200/openrxv-items'
+$ curl -s -X POST http://localhost:9200/openrxv-items-temp/_clone/openrxv-items
+$ curl -XDELETE 'http://localhost:9200/openrxv-items-temp'
 ```
 
+- Looking over the last month of Solr stats I see a familiar bot that *should* have been marked as a bot months ago:
+
+> Mozilla/5.0 (compatible; +centuryb.o.t9[at]gmail.com)
+
+- There are 51,961 hits from this bot on 64.62.202.71 and 64.62.202.73
+  - Ah! Actually I added the bot pattern to the Tomcat Crawler Session Manager Valve, which mitigated the abuse of Tomcat sessions:
+
+```console
+$ cat log/dspace.log.2020-12-2* | grep -E 'session_id=[A-Z0-9]{32}:ip_addr=64.62.202.71' | sort | uniq | wc -l
+0
+```
+
+- So now I should really add it to the DSpace spider agent list so it doesn't create Solr hits
+  - I added it to the "ilri" lists of spider agent patterns
+- I purged the existing hits using my `check-spider-ip-hits.sh` script:
+
+```console
+$ ./check-spider-ip-hits.sh -d -f /tmp/ips -s http://localhost:8081/solr -s statistics -p
+```
+
+## 2021-01-11
+
+- The AReS indexing finished this morning and I moved the `openrxv-items-temp` core to `openrxv-items` (see above)
+  - I sorted the explorer results by Altmetric attention score and I see a few new ones on the top so I think the recent tweeting of Handles by Peter and myself worked
+- I deployed the community-filiator fix on CGSpace and moved the Gender Platform community to the top level of CGSpace:
+
+```console
+$ dspace community-filiator --remove --parent=10568/66598 --child=10568/106605
+```
+
+## 2021-01-12
+
+- IWMI is really pressuring us to have a periodic CSV export of their community
+  - I decided to write a systemd timer to use `dspace metadata-export` every week, and made an nginx alias to make it available [publicly](https://cgspace.cgiar.org/iwmi.csv)
+  - It is part of the [Ansible infrastructure scripts](https://github.com/ilri/rmg-ansible-public) that I use to provision the servers
+
diff --git a/docs/2021-01/index.html b/docs/2021-01/index.html
index 964dfa57f..a4f91790b 100644
--- a/docs/2021-01/index.html
+++ b/docs/2021-01/index.html
@@ -27,7 +27,7 @@ For example, this item has 51 views on CGSpace, but 0 on AReS
-
+
@@ -60,9 +60,9 @@ For example, this item has 51 views on CGSpace, but 0 on AReS
   "@type": "BlogPosting",
   "headline": "January, 2021",
   "url": "https://alanorth.github.io/cgspace-notes/2021-01/",
-  "wordCount": "1025",
+  "wordCount": "1347",
   "datePublished": "2021-01-03T10:13:54+02:00",
-  "dateModified": "2021-01-05T19:56:15+02:00",
+  "dateModified": "2021-01-10T16:15:04+02:00",
   "author": {
     "@type": "Person",
     "name": "Alan Orth"
@@ -283,6 +283,7 @@ java.lang.UnsupportedOperationException
  • There is apparently a bug in DSpace 6.x that makes community-filiator not work
  • Udana had asked about exporting IWMI’s community on CGSpace, but we don’t want to give him super admin permissions to do that
@@ -299,7 +300,65 @@ java.lang.UnsupportedOperationException
    $ curl -XDELETE 'http://localhost:9200/openrxv-items-temp'
     # start indexing in AReS
    -
    +... after ten hours
    +$ curl -s 'http://localhost:9200/openrxv-items-temp/_count?q=*&pretty'
    +{
    +  "count" : 100411,
    +  "_shards" : {
    +    "total" : 1,
    +    "successful" : 1,
    +    "skipped" : 0,
    +    "failed" : 0
    +  }
    +}
    +$ curl -X PUT "localhost:9200/openrxv-items-temp/_settings?pretty" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": true}}'
    +$ curl -XDELETE 'http://localhost:9200/openrxv-items'
    +$ curl -s -X POST http://localhost:9200/openrxv-items-temp/_clone/openrxv-items
    +$ curl -XDELETE 'http://localhost:9200/openrxv-items-temp'
    +
    +
    +

    Mozilla/5.0 (compatible; +centuryb.o.t9[at]gmail.com)

    +
    + +
    $ cat log/dspace.log.2020-12-2* | grep -E 'session_id=[A-Z0-9]{32}:ip_addr=64.62.202.71' | sort | uniq | wc -l
    +0
    +
    +
    $ ./check-spider-ip-hits.sh -d -f /tmp/ips -s http://localhost:8081/solr -s statistics -p
    +

    2021-01-11

    + +
    $ dspace community-filiator --remove --parent=10568/66598 --child=10568/106605
    +

    2021-01-12

    +
    +
diff --git a/docs/categories/index.html b/docs/categories/index.html
index 1250ab222..2ab74f69c 100644
--- a/docs/categories/index.html
+++ b/docs/categories/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html
index 75fa75c20..3e90dffce 100644
--- a/docs/categories/notes/index.html
+++ b/docs/categories/notes/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html
index 62f442130..51716ddcd 100644
--- a/docs/categories/notes/page/2/index.html
+++ b/docs/categories/notes/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html
index 241f227c7..fbc4443e8 100644
--- a/docs/categories/notes/page/3/index.html
+++ b/docs/categories/notes/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html
index 3fa291308..4488c24ba 100644
--- a/docs/categories/notes/page/4/index.html
+++ b/docs/categories/notes/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/5/index.html b/docs/categories/notes/page/5/index.html
index cff21db2e..42e3596f9 100644
--- a/docs/categories/notes/page/5/index.html
+++ b/docs/categories/notes/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/index.html b/docs/index.html
index a5ce7c71e..d0d62d1ac 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/2/index.html b/docs/page/2/index.html
index bd3eaaf4e..ea73f2219 100644
--- a/docs/page/2/index.html
+++ b/docs/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/3/index.html b/docs/page/3/index.html
index b7531d18e..85131e0a4 100644
--- a/docs/page/3/index.html
+++ b/docs/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/4/index.html b/docs/page/4/index.html
index b8a9229c4..f67657f89 100644
--- a/docs/page/4/index.html
+++ b/docs/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/5/index.html b/docs/page/5/index.html
index a8ca332e0..aeda74df9 100644
--- a/docs/page/5/index.html
+++ b/docs/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/6/index.html b/docs/page/6/index.html
index 52aafa3f2..c19af9f1e 100644
--- a/docs/page/6/index.html
+++ b/docs/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/7/index.html b/docs/page/7/index.html
index 6613c8145..f58c50a6b 100644
--- a/docs/page/7/index.html
+++ b/docs/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/index.html b/docs/posts/index.html
index 056382bac..57602106d 100644
--- a/docs/posts/index.html
+++ b/docs/posts/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html
index 12ac175f4..46c3c7a0d 100644
--- a/docs/posts/page/2/index.html
+++ b/docs/posts/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html
index 90c3b51e4..bce88e3d9 100644
--- a/docs/posts/page/3/index.html
+++ b/docs/posts/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html
index d84a1c4f0..be6488f6c 100644
--- a/docs/posts/page/4/index.html
+++ b/docs/posts/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html
index d3e2542ba..0bad126e4 100644
--- a/docs/posts/page/5/index.html
+++ b/docs/posts/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html
index afe7be688..888ee7dac 100644
--- a/docs/posts/page/6/index.html
+++ b/docs/posts/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/7/index.html b/docs/posts/page/7/index.html
index f82ae8a2e..1d1533e48 100644
--- a/docs/posts/page/7/index.html
+++ b/docs/posts/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 4999144b6..d85829545 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -4,27 +4,27 @@
 https://alanorth.github.io/cgspace-notes/categories/
- 2021-01-05T19:56:15+02:00
+ 2021-01-10T16:15:04+02:00
 https://alanorth.github.io/cgspace-notes/
- 2021-01-05T19:56:15+02:00
+ 2021-01-10T16:15:04+02:00
 https://alanorth.github.io/cgspace-notes/2021-01/
- 2021-01-05T19:56:15+02:00
+ 2021-01-10T16:15:04+02:00
 https://alanorth.github.io/cgspace-notes/categories/notes/
- 2021-01-05T19:56:15+02:00
+ 2021-01-10T16:15:04+02:00
 https://alanorth.github.io/cgspace-notes/posts/
- 2021-01-05T19:56:15+02:00
+ 2021-01-10T16:15:04+02:00