Add notes for 2021-01-12

2021-01-12 13:41:01 +02:00
parent 5c4c72d79b
commit 184a3e38a8
23 changed files with 140 additions and 29 deletions


@@ -122,6 +122,7 @@ java.lang.UnsupportedOperationException
- There is apparently [a bug](https://jira.lyrasis.org/browse/DS-3914) in DSpace 6.x that prevents community-filiator from working
- There is [a patch](https://github.com/DSpace/DSpace/pull/2178) for the as-yet unreleased DSpace 6.4, so I will try that
- I tested the patch on DSpace Test and it worked, so I will do the same on CGSpace tomorrow
- Udana had asked about exporting IWMI's community on CGSpace, but we don't want to give him super admin permissions to do that
- I suggested that he use AReS, but there are some fields missing because we don't harvest them all
- I added a few more fields to the configuration and will start a fresh harvest.
@@ -131,6 +132,57 @@ java.lang.UnsupportedOperationException
```console
$ curl -XDELETE 'http://localhost:9200/openrxv-items-temp'
# start indexing in AReS
... after ten hours
$ curl -s 'http://localhost:9200/openrxv-items-temp/_count?q=*&pretty'
{
"count" : 100411,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
}
}
$ curl -X PUT "localhost:9200/openrxv-items-temp/_settings?pretty" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": true}}'
$ curl -XDELETE 'http://localhost:9200/openrxv-items'
$ curl -s -X POST http://localhost:9200/openrxv-items-temp/_clone/openrxv-items
$ curl -XDELETE 'http://localhost:9200/openrxv-items-temp'
```
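A minimal sketch of the same swap as a reusable function, assuming Elasticsearch is reachable at the URL used above. The function name `promote_index` and the `ES_URL` variable are hypothetical and not part of AReS/OpenRXV. It chains the steps with `&&` and uses curl's `--fail` so that a failed step stops the sequence before anything else is deleted:

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace a live index with a freshly harvested temp index.
# Usage: promote_index <temp-index> <final-index>
promote_index() {
  local es="${ES_URL:-http://localhost:9200}"
  local temp="$1"
  local final="$2"
  # --fail makes curl exit non-zero on HTTP errors, so && stops at the first failed step.
  # Note: the DELETE of the final index also fails (404) if that index does not exist yet.
  curl -s --fail -X PUT "$es/$temp/_settings" \
       -H 'Content-Type: application/json' \
       -d '{"settings": {"index.blocks.write": true}}' &&
  curl -s --fail -X DELETE "$es/$final" &&
  curl -s --fail -X POST "$es/$temp/_clone/$final" &&
  curl -s --fail -X DELETE "$es/$temp"
}
```

It would be called as `promote_index openrxv-items-temp openrxv-items` once the harvest has finished and the count looks right.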
- Looking over the last month of Solr stats I see a familiar bot that *should* have been marked as a bot months ago:
> Mozilla/5.0 (compatible; +centuryb.o.t9[at]gmail.com)
- There are 51,961 hits from this bot on 64.62.202.71 and 64.62.202.73
- Ah! Actually I had already added the bot pattern to the Tomcat Crawler Session Manager Valve, which mitigated the abuse of Tomcat sessions:
```console
$ cat log/dspace.log.2020-12-2* | grep -E 'session_id=[A-Z0-9]{32}:ip_addr=64.62.202.71' | sort | uniq | wc -l
0
```
- So now I should really add it to the DSpace spider agent list so it doesn't create Solr hits
- I added it to the "ilri" lists of spider agent patterns
- I purged the existing hits using my `check-spider-ip-hits.sh` script:
```console
$ ./check-spider-ip-hits.sh -d -f /tmp/ips -s http://localhost:8081/solr -s statistics -p
```
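The Tomcat session check earlier in this entry can be wrapped in a small function for reuse with other IPs. This is only a sketch: the name `count_sessions_for_ip` is made up, and unlike the original one-liner (which counts unique log lines) it counts unique session IDs. It also escapes the dots in the IP and requires a non-digit after it, so `64.62.202.7` does not match `64.62.202.71`:

```shell
#!/usr/bin/env bash
# Count distinct Tomcat session IDs created by one IP address in DSpace logs.
# Usage: count_sessions_for_ip <ip> <logfile>...
count_sessions_for_ip() {
  local ip="$1"
  shift
  # Escape dots so they match literally instead of matching any character
  local ip_re
  ip_re=$(printf '%s' "$ip" | sed 's/\./\\./g')
  # First grep selects lines for this exact IP, second grep keeps only the session ID
  grep -hE "session_id=[A-Z0-9]{32}:ip_addr=${ip_re}([^0-9]|\$)" "$@" \
    | grep -oE 'session_id=[A-Z0-9]{32}' \
    | sort -u \
    | wc -l \
    | tr -d ' '
}
```

For example: `count_sessions_for_ip 64.62.202.71 log/dspace.log.2020-12-2*`.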
## 2021-01-11
- The AReS indexing finished this morning and I moved the `openrxv-items-temp` index to `openrxv-items` (see above)
- I sorted the explorer results by Altmetric attention score and I see a few new ones at the top, so I think the recent tweeting of Handles by Peter and myself worked
- I deployed the community-filiator fix on CGSpace and moved the Gender Platform community to the top level of CGSpace:
```console
$ dspace community-filiator --remove --parent=10568/66598 --child=10568/106605
```
## 2021-01-12
- IWMI is really pressuring us to have a periodic CSV export of their community
- I decided to write a systemd timer that runs `dspace metadata-export` every week, and made an nginx alias to make the resulting CSV available [publicly](https://cgspace.cgiar.org/iwmi.csv)
- It is part of the [Ansible infrastructure scripts](https://github.com/ilri/rmg-ansible-public) that I use to provision the servers
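
A rough sketch of the kind of script such a timer might run, not the actual one from the Ansible repository. The function name, the `DSPACE_BIN` variable, its default path, and any handle or output path passed in are assumptions. Only `dspace metadata-export` with its `-i` (handle) and `-f` (output file) options comes from DSpace itself. The export is written to a temporary file first, so nginx never serves a half-written CSV:

```shell
#!/usr/bin/env bash
# Hypothetical weekly export job for a systemd timer.
# Usage: export_community_csv <community-handle> <output-csv>
export_community_csv() {
  # Placeholder default path; adjust to the real DSpace installation directory
  local dspace_bin="${DSPACE_BIN:-/dspace/bin/dspace}"
  local handle="$1"
  local output="$2"
  local tmp="${output}.tmp"
  # Export to a temporary file, and clean up if the export fails
  "$dspace_bin" metadata-export -i "$handle" -f "$tmp" || { rm -f "$tmp"; return 1; }
  # Rename into place (atomic when both paths are on the same filesystem)
  mv "$tmp" "$output"
}
```

The timer's service unit would then call the function with IWMI's community handle and the path that the nginx alias points to.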
<!-- vim: set sw=2 ts=2: -->