September, 2019
2019-09-01
- Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning
Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
# zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
    440 17.58.101.255
    441 157.55.39.101
    485 207.46.13.43
    728 169.60.128.125
    730 207.46.13.108
    758 157.55.39.9
    808 66.160.140.179
    814 207.46.13.212
   2472 163.172.71.23
   6092 3.94.211.189
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
     33 2a01:7e00::f03c:91ff:fe16:fcb
     57 3.83.192.124
     57 3.87.77.25
     57 54.82.1.8
    822 2a01:9cc0:47:1:1a:4:0:2
   1223 45.5.184.72
   1633 172.104.229.92
   5112 205.186.128.185
   7249 2a01:7e00::f03c:91ff:fe18:7396
   9124 45.5.186.2
- 3.94.211.189 is MauiBot, and most of its requests are to Discovery and get rate limited with HTTP 503
- 163.172.71.23 is some IP on Online SAS in France and its user agent is: Mozilla/5.0 ((Windows; U; Windows NT 6.1; fr; rv:1.9.2) Gecko/20100115 Firefox/3.6)
It actually got mostly HTTP 200 responses:
# zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "01/Sep/2019:0" | grep 163.172.71.23 | awk '{print $9}' | sort | uniq -c
   1775 200
    703 499
     72 503
And it was mostly requesting Discover pages:
# zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "01/Sep/2019:0" | grep 163.172.71.23 | grep -o -E "(bitstream|discover|handle)" | sort | uniq -c
   2350 discover
     71 handle
I’m not sure why the outbound traffic rate was so high…
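The counts above rank clients by number of requests, which does not necessarily match who received the most data. A minimal sketch of one way to rank by outbound bytes instead, assuming the logs use nginx's default "combined" format (so field 10 is the response body size, just as field 9 is the status code used above). The function name bytes_per_ip is made up for illustration:

```shell
# Sum the response body bytes sent to each client IP and sort ascending.
# Assumption: nginx "combined" log format, where $1 is the client IP
# and $10 is body_bytes_sent. Lines whose 10th field is not a plain
# number (for example malformed requests) are skipped.
bytes_per_ip() {
    awk '$10 ~ /^[0-9]+$/ { bytes[$1] += $10 }
         END { for (ip in bytes) printf "%.0f %s\n", bytes[ip], ip }' \
        | sort -n
}

# Usage, with the same input files and time filter as the commands above:
# zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "01/Sep/2019:0" | bytes_per_ip | tail -n 10
```

This only counts response bodies, not headers or TLS overhead, so it is a rough indicator rather than an exact match for what Linode measures.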
2019-09-02
- Follow up with Carol and Francesca from Bioversity as they were on holiday during mid-to-late August
- I told them to check the temporary collection on DSpace Test where I uploaded the 1,427 items so they can see how it will look
- Also, I told them to advise me about the strange file extensions (.7z, .zip, .lck)
- Also, I reminded Abenet to check the metadata, as the institutional authors at least will need some modification
2019-09-10
- Altmetric responded to say that they have fixed an issue with their badge code so now research outputs with multiple handles are showing badges!
- Follow up with Bosede about the mixup with PDFs in the items uploaded in 2018-12 (aka Daniel1807)
- These are the same ones that Peter noticed last week, and that Bosede and I had been discussing earlier this year but never sorted out
- It looks like these items were uploaded by Sisay on 2018-12-19 so we can use the accession date as a filter to narrow it down to 230 items (of which only 104 have PDFs, according to the Daniel1807.xls input file)
- Continue working on CG Core v2 migration, focusing on the crosswalk mappings
- I think we can skip the MODS crosswalk for now because it is only used in AIP exports that are meant for non-DSpace systems
- We should probably do the QDC crosswalk as well as those in xhtml-head-item.properties…
- Ouch, there is potentially a lot of work in the OAI metadata formats like DIM, METS, and QDC (see dspace/config/crosswalks/oai/*.xsl)
- In general I think I should only modify the left side of the crosswalk mappings (ie, where metadata is coming from) so we maintain the same exact output for search engines, etc
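To illustrate the "left side only" idea, here is a rough sketch that rewrites the source field of a mapping line and leaves the output name untouched. It assumes the crosswalk file uses simple "source.field = Output.name" lines. The function name rename_source_field and the field names in the usage line are hypothetical placeholders, not actual CG Core v2 mappings:

```shell
# Rename only the left-hand side (the source metadata field) of
# "source.field = Output.name" mapping lines read from stdin.
# Assumption: one mapping per line, key at the start of the line.
# Field names must not contain "/" or "&", which are special to sed.
rename_source_field() {
    old="$1"
    new="$2"
    # Escape dots so they match literally instead of "any character"
    old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
    # Match the old key only when followed by optional spaces and "=",
    # so longer keys sharing the same prefix are left alone
    sed -E "s/^${old_re}([[:space:]]*=)/${new}\1/"
}

# Usage (hypothetical field names and file name):
# rename_source_field dc.example.old dcterms.exampleNew < mappings.properties
```

Because the right side is never touched, the names emitted to search engines stay exactly the same. Comment lines and the XSL-based OAI crosswalks are out of scope for a line-based approach like this.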