Documenting day-to-day work on the CGSpace repository.
Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
# zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
440 17.58.101.255
441 157.55.39.101
485 207.46.13.43
728 169.60.128.125
730 207.46.13.108
758 157.55.39.9
808 66.160.140.179
814 207.46.13.212
2472 163.172.71.23
6092 3.94.211.189
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
33 2a01:7e00::f03c:91ff:fe16:fcb
57 3.83.192.124
57 3.87.77.25
57 54.82.1.8
822 2a01:9cc0:47:1:1a:4:0:2
1223 45.5.184.72
1633 172.104.229.92
5112 205.186.128.185
7249 2a01:7e00::f03c:91ff:fe18:7396
9124 45.5.186.2
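The two pipelines above differ only in the log files and the timestamp pattern, so they could be wrapped in a small helper. This is a minimal sketch rather than something from the original notes: the function name `top_ips` and its arguments are hypothetical, and it assumes the client address is the first whitespace-separated field, as in the commands above.

```shell
#!/usr/bin/env bash
# top_ips PATTERN [COUNT]: read access-log lines on stdin and print the
# COUNT (default 10) most frequent client addresses among the lines that
# match the extended regex PATTERN, in ascending order of request count.
# Hypothetical helper; assumes the client address is field 1.
top_ips() {
    local pattern="$1" count="${2:-10}"
    grep -E -- "$pattern" \
        | awk '{print $1}' \
        | sort \
        | uniq -c \
        | sort -n \
        | tail -n "$count"
}
```

With that defined, the first command would become `zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | top_ips "01/Sep/2019:0"`.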
5_x-prod branch and deploy on CGSpace
If an item is in the workflowitem table, it is submitted to a workflow
If an item is in the workspaceitem table, it is in the pre-submitted state
The item seems to be in a pre-submitted state, so I tried to delete it from there:
dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
DELETE 1
But after this I tried to delete the item from the XMLUI and it is still present…
There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today
I suspected that some might not be successful, because the stats show fewer, but today they were all HTTP 200!
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18.196.196.108|18.195.78.144|18.195.218.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5
4432 200
In the last two weeks there have been 47,000 downloads of this same exact PDF by these three IP addresses
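To see how the responses break down per client rather than in aggregate, the same logs can be tallied by IP address and status code. This is a minimal sketch, not from the original notes: the function name `status_by_ip` is hypothetical, and it assumes nginx's combined log format, where the client address is field 1 and the status code is field 9 (the field the command above prints).

```shell
#!/usr/bin/env bash
# status_by_ip STRING: read access-log lines on stdin, keep the lines that
# contain the fixed string STRING, and print "count ip status" for every
# (client address, status code) pair, ordered by address and then status.
# Hypothetical helper; assumes field 1 is the address and field 9 the status.
status_by_ip() {
    local needle="$1"
    grep -F -- "$needle" \
        | awk '{count[$1 " " $9]++} END {for (k in count) print count[k], k}' \
        | sort -k2,2 -k3,3n
}
```

For example: `cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | status_by_ip 'Spore-192-EN-web.pdf'`. Matching with `grep -F` treats the file name as a literal string, so the dots in it are not interpreted as regex wildcards.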
Apply country and region corrections and deletions on DSpace Test and CGSpace:
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
The top IPs before, during, and after this latest alert tonight were:
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "01/Feb/2019:(17|18|19|20|21)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
245 207.46.13.5
332 54.70.40.11
385 5.143.231.38
405 207.46.13.173
405 207.46.13.75
1117 66.249.66.219
1121 35.237.175.180
1546 5.9.6.51
2474 45.5.186.2
5490 85.25.237.71
85.25.237.71 is the “Linguee Bot” that I first saw last month
The Solr statistics for the past few months have been very high, and I was wondering whether the web server logs also showed an increase
There were just over 3 million accesses in the nginx logs last month:
# time zcat --force /var/log/nginx/* | grep -cE "[0-9]{1,2}/Jan/2019"
3018243
real 0m19.873s
user 0m22.203s
sys 0m1.979s
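A single monthly total does not show whether the traffic was spread evenly or concentrated on a few days. A per-day breakdown of the same logs is one way to check; the sketch below is an illustration only, with a hypothetical function name `requests_per_day`, and it assumes the zero-padded `dd/Mon/yyyy` timestamps that nginx writes by default.

```shell
#!/usr/bin/env bash
# requests_per_day MONTH YEAR: read access-log lines on stdin and print the
# number of timestamps seen for each day of the given month, ordered by day.
# Hypothetical helper; it counts date matches, so a line that repeats the
# date elsewhere (for example in a URL) would be counted more than once.
requests_per_day() {
    local month="$1" year="$2"
    grep -oE "[0-9]{1,2}/${month}/${year}" \
        | sort \
        | uniq -c \
        | sort -k2,2
}
```

For example: `zcat --force /var/log/nginx/* | requests_per_day Jan 2019`.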
I don’t see anything interesting in the web server logs around that time though:
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
92 40.77.167.4
99 210.7.29.100
120 38.126.157.45
177 35.237.175.180
177 40.77.167.32
216 66.249.75.219
225 18.203.76.93
261 46.101.86.248
357 207.46.13.1
903 54.70.40.11