CGSpace Notes

Documenting day-to-day work on the CGSpace repository.

April, 2019

2019-04-01

  • Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc
    • They asked if we had plans to enable RDF support in CGSpace
  • There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today
    • I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18.196.196.108|18.195.78.144|18.195.218.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5
   4432 200
  • In the last two weeks there have been 47,000 downloads of this same exact PDF by these three IP addresses
  • Apply country and region corrections and deletions on DSpace Test and CGSpace:
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d

2019-04-02

  • CTA says the Amazon IPs are AWS gateways for real user traffic