CGSpace Notes

Documenting day-to-day work on the CGSpace repository.

August, 2021

2021-08-01

  • Update Docker images on AReS server (linode20) and reboot the server:
# docker images | grep -v ^REPO | sed 's/ \+/:/g' | cut -d: -f1,2 | grep -v none | xargs -L1 docker pull
  • I decided to upgrade linode20 from Ubuntu 18.04 to 20.04
  • First I installed all pending updates, took some backups, checked for broken packages, and rebooted before starting the release upgrade:
# apt update && apt dist-upgrade
# apt autoremove && apt autoclean
# check for any packages with residual configs we can purge
# dpkg -l | grep -E '^rc' | awk '{print $2}'
# dpkg -l | grep -E '^rc' | awk '{print $2}' | xargs dpkg -P
# dpkg -C
# dpkg -l > 2021-08-01-linode20-dpkg.txt
# tar -I zstd -cvf 2021-08-01-etc.tar.zst /etc
# reboot
# sed -i 's/bionic/focal/' /etc/apt/sources.list.d/*.list
# do-release-upgrade
  • … but of course it hit the libxcrypt bug
  • I had to get a copy of libcrypt.so.1.1.0 from a working Ubuntu 20.04 system and finish the upgrade manually:
# apt install -f
# apt dist-upgrade
# reboot
  • After rebooting I purged all packages with residual configs and cleaned up again:
# dpkg -l | grep -E '^rc' | awk '{print $2}' | xargs dpkg -P
# apt autoremove && apt autoclean
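
Two of the pipelines above parse column-aligned command output with sed, cut, and grep. As a rough sketch only (this is not what was run on linode20), the same filtering could be done with small awk helpers. The function names `image_refs` and `residual_packages` are made up for illustration. One difference from the commands above: splitting on whitespace rather than on colons keeps the tag when a repository name includes a registry port, such as `localhost:5000/app`.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not from the original notes.

# Read `docker images` output on stdin and print repository:tag pairs,
# skipping the header row and untagged (<none>) images.
image_refs() {
    awk 'NR > 1 && $1 != "<none>" && $2 != "<none>" { print $1 ":" $2 }'
}

# Read `dpkg -l` output on stdin and print the packages in the "rc" state
# (removed, but configuration files remain).
residual_packages() {
    awk '$1 == "rc" { print $2 }'
}

# Possible usage (as root, like the commands above):
#   docker images | image_refs | xargs -L1 docker pull
#   dpkg -l | residual_packages | xargs -r dpkg -P
```

The `-r` flag in the last usage line is the GNU xargs option that skips running `dpkg -P` when there is nothing to purge.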

2021-08-02

  • Help Udana with OAI validation on CGSpace

2021-08-03

  • Run fresh re-harvest on AReS

2021-08-05

  • Have a quick call with Mishell Portilla from CIP about a journal article that was flagged as being in a predatory journal (Beall’s List)
    • We agreed to unmap it from RTB’s collection for now, and I asked Peter and Abenet for advice on what to do in the future
  • A developer from the Alliance asked for access to the CGSpace database so they can make some integration with PowerBI
    • I told them we don’t allow direct database access, and that it would be tricky anyways (that’s what APIs are for!)
  • I’m curious if there are still any requests coming in to CGSpace from the abusive Russian networks
    • I extracted all the unique IPs that nginx processed in the last week:
# zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/access.log.2 /var/log/nginx/access.log.3 /var/log/nginx/access.log.4 /var/log/nginx/access.log.5 /var/log/nginx/access.log.6 /var/log/nginx/access.log.7 /var/log/nginx/access.log.8 | grep -E " (200|499) " | grep -v -E "(mahider|Googlebot|Turnitin|Grammarly|Unpaywall|UptimeRobot|bot)" | awk '{print $1}' | sort | uniq > /tmp/2021-08-05-all-ips.txt
# wc -l /tmp/2021-08-05-all-ips.txt
43428 /tmp/2021-08-05-all-ips.txt
  • Already I can see that the total is much lower than during the attack on one weekend last month (over 50,000!)
    • Indeed, there are no longer any IPs from those networks coming in:
$ ./ilri/resolve-addresses-geoip2.py -i /tmp/2021-08-05-all-ips.txt -o /tmp/2021-08-05-all-ips.csv
$ csvgrep -c asn -r '^(49453|46844|206485|62282|36352|35913|35624|8100)$' /tmp/2021-08-05-all-ips.csv | csvcut -c ip | sed 1d | sort | uniq > /tmp/2021-08-05-all-ips-to-purge.csv
$ wc -l /tmp/2021-08-05-all-ips-to-purge.csv
0 /tmp/2021-08-05-all-ips-to-purge.csv
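
The log-filtering step above could be wrapped in a small function so it is easier to rerun on another set of logs. This is a minimal sketch that reuses the same patterns as the command above; the name `unique_client_ips` is hypothetical. Note that the status match is just a space-delimited `200` or `499` anywhere in the line, and that the case-sensitive `bot` alternative already covers Googlebot and UptimeRobot.

```shell
#!/bin/sh
# Hypothetical helper: read nginx access log lines on stdin and print the
# unique client IPs for lines containing " 200 " or " 499 ", skipping lines
# that match the bot and user agent patterns used in the notes above.
unique_client_ips() {
    grep -E ' (200|499) ' \
        | grep -v -E '(mahider|Googlebot|Turnitin|Grammarly|Unpaywall|UptimeRobot|bot)' \
        | awk '{ print $1 }' \
        | sort -u
}

# Possible usage:
#   zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 \
#       | unique_client_ips > /tmp/2021-08-05-all-ips.txt
```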