CGSpace Notes

Documenting day-to-day work on the CGSpace repository.

June, 2020

2020-06-01

  • I tried to run the AtomicStatisticsUpdateCLI CUA migration script on DSpace Test (linode26) again and it is still going very slowly and has tons of errors like I noticed yesterday
    • I sent Atmire the dspace.log from today and told them to log into the server to debug the process
  • In other news, I checked the statistics API on DSpace 6 and it’s working
  • I tried to build the OAI registry on the freshly migrated DSpace 6 on DSpace Test and I get an error:
Read more →

May, 2020

2020-05-02

  • Peter said that CTA is having problems submitting an item to CGSpace
    • Looking at the PostgreSQL stats it seems to be the same issue that Tezira was having last week, as I see the number of connections in ‘idle in transaction’ and ‘waiting for lock’ state are increasing again
    • I see that CGSpace (linode18) is still using PostgreSQL JDBC driver version 42.2.11, and there were some bugs related to transactions fixed in 42.2.12 (which I had updated in the Ansible playbooks, but not deployed yet)
Read more →
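A quick way to watch the connection states mentioned above is to group `pg_stat_activity` by its `state` column. This is only a minimal sketch: the function name `pg_connection_states` is hypothetical, and it assumes a local database named `dspace` that the current user can reach with `psql`.

```shell
# Hypothetical helper: count PostgreSQL connections per state.
# Assumption: the database is called "dspace" unless another name is given as $1.
pg_connection_states() {
    # -A: unaligned output, -t: rows only, so each line looks like "state|count"
    psql -d "${1:-dspace}" -At -c \
        "SELECT state, count(*) FROM pg_stat_activity GROUP BY state ORDER BY count(*) DESC;"
}
```

Running `pg_connection_states` a few times in a row should show whether the "idle in transaction" count keeps growing. On PostgreSQL 9.6 and later, sessions waiting for a lock can be found by filtering the same view on `wait_event_type = 'Lock'`.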

April, 2020

2020-04-02

  • Maria asked me to update Charles Staver’s ORCID iD in the submission template and on CGSpace, as his name was lower case before, and now he has corrected it
    • I updated the fifty-eight existing items on CGSpace
  • Looking into the items Udana had asked about last week that were missing Altmetric donuts:
  • On the same note, the one item Abenet pointed out last week now has a donut with a score of 104 after I tweeted it last week
Read more →

February, 2020

2020-02-02

  • Continue working on porting CGSpace’s DSpace 5 code to DSpace 6.3 that I started yesterday
    • Sign up for an account with MaxMind so I can get the GeoLite2-City.mmdb database
    • I still need to wire up the API credentials and cron job into the Ansible infrastructure playbooks
    • Fix some minor issues in the config and XMLUI themes, like removing Atmire stuff
    • The code finally builds and runs with a fresh install
Read more →

January, 2020

2020-01-06

  • Open a ticket with Atmire to request a quote for the upgrade to DSpace 6
  • Last week Altmetric responded about the item that had a lower score than its DOI
    • The score is now linked to the DOI
    • Another item that had the same problem in 2019 is now also linked to the score for its DOI
    • Another item that had the same problem in 2019 has also been fixed

2020-01-07

  • Peter Ballantyne highlighted one more WLE item that is missing the Altmetric score that its DOI has
    • The DOI has a score of 259, but the Handle has no score at all
    • I tweeted the CGSpace repository link
Read more →

December, 2019

2019-12-01

  • Upgrade CGSpace (linode18) to Ubuntu 18.04:
    • Check any packages that have residual configs and purge them:
# dpkg -l | grep -E '^rc' | awk '{print $2}' | xargs dpkg -P
    • Make sure all packages and the package manager itself are up to date, then reboot:
# apt update && apt full-upgrade
# apt-get autoremove && apt-get autoclean
# dpkg -C
# reboot
Read more →
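Before purging residual configs it can help to review the list first. The sketch below separates the listing step from the purge; the function name `list_residual_configs` is hypothetical and it only reads `dpkg -l` output on stdin.

```shell
# Hypothetical helper: print packages in the "rc" state
# (removed, but configuration files remain). Nothing is purged here.
list_residual_configs() {
    # In `dpkg -l` output the first column is the status and the second the package name
    awk '$1 == "rc" { print $2 }'
}
```

For example, `dpkg -l | list_residual_configs` prints the candidates, and the same output can then be piped to `xargs dpkg -P` once the list looks right.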

November, 2019

2019-11-04

  • Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics
    • I looked in the nginx logs and see 4.6 million in the access logs, and 1.2 million in the API logs:
# zcat --force /var/log/nginx/*access.log.*.gz | grep -cE "[0-9]{1,2}/Oct/2019"
4671942
# zcat --force /var/log/nginx/{rest,oai,statistics}.log.*.gz | grep -cE "[0-9]{1,2}/Oct/2019"
1277694
  • So 4.6 million from XMLUI and another 1.2 million from API requests
  • Let’s see how many of the REST API requests were for bitstreams (because they are counted in Solr stats):
# zcat --force /var/log/nginx/rest.log.*.gz | grep -c -E "[0-9]{1,2}/Oct/2019"
1183456 
# zcat --force /var/log/nginx/rest.log.*.gz | grep -E "[0-9]{1,2}/Oct/2019" | grep -c -E "/rest/bitstreams"
106781
Read more →
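With the two counts above, bitstream requests are roughly 9% of the REST API hits for the month (106781 of 1183456). The sketch below computes both counts and the percentage in one pass; the function name `request_share` and its arguments are hypothetical, and it reads log lines on stdin.

```shell
# Hypothetical helper: print "<matching> <total> <percent>" for one month of logs.
# $1: date fragment to match, e.g. "Oct/2019"
# $2: path fragment to match, e.g. "/rest/bitstreams"
request_share() {
    awk -v month="$1" -v path="$2" '
        index($0, month)                    { total++ }
        index($0, month) && index($0, path) { hits++ }
        END {
            if (total > 0) printf "%d %d %.1f\n", hits, total, 100 * hits / total
            else print "0 0 0.0"
        }'
}
```

Something like `zcat --force /var/log/nginx/rest.log.*.gz | request_share "Oct/2019" "/rest/bitstreams"` would give the figures in one run. Note that it matches plain substrings rather than the regular expression used in the commands above.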

October, 2019

2019-10-01

  • Udana from IWMI asked me for a CSV export of their community on CGSpace
    • I exported it, but a quick run through the csv-metadata-quality tool shows that there are some low-hanging fruits we can fix before I send him the data
    • I will limit the scope to the titles, regions, subregions, and river basins for now to manually fix some non-breaking spaces (U+00A0) there that would otherwise be removed by the csv-metadata-quality script’s “unnecessary Unicode” fix:
$ csvcut -c 'id,dc.
Read more →
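To get a feel for how many rows are affected by non-breaking spaces before fixing them by hand, one option is to count the lines containing the UTF-8 encoding of U+00A0 (bytes 0xC2 0xA0). This is a rough sketch with a hypothetical function name, and it assumes the CSV is UTF-8 encoded.

```shell
# Hypothetical helper: count lines on stdin that contain a non-breaking space.
# Assumption: input is UTF-8, where U+00A0 is the byte pair 0xC2 0xA0.
count_nbsp_lines() {
    # LC_ALL=C makes grep compare raw bytes regardless of the current locale
    LC_ALL=C grep -c "$(printf '\302\240')"
}
```

The export can be fed to it with shell redirection, `count_nbsp_lines < export.csv`, where the file name is a placeholder. It counts physical lines, so a CSV field with embedded newlines may be counted more than once.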

September, 2019

2019-09-01

  • Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning
  • Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
# zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
    440 17.58.101.255
    441 157.55.39.101
    485 207.46.13.43
    728 169.60.128.125
    730 207.46.13.108
    758 157.55.39.9
    808 66.160.140.179
    814 207.46.13.212
   2472 163.172.71.23
   6092 3.94.211.189
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
     33 2a01:7e00::f03c:91ff:fe16:fcb
     57 3.83.192.124
     57 3.87.77.25
     57 54.82.1.8
    822 2a01:9cc0:47:1:1a:4:0:2
   1223 45.5.184.72
   1633 172.104.229.92
   5112 205.186.128.185
   7249 2a01:7e00::f03c:91ff:fe18:7396
   9124 45.5.186.2
Read more →
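The two pipelines above differ only in their input files, so the shared part can be wrapped in a small function. This is a sketch with a hypothetical name, `top_clients`, that reads log lines on stdin and assumes the client address is the first field, as in the logs above.

```shell
# Hypothetical helper: print the busiest client addresses for a time window.
# $1: timestamp fragment to match, e.g. "01/Sep/2019:0"
# $2: how many addresses to show (default 10)
top_clients() {
    grep -F "$1" | awk '{ print $1 }' | sort | uniq -c | sort -n | tail -n "${2:-10}"
}
```

For instance, `zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | top_clients "01/Sep/2019:0"` should reproduce the first listing. `grep -F` treats the timestamp as a fixed string, which is enough here because the fragment contains no regular expression syntax.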