CGSpace Notes

May, 2019

Wed May 01, 2019 by Alan Orth in Notes

2019-05-01

Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace
A user on the dspace-tech mailing list offered some suggestions for troubleshooting our inability to delete certain items
- Apparently, if the item is in the workflowitem table it has been submitted to a workflow
- And if it is in the workspaceitem table it is in the pre-submitted state
The item seems to be in a pre-submitted state, so I tried to delete it from there:
```
dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
DELETE 1
```
But after that, when I tried to delete the item from the XMLUI, it was still present…

April, 2019

Mon Apr 01, 2019 by Alan Orth in Notes

2019-04-01

Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc
- They asked if we had plans to enable RDF support in CGSpace
There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today
- I suspected that some might not be successful, because the stats show fewer downloads, but today they were all HTTP 200!
```
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18\.196\.196\.108|18\.195\.78\.144|18\.195\.218\.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5
4432 200
```
In the last two weeks there have been 47,000 downloads of this same exact PDF by these three IP addresses
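To see how those downloads are spread across the two weeks, a per-day, per-IP breakdown might help. This is a minimal bash sketch, assuming nginx's default "combined" log format (client IP in field 1, timestamp like `[01/Apr/2019:06:25:13` in field 4); the function name `daily_counts` is hypothetical.

```shell
# Hypothetical helper: count requests per day and client IP from nginx
# "combined" log lines on stdin. Output lines look like: COUNT DAY IP
daily_counts() {
    awk '{
        day = substr($4, 2, 11)      # "[01/Apr/2019:06:25:13" -> "01/Apr/2019"
        count[day " " $1]++
    }
    END {
        for (key in count) print count[key], key
    }' | LC_ALL=C sort -k2,2 -k3,3   # group by day, then by IP
}

# Example usage (anchoring the IPs at the start of the line):
# zcat --force /var/log/nginx/access.log* | grep 'Spore-192-EN-web.pdf' \
#   | grep -E '^(18\.196\.196\.108|18\.195\.78\.144|18\.195\.218\.6) ' | daily_counts
```

Sorting on the day string is only chronological within a single month, which is enough for a two-week window.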

Apply country and region corrections and deletions on DSpace Test and CGSpace:

```
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
```

March, 2019

Fri Mar 01, 2019 by Alan Orth in Notes

2019-03-01

I checked IITA’s 259 Feb 14 records from last month for duplicates using Atmire’s Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good
I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to the IITA Journal Articles collection, etc…
Looking at the other half of Udana’s WLE records from 2018-11
- I finished the ones for Restoring Degraded Landscapes (RDL), but these are for Variability, Risks and Competing Uses (VRC)
- I did the usual cleanups for whitespace, added regions where they made sense for certain countries, cleaned up the DOI link formats, added rights information based on the publications page for a few items
- Most worryingly, there are encoding errors in the abstracts for eleven items, for example:
- 68.15% � 9.45 instead of 68.15% ± 9.45
- 2003�2013 instead of 2003–2013
I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs
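Before asking for a re-paste, it could be useful to find every affected line mechanically. The "�" is U+FFFD, the Unicode replacement character (bytes EF BF BD in UTF-8), so a byte-level grep finds it. This is a minimal bash sketch, assuming the records are available as a UTF-8 text/CSV file; the function name `find_replacement_chars` is hypothetical.

```shell
# Hypothetical helper: print "LINE:text" for every line of a file that
# contains U+FFFD (the replacement character, EF BF BD in UTF-8), which
# usually means text was decoded with the wrong encoding at some point.
find_replacement_chars() {
    # LC_ALL=C makes grep compare raw bytes regardless of the locale;
    # -F treats the pattern as a fixed string, -n prints line numbers
    LC_ALL=C grep -nF $'\xef\xbf\xbd' "$1"
}

# Example usage: find_replacement_chars /tmp/wle-records.csv
```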

February, 2019

Fri Feb 01, 2019 by Alan Orth in Notes

2019-02-01

Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high, even though I increased the alert threshold last week from 250% to 275%; I might need to increase it again!

The top IPs before, during, and after this latest alert tonight were:

```
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "01/Feb/2019:(17|18|19|20|21)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
245 207.46.13.5
332 54.70.40.11
385 5.143.231.38
405 207.46.13.173
405 207.46.13.75
1117 66.249.66.219
1121 35.237.175.180
1546 5.9.6.51
2474 45.5.186.2
5490 85.25.237.71
```
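Since this same top-IPs pipeline gets reused for every alert, it could be wrapped in a small function that takes the time pattern as an argument. A minimal bash sketch; the name `top_ips` is hypothetical, and it assumes the client IP is the first field of each log line, as in the pipeline above.

```shell
# Hypothetical wrapper: read nginx log lines on stdin, keep those matching
# an extended regex (e.g. a date/hour window), and print the N most
# frequent client IPs (default 10), busiest last.
top_ips() {
    local pattern="$1"
    local n="${2:-10}"
    grep -E "$pattern" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n "$n"
}

# Example usage:
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | top_ips "01/Feb/2019:(17|18|19|20|21)"
```

Reading from stdin leaves the choice of `zcat`, `cat`, or a specific set of rotated logs to the caller.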

85.25.237.71 is the “Linguee Bot” that I first saw last month
The Solr statistics for the past few months have been very high, and I was wondering if the web server logs also showed an increase

There were just over 3 million accesses in the nginx logs last month:

```
# time zcat --force /var/log/nginx/* | grep -cE "[0-9]{1,2}/Jan/2019"
3018243

real    0m19.873s
user    0m22.203s
sys     0m1.979s
```
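To compare several months without re-reading the logs once per month, the counts could be computed in one pass. A minimal bash sketch, assuming nginx's "combined" format where field 4 looks like `[15/Jan/2019:10:00:00`; the name `monthly_counts` is hypothetical. Because it only inspects the timestamp field, its numbers may differ slightly from the `grep -c` above, which also matches dates appearing elsewhere in a line (e.g. in URLs or referrers).

```shell
# Hypothetical helper: count nginx log lines per month in a single pass.
# Output lines look like: COUNT Mon YYYY, sorted chronologically.
monthly_counts() {
    awk '{
        split($4, t, "/")                    # t[2] = "Jan", t[3] = "2019:10:00:00"
        month = t[2] " " substr(t[3], 1, 4)  # "Jan 2019"
        count[month]++
    }
    END {
        for (m in count) print count[m], m
    }' | LC_ALL=C sort -k3,3n -k2,2M         # by year, then by month name
}

# Example usage: zcat --force /var/log/nginx/* | monthly_counts
```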

January, 2019

Wed Jan 02, 2019 by Alan Orth in Notes

2019-01-02

Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning

I don’t see anything interesting in the web server logs around that time though:

```
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
 92 40.77.167.4
 99 210.7.29.100
120 38.126.157.45
177 35.237.175.180
177 40.77.167.32
216 66.249.75.219
225 18.203.76.93
261 46.101.86.248
357 207.46.13.1
903 54.70.40.11
```
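Because the alert was about outbound traffic rather than request volume, summing response sizes per IP might reveal something that request counts hide (a few large PDF downloads, for example). A minimal bash sketch, assuming nginx's "combined" format, where whitespace-separated field 10 is `$body_bytes_sent` (the same layout that puts the status code in field 9, as used in the April notes); the name `bytes_per_ip` is hypothetical.

```shell
# Hypothetical helper: total response body bytes per client IP from nginx
# "combined" log lines on stdin, printing the top N (default 10), largest last.
bytes_per_ip() {
    # only add field 10 when it is a plain number (skips "-" or odd lines)
    awk '$10 ~ /^[0-9]+$/ { bytes[$1] += $10 }
         END { for (ip in bytes) printf "%.0f %s\n", bytes[ip], ip }' \
        | sort -n | tail -n "${1:-10}"
}

# Example usage:
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | bytes_per_ip
```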

December, 2018

Sun Dec 02, 2018 by Alan Orth in Notes

2018-12-01

Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK
I manually installed OpenJDK, then removed Oracle JDK, then re-ran the Ansible playbook to update all configuration files, etc
Then I ran all system updates and restarted the server

2018-12-02

I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another Ghostscript vulnerability last week

November, 2018

Thu Nov 01, 2018 by Alan Orth in Notes

2018-11-01

Finalize AReS Phase I and Phase II ToRs
Send a note about my dspace-statistics-api to the dspace-tech mailing list

2018-11-03

Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage
Today these are the top 10 IPs:

October, 2018

Mon Oct 01, 2018 by Alan Orth in Notes

2018-10-01

Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items
I created a GitHub issue to track this #389, because I’m super busy in Nairobi right now

September, 2018

Sun Sep 02, 2018 by Alan Orth in Notes

2018-09-02

New PostgreSQL JDBC driver version 42.2.5
I’ll update the DSpace role in our Ansible infrastructure playbooks and run the updated playbooks on CGSpace and DSpace Test
Also, I’ll re-run the postgresql tasks because the custom PostgreSQL variables are dynamic according to the system’s RAM, and we never re-ran them after migrating to larger Linodes last month
I’m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I’m getting those autowire errors in Tomcat 8.5.30 again:

August, 2018

Wed Aug 01, 2018 by Alan Orth in Notes

2018-08-01

DSpace Test had crashed at some point yesterday morning and I see the following in dmesg:

```
[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child
[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB
[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
```
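To spot OOM kills quickly after a crash, the "Killed process" lines could be reduced to PID, name, and resident anonymous memory. A minimal bash sketch, assuming kernel messages in the format shown above; the name `oom_kills` is hypothetical. For the log above it reports about 5230 MiB of anon-rss for the java process (5355528 kB / 1024), roughly the size of the 5120m heap.

```shell
# Hypothetical helper: summarize "Killed process" lines from dmesg output
# on stdin, converting anon-rss from kB to MiB.
oom_kills() {
    grep 'Killed process' \
        | sed -E 's/.*Killed process ([0-9]+) \(([^)]+)\).*anon-rss:([0-9]+)kB.*/\1 \2 \3/' \
        | awk '{ printf "pid=%s name=%s anon-rss=%.0f MiB\n", $1, $2, $3 / 1024 }'
}

# Example usage: dmesg -T | oom_kills
```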

Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight
From the DSpace log I see that eventually Solr stopped responding, so I guess the java process that was OOM killed above was Tomcat’s
I’m not sure why Tomcat didn’t crash with an OutOfMemoryError…
Anyway, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core
The server only has 8GB of RAM, so we’ll eventually need to upgrade to a larger one, because otherwise we’ll start starving the OS, PostgreSQL, and command line batch processes
I ran all system updates on DSpace Test and rebooted it