2019-05-01
- Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace
- A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items
- Apparently if the item is in the
workflowitem
table it is submitted to a workflow
- And if it is in the
workspaceitem
table it is in the pre-submitted state
The item seems to be in a pre-submitted state, so I tried to delete it from there:
dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
DELETE 1
But after this I tried to delete the item from the XMLUI and it is still present…
Read more →
2019-04-01
- Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc
- They asked if we had plans to enable RDF support in CGSpace
There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today
I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18.196.196.108|18.195.78.144|18.195.218.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5
4432 200
In the last two weeks there have been 47,000 downloads of this same exact PDF by these three IP addresses
Apply country and region corrections and deletions on DSpace Test and CGSpace:
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
Read more →
2019-03-01
- I checked IITA’s 259 Feb 14 records from last month for duplicates using Atmire’s Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good
- I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc…
- Looking at the other half of Udana’s WLE records from 2018-11
- I finished the ones for Restoring Degraded Landscapes (RDL), but these are for Variability, Risks and Competing Uses (VRC)
- I did the usual cleanups for whitespace, added regions where they made sense for certain countries, cleaned up the DOI link formats, added rights information based on the publications page for a few items
- Most worryingly, there are encoding errors in the abstracts for eleven items, for example:
- 68.15% � 9.45 instead of 68.15% ± 9.45
- 2003�2013 instead of 2003–2013
- I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs
Read more →
2019-02-01
- Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!
The top IPs before, during, and after this latest alert tonight were:
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "01/Feb/2019:(17|18|19|20|21)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
245 207.46.13.5
332 54.70.40.11
385 5.143.231.38
405 207.46.13.173
405 207.46.13.75
1117 66.249.66.219
1121 35.237.175.180
1546 5.9.6.51
2474 45.5.186.2
5490 85.25.237.71
85.25.237.71
is the “Linguee Bot” that I first saw last month
The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase
There were just over 3 million accesses in the nginx logs last month:
# time zcat --force /var/log/nginx/* | grep -cE "[0-9]{1,2}/Jan/2019"
3018243
real 0m19.873s
user 0m22.203s
sys 0m1.979s
Read more →
2019-01-02
- Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning
I don’t see anything interesting in the web server logs around that time though:
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
92 40.77.167.4
99 210.7.29.100
120 38.126.157.45
177 35.237.175.180
177 40.77.167.32
216 66.249.75.219
225 18.203.76.93
261 46.101.86.248
357 207.46.13.1
903 54.70.40.11
Read more →
2018-12-01
- Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK
- I manually installed OpenJDK, then removed Oracle JDK, then re-ran the Ansible playbook to update all configuration files, etc
- Then I ran all system updates and restarted the server
2018-12-02
Read more →
2018-11-01
- Finalize AReS Phase I and Phase II ToRs
- Send a note about my dspace-statistics-api to the dspace-tech mailing list
2018-11-03
- Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage
- Today these are the top 10 IPs:
Read more →
2018-10-01
- Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items
- I created a GitHub issue to track this #389, because I’m super busy in Nairobi right now
Read more →
2018-09-02
- New PostgreSQL JDBC driver version 42.2.5
- I’ll update the DSpace role in our Ansible infrastructure playbooks and run the updated playbooks on CGSpace and DSpace Test
- Also, I’ll re-run the
postgresql
tasks because the custom PostgreSQL variables are dynamic according to the system’s RAM, and we never re-ran them after migrating to larger Linodes last month
- I’m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I’m getting those autowire errors in Tomcat 8.5.30 again:
Read more →
2018-08-01
DSpace Test had crashed at some point yesterday morning and I see the following in dmesg
:
[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child
[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB
[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight
From the DSpace log I see that eventually Solr stopped responding, so I guess the java
process that was OOM killed above was Tomcat’s
I’m not sure why Tomcat didn’t crash with an OutOfMemoryError…
Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core
The server only has 8GB of RAM so we’ll eventually need to upgrade to a larger one because we’ll start starving the OS, PostgreSQL, and command line batch processes
I ran all system updates on DSpace Test and rebooted it
Read more →