<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version= "2.0" xmlns:atom= "http://www.w3.org/2005/Atom" >
<channel >
<title > Notes on CGSpace Notes</title>
<link > https://alanorth.github.io/cgspace-notes/categories/notes/</link>
<description > Recent content in Notes on CGSpace Notes</description>
<generator > Hugo -- gohugo.io</generator>
<language > en-us</language>
<lastBuildDate > Sun, 02 Feb 2020 11:56:30 +0200</lastBuildDate>
<atom:link href= "https://alanorth.github.io/cgspace-notes/categories/notes/index.xml" rel= "self" type= "application/rss+xml" />
<item >
<title > February, 2020</title>
<link > https://alanorth.github.io/cgspace-notes/2020-02/</link>
<pubDate > Sun, 02 Feb 2020 11:56:30 +0200</pubDate>
<guid > https://alanorth.github.io/cgspace-notes/2020-02/</guid>
<description > < h2 id=" 2020-02-02" > 2020-02-02< /h2>
< ul>
< li> Continue working on porting CGSpace& rsquo;s DSpace 5 code to DSpace 6.3 that I started yesterday
< ul>
< li> Sign up for an account with MaxMind so I can get the GeoLite2-City.mmdb database< /li>
< li> I still need to wire up the API credentials and cron job into the Ansible infrastructure playbooks< /li>
< li> Fix some minor issues in the config and XMLUI themes, like removing Atmire stuff< /li>
< li> The code finally builds and runs with a fresh install< /li>
< /ul>
< /li>
< /ul> </description>
</item>
<item >
<title > January, 2020</title>
<link > https://alanorth.github.io/cgspace-notes/2020-01/</link>
<pubDate > Mon, 06 Jan 2020 10:48:30 +0200</pubDate>
<guid > https://alanorth.github.io/cgspace-notes/2020-01/</guid>
<description > < h2 id=" 2020-01-06" > 2020-01-06< /h2>
< ul>
< li> Open < a href=" https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=706" > a ticket< /a> with Atmire to request a quote for the upgrade to DSpace 6< /li>
< li> Last week Altmetric responded about the < a href=" https://hdl.handle.net/10568/97087" > item< /a> that had a lower score than its DOI
< ul>
< li> The score is now linked to the DOI< /li>
< li> Another < a href=" https://hdl.handle.net/10568/91278" > item< /a> that had the same problem in 2019 is now also linked to the score for its DOI< /li>
< li> Another < a href=" https://hdl.handle.net/10568/81236" > item< /a> that had the same problem in 2019 has also been fixed< /li>
< /ul>
< /li>
< /ul>
< h2 id=" 2020-01-07" > 2020-01-07< /h2>
< ul>
< li> Peter Ballantyne highlighted one more WLE < a href=" https://hdl.handle.net/10568/101286" > item< /a> that is missing the Altmetric score that its DOI has
< ul>
< li> The DOI has a score of 259, but the Handle has no score at all< /li>
< li> I < a href=" https://twitter.com/mralanorth/status/1214471427157626881" > tweeted< /a> the CGSpace repository link< /li>
< /ul>
< /li>
< /ul> </description>
</item>
<item >
<title > December, 2019</title>
<link > https://alanorth.github.io/cgspace-notes/2019-12/</link>
<pubDate > Sun, 01 Dec 2019 11:22:30 +0200</pubDate>
<guid > https://alanorth.github.io/cgspace-notes/2019-12/</guid>
<description > < h2 id=" 2019-12-01" > 2019-12-01< /h2>
< ul>
< li> Upgrade CGSpace (linode18) to Ubuntu 18.04:
< ul>
< li> Check any packages that have residual configs and purge them:< /li>
< li> < code> # dpkg -l | grep -E '^rc' | awk '{print $2}' | xargs dpkg -P< /code> < /li>
< li> Make sure all packages and the package manager are up to date, then reboot:< /li>
< /ul>
< /li>
< /ul>
< pre> < code> # apt update & amp;& amp; apt full-upgrade
# apt-get autoremove & amp;& amp; apt-get autoclean
# dpkg -C
# reboot
< /code> < /pre> </description>
</item>
<item >
<title > November, 2019</title>
<link > https://alanorth.github.io/cgspace-notes/2019-11/</link>
<pubDate > Mon, 04 Nov 2019 12:20:30 +0200</pubDate>
<guid > https://alanorth.github.io/cgspace-notes/2019-11/</guid>
<description > < h2 id=" 2019-11-04" > 2019-11-04< /h2>
< ul>
< li> Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics
< ul>
< li> I looked in the nginx logs and see 4.6 million in the access logs, and 1.2 million in the API logs:< /li>
< /ul>
< /li>
< /ul>
< pre> < code> # zcat --force /var/log/nginx/*access.log.*.gz | grep -cE & quot;[0-9]{1,2}/Oct/2019& quot;
4671942
# zcat --force /var/log/nginx/{rest,oai,statistics}.log.*.gz | grep -cE & quot;[0-9]{1,2}/Oct/2019& quot;
1277694
< /code> < /pre> < ul>
< li> So 4.6 million from XMLUI and another 1.2 million from API requests< /li>
< li> Let& rsquo;s see how many of the REST API requests were for bitstreams (because they are counted in Solr stats):< /li>
< /ul>
< pre> < code> # zcat --force /var/log/nginx/rest.log.*.gz | grep -c -E & quot;[0-9]{1,2}/Oct/2019& quot;
1183456
# zcat --force /var/log/nginx/rest.log.*.gz | grep -E & quot;[0-9]{1,2}/Oct/2019& quot; | grep -c -E & quot;/rest/bitstreams& quot;
106781
< /code> < /pre> </description>
</item>
<item >
<title > CGSpace CG Core v2 Migration</title>
<link > https://alanorth.github.io/cgspace-notes/cgspace-cgcorev2-migration/</link>
<pubDate > Mon, 28 Oct 2019 13:27:35 +0200</pubDate>
<guid > https://alanorth.github.io/cgspace-notes/cgspace-cgcorev2-migration/</guid>
<description > < p> Possible changes to CGSpace metadata fields to align more with DC, QDC, and DCTERMS as well as CG Core v2.< /p>
< p> With reference to < a href=" https://agriculturalsemantics.github.io/cg-core/cgcore.html" > CG Core v2 draft standard< /a> by Marie-Angélique as well as < a href=" http://www.dublincore.org/specifications/dublin-core/dcmi-terms/" > DCMI DCTERMS< /a> .< /p> </description>
</item>
<item >
<title > October, 2019</title>
<link > https://alanorth.github.io/cgspace-notes/2019-10/</link>
<pubDate > Tue, 01 Oct 2019 13:20:51 +0300</pubDate>
<guid > https://alanorth.github.io/cgspace-notes/2019-10/</guid>
<description > 2019-10-01: Udana from IWMI asked me for a CSV export of their community on CGSpace. I exported it, but a quick run through the csv-metadata-quality tool shows that there are some low-hanging fruits we can fix before I send him the data. I will limit the scope to the titles, regions, subregions, and river basins for now to manually fix some non-breaking spaces (U+00A0) there that would otherwise be removed by the csv-metadata-quality script& rsquo;s & ldquo;unnecessary Unicode& rdquo; fix: $ csvcut -c ' id,dc.</description>
</item>
<item >
<title > September, 2019</title>
<link > https://alanorth.github.io/cgspace-notes/2019-09/</link>
<pubDate > Sun, 01 Sep 2019 10:17:51 +0300</pubDate>
<guid > https://alanorth.github.io/cgspace-notes/2019-09/</guid>
<description > < h2 id=" 2019-09-01" > 2019-09-01< /h2>
< ul>
< li> Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning< /li>
< li> Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:< /li>
< /ul>
< pre> < code> # zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E & quot;01/Sep/2019:0& quot; | awk ' {print $1}' | sort | uniq -c | sort -n | tail -n 10
440 17.58.101.255
441 157.55.39.101
485 207.46.13.43
728 169.60.128.125
730 207.46.13.108
758 157.55.39.9
808 66.160.140.179
814 207.46.13.212
2472 163.172.71.23
6092 3.94.211.189
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E & quot;01/Sep/2019:0& quot; | awk ' {print $1}' | sort | uniq -c | sort -n | tail -n 10
33 2a01:7e00::f03c:91ff:fe16:fcb
57 3.83.192.124
57 3.87.77.25
57 54.82.1.8
822 2a01:9cc0:47:1:1a:4:0:2
1223 45.5.184.72
1633 172.104.229.92
5112 205.186.128.185
7249 2a01:7e00::f03c:91ff:fe18:7396
9124 45.5.186.2
< /code> < /pre> </description>
</item>
<item >
<title > August, 2019</title>
<link > https://alanorth.github.io/cgspace-notes/2019-08/</link>
<pubDate > Sat, 03 Aug 2019 12:39:51 +0300</pubDate>
<guid > https://alanorth.github.io/cgspace-notes/2019-08/</guid>
<description > < h2 id=" 2019-08-03" > 2019-08-03< /h2>
< ul>
< li> Look at Bioversity& rsquo;s latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name& hellip;< /li>
< /ul>
< h2 id=" 2019-08-04" > 2019-08-04< /h2>
< ul>
< li> Deploy ORCID identifier updates requested by Bioversity to CGSpace< /li>
< li> Run system updates on CGSpace (linode18) and reboot it
< ul>
< li> Before updating it I checked Solr and verified that all statistics cores were loaded properly& hellip;< /li>
< li> After rebooting, all statistics cores were loaded& hellip; wow, that& rsquo;s lucky.< /li>
< /ul>
< /li>
< li> Run system updates on DSpace Test (linode19) and reboot it< /li>
< /ul> </description>
</item>
<item >
<title > July, 2019</title>
<link > https://alanorth.github.io/cgspace-notes/2019-07/</link>
<pubDate > Mon, 01 Jul 2019 12:13:51 +0300</pubDate>
<guid > https://alanorth.github.io/cgspace-notes/2019-07/</guid>
<description > < h2 id=" 2019-07-01" > 2019-07-01< /h2>
< ul>
< li> Create an & ldquo;AfricaRice books and book chapters& rdquo; collection on CGSpace for AfricaRice< /li>
< li> Last month Sisay asked why the following & ldquo;most popular& rdquo; statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace:
< ul>
< li> < a href=" https://dspacetest.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom& amp;time_filter_end_date=01%2F12%2F2018" > DSpace Test< /a> < /li>
< li> < a href=" https://cgspace.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom& amp;time_filter_end_date=01%2F12%2F2018" > CGSpace< /a> < /li>
< /ul>
< /li>
< li> Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community< /li>
< /ul> </description>
</item>
<item >
<title > June, 2019</title>
<link > https://alanorth.github.io/cgspace-notes/2019-06/</link>
<pubDate > Sun, 02 Jun 2019 10:57:51 +0300</pubDate>
<guid > https://alanorth.github.io/cgspace-notes/2019-06/</guid>
<description > < h2 id=" 2019-06-02" > 2019-06-02< /h2>
< ul>
< li> Merge the < a href=" https://github.com/ilri/DSpace/pull/425" > Solr filterCache< /a> and < a href=" https://github.com/ilri/DSpace/pull/426" > XMLUI ISI journal< /a> changes to the < code> 5_x-prod< /code> branch and deploy on CGSpace< /li>
< li> Run system updates on CGSpace (linode18) and reboot it< /li>
< /ul>
< h2 id=" 2019-06-03" > 2019-06-03< /h2>
< ul>
< li> Skype with Marie-Angélique and Abenet about < a href=" https://agriculturalsemantics.github.io/cg-core/cgcore.html" > CG Core v2< /a> < /li>
< /ul> </description>
</item>
<item >
<title > May, 2019</title>
<link > https://alanorth.github.io/cgspace-notes/2019-05/</link>
<pubDate > Wed, 01 May 2019 07:37:43 +0300</pubDate>
<guid > https://alanorth.github.io/cgspace-notes/2019-05/</guid>
<description > < h2 id=" 2019-05-01" > 2019-05-01< /h2>
< ul>
< li> Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace< /li>
< li> A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items
< ul>
< li> Apparently if the item is in the < code> workflowitem< /code> table it is submitted to a workflow< /li>
< li> And if it is in the < code> workspaceitem< /code> table it is in the pre-submitted state< /li>
< /ul>
< /li>
< li> The item seems to be in a pre-submitted state, so I tried to delete it from there:< /li>
< /ul>
< pre> < code> dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
DELETE 1
< /code> < /pre> < ul>
< li> But after this I tried to delete the item from the XMLUI and it is < em> still< /em> present& hellip;< /li>
< /ul> </description>
</item>
<item >
<title > April, 2019</title>
<link > https://alanorth.github.io/cgspace-notes/2019-04/</link>
<pubDate > Mon, 01 Apr 2019 09:00:43 +0300</pubDate>
<guid > https://alanorth.github.io/cgspace-notes/2019-04/</guid>
<description > < h2 id=" 2019-04-01" > 2019-04-01< /h2>
< ul>
< li> Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc
< ul>
< li> They asked if we had plans to enable RDF support in CGSpace< /li>
< /ul>
< /li>
< li> There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today
< ul>
< li> I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!< /li>
< /ul>
< /li>
< /ul>
< pre> < code> # cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep ' Spore-192-EN-web.pdf' | grep -E ' (18.196.196.108|18.195.78.144|18.195.218.6)' | awk ' {print $9}' | sort | uniq -c | sort -n | tail -n 5
4432 200
< /code> < /pre> < ul>
< li> In the last two weeks there have been 47,000 downloads of this < em> same exact PDF< /em> by these three IP addresses< /li>
< li> Apply country and region corrections and deletions on DSpace Test and CGSpace:< /li>
< /ul>
< pre> < code> $ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p ' fuuu' -f cg.coverage.country -m 228 -t ACTION -d
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p ' fuuu' -f cg.coverage.region -m 231 -t action -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p ' fuuu' -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p ' fuuu' -m 231 -f cg.coverage.region -d
< /code> < /pre> </description>
</item>
<item >
<title > March, 2019</title>
<link > https://alanorth.github.io/cgspace-notes/2019-03/</link>
<pubDate > Fri, 01 Mar 2019 12:16:30 +0100</pubDate>
<guid > https://alanorth.github.io/cgspace-notes/2019-03/</guid>
<description > < h2 id=" 2019-03-01" > 2019-03-01< /h2>
< ul>
< li> I checked IITA& rsquo;s 259 Feb 14 records from last month for duplicates using Atmire& rsquo;s Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good< /li>
< li> I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc& hellip;< /li>
< li> Looking at the other half of Udana& rsquo;s WLE records from 2018-11
< ul>
< li> I finished the ones for Restoring Degraded Landscapes (RDL), but these are for Variability, Risks and Competing Uses (VRC)< /li>
< li> I did the usual cleanups for whitespace, added regions where they made sense for certain countries, cleaned up the DOI link formats, added rights information based on the publications page for a few items< /li>
< li> Most worryingly, there are encoding errors in the abstracts for eleven items, for example:< /li>
< li> 68.15% <20> 9.45 instead of 68.15% ± 9.45< /li>
< li> 2003<EFBFBD> 2013 instead of 2003– 2013< /li>
< /ul>
< /li>
< li> I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs< /li>
< /ul> </description>
</item>
<item >
<title > February, 2019</title>
<link > https://alanorth.github.io/cgspace-notes/2019-02/</link>
<pubDate > Fri, 01 Feb 2019 21:37:30 +0200</pubDate>
<guid > https://alanorth.github.io/cgspace-notes/2019-02/</guid>
<description > < h2 id=" 2019-02-01" > 2019-02-01< /h2>
< ul>
< li> Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!< /li>
< li> The top IPs before, during, and after this latest alert tonight were:< /li>
< /ul>
< pre> < code> # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E & quot;01/Feb/2019:(17|18|19|20|21)& quot; | awk ' {print $1}' | sort | uniq -c | sort -n | tail -n 10
245 207.46.13.5
332 54.70.40.11
385 5.143.231.38
405 207.46.13.173
405 207.46.13.75
1117 66.249.66.219
1121 35.237.175.180
1546 5.9.6.51
2474 45.5.186.2
5490 85.25.237.71
< /code> < /pre> < ul>
< li> < code> 85.25.237.71< /code> is the & ldquo;Linguee Bot& rdquo; that I first saw last month< /li>
< li> The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase< /li>
< li> There were just over 3 million accesses in the nginx logs last month:< /li>
< /ul>
2019-10-28 13:43:25 +02:00
< pre> < code> # time zcat --force /var/log/nginx/* | grep -cE & quot;[0-9]{1,2}/Jan/2019& quot;
3018243
real 0m19.873s
user 0m22.203s
sys 0m1.979s
< /code> < /pre> </description>
</item>
<item >
<title > January, 2019</title>
<link > https://alanorth.github.io/cgspace-notes/2019-01/</link>
<pubDate > Wed, 02 Jan 2019 09:48:30 +0200</pubDate>
<guid > https://alanorth.github.io/cgspace-notes/2019-01/</guid>
<description > < h2 id=" 2019-01-02" > 2019-01-02< /h2>
< ul>
< li> Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning< /li>
< li> I don& rsquo;t see anything interesting in the web server logs around that time though:< /li>
< /ul>
< pre> < code> # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E & quot;02/Jan/2019:0(1|2|3)& quot; | awk ' {print $1}' | sort | uniq -c | sort -n | tail -n 10
92 40.77.167.4
99 210.7.29.100
120 38.126.157.45
177 35.237.175.180
177 40.77.167.32
216 66.249.75.219
225 18.203.76.93
261 46.101.86.248
357 207.46.13.1
903 54.70.40.11
< /code> < /pre> </description>
</item>
<item >
<title > December, 2018</title>
<link > https://alanorth.github.io/cgspace-notes/2018-12/</link>
<pubDate > Sun, 02 Dec 2018 02:09:30 +0200</pubDate>
<guid > https://alanorth.github.io/cgspace-notes/2018-12/</guid>
<description > < h2 id=" 2018-12-01" > 2018-12-01< /h2>
< ul>
< li> Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK< /li>
< li> I manually installed OpenJDK, then removed Oracle JDK, then re-ran the < a href=" http://github.com/ilri/rmg-ansible-public" > Ansible playbook< /a> to update all configuration files, etc< /li>
< li> Then I ran all system updates and restarted the server< /li>
< /ul>
< h2 id=" 2018-12-02" > 2018-12-02< /h2>
< ul>
< li> I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another < a href=" https://usn.ubuntu.com/3831-1/" > Ghostscript vulnerability last week< /a> < /li>
< /ul> </description>
</item>
<item >
<title > November, 2018</title>
<link > https://alanorth.github.io/cgspace-notes/2018-11/</link>
<pubDate > Thu, 01 Nov 2018 16:41:30 +0200</pubDate>
<guid > https://alanorth.github.io/cgspace-notes/2018-11/</guid>
<description > < h2 id=" 2018-11-01" > 2018-11-01< /h2>
< ul>
< li> Finalize AReS Phase I and Phase II ToRs< /li>
< li> Send a note about my < a href=" https://github.com/ilri/dspace-statistics-api" > dspace-statistics-api< /a> to the dspace-tech mailing list< /li>
< /ul>
< h2 id=" 2018-11-03" > 2018-11-03< /h2>
< ul>
< li> Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage< /li>
< li> Today these are the top 10 IPs:< /li>
< /ul> </description>
</item>
<item >
<title > October, 2018</title>
<link > https://alanorth.github.io/cgspace-notes/2018-10/</link>
<pubDate > Mon, 01 Oct 2018 22:31:54 +0300</pubDate>
<guid > https://alanorth.github.io/cgspace-notes/2018-10/</guid>
<description > < h2 id=" 2018-10-01" > 2018-10-01< /h2>
< ul>
< li> Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items< /li>
< li> I created a GitHub issue to track this < a href=" https://github.com/ilri/DSpace/issues/389" > #389< /a> , because I& rsquo;m super busy in Nairobi right now< /li>
< /ul> </description>
</item>
<item >
<title > September, 2018</title>
<link > https://alanorth.github.io/cgspace-notes/2018-09/</link>
<pubDate > Sun, 02 Sep 2018 09:55:54 +0300</pubDate>
<guid > https://alanorth.github.io/cgspace-notes/2018-09/</guid>
<description > < h2 id=" 2018-09-02" > 2018-09-02< /h2>
< ul>
< li> New < a href=" https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.5" > PostgreSQL JDBC driver version 42.2.5< /a> < /li>
< li> I& rsquo;ll update the DSpace role in our < a href=" https://github.com/ilri/rmg-ansible-public" > Ansible infrastructure playbooks< /a> and run the updated playbooks on CGSpace and DSpace Test< /li>
< li> Also, I& rsquo;ll re-run the < code> postgresql< /code> tasks because the custom PostgreSQL variables are dynamic according to the system& rsquo;s RAM, and we never re-ran them after migrating to larger Linodes last month< /li>
< li> I& rsquo;m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I& rsquo;m getting those autowire errors in Tomcat 8.5.30 again:< /li>
< /ul> </description>
</item>
<item >
<title > August, 2018</title>
<link > https://alanorth.github.io/cgspace-notes/2018-08/</link>
<pubDate > Wed, 01 Aug 2018 11:52:54 +0300</pubDate>
<guid > https://alanorth.github.io/cgspace-notes/2018-08/</guid>
<description > < h2 id=" 2018-08-01" > 2018-08-01< /h2>
< ul>
< li> DSpace Test had crashed at some point yesterday morning and I see the following in < code> dmesg< /code> :< /li>
< /ul>
< pre> < code> [Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child
[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB
[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
< /code> < /pre> < ul>
< li> Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight< /li>
< li> From the DSpace log I see that eventually Solr stopped responding, so I guess the < code> java< /code> process that was OOM killed above was Tomcat& rsquo;s< /li>
< li> I& rsquo;m not sure why Tomcat didn& rsquo;t crash with an OutOfMemoryError& hellip;< /li>
< li> Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core< /li>
< li> The server only has 8GB of RAM so we& rsquo;ll eventually need to upgrade to a larger one because we& rsquo;ll start starving the OS, PostgreSQL, and command line batch processes< /li>
< li> I ran all system updates on DSpace Test and rebooted it< /li>
< /ul> </description>
</item>
<item >
<title > July, 2018</title>
<link > https://alanorth.github.io/cgspace-notes/2018-07/</link>
<pubDate > Sun, 01 Jul 2018 12:56:54 +0300</pubDate>
<guid > https://alanorth.github.io/cgspace-notes/2018-07/</guid>
<description > < h2 id=" 2018-07-01" > 2018-07-01< /h2>
< ul>
< li> I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:< /li>
< /ul>
< pre> < code> $ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
< /code> < /pre> < ul>
< li> During the < code> mvn package< /code> stage on the 5.8 branch I kept getting issues with java running out of memory:< /li>
< /ul>
2019-10-28 13:43:25 +02:00
< pre> < code> There is insufficient memory for the Java Runtime Environment to continue.
< /code> < /pre> </description>
</item>
<item >
<title > June, 2018</title>
<link > https://alanorth.github.io/cgspace-notes/2018-06/</link>
<pubDate > Mon, 04 Jun 2018 19:49:54 -0700</pubDate>
<guid > https://alanorth.github.io/cgspace-notes/2018-06/</guid>
<description > < h2 id=" 2018-06-04" > 2018-06-04< /h2>
< ul>
< li> Test the < a href=" https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560" > DSpace 5.8 module upgrades from Atmire< /a> (< a href=" https://github.com/ilri/DSpace/pull/378" > #378< /a> )
< ul>
< li> There seems to be a problem with the CUA and L& amp;R versions in < code> pom.xml< /code> because they are using SNAPSHOT and it doesn& rsquo;t build< /li>
< /ul>
< /li>
< li> I added the new CCAFS Phase II Project Tag < code> PII-FP1_PACCA2< /code> and merged it into the < code> 5_x-prod< /code> branch (< a href=" https://github.com/ilri/DSpace/pull/379" > #379< /a> )< /li>
< li> I proofed and tested the ILRI author corrections that Peter sent back to me this week:< /li>
< /ul>
< pre> < code> $ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p ' fuuu' -f dc.contributor.author -t correct -m 3 -n
< /code> < /pre> < ul>
< li> I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in < a href=" https://alanorth.github.io/cgspace-notes/cgspace-notes/2018-03/" > March, 2018< /a> < /li>
< li> Time to index ~70,000 items on CGSpace:< /li>
< /ul>
< pre> < code> $ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
real 74m42.646s
user 8m5.056s
sys 2m7.289s
< /code> < /pre> </description>
</item>
<item >
<title > May, 2018</title>
<link > https://alanorth.github.io/cgspace-notes/2018-05/</link>
<pubDate > Tue, 01 May 2018 16:43:54 +0300</pubDate>
<guid > https://alanorth.github.io/cgspace-notes/2018-05/</guid>
<description > < h2 id=" 2018-05-01" > 2018-05-01< /h2>
< ul>
< li> I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface:
< ul>
< li> http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E< /li>
< li> http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E< /li>
< /ul>
< /li>
< li> Then I reduced the JVM heap size from 6144m back to 5120m< /li>
< li> Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the < a href=" https://github.com/ilri/rmg-ansible-public" > Ansible infrastructure scripts< /a> to support hosts choosing which distribution they want to use< /li>
< /ul> </description>
</item>
<item >
<title > April, 2018</title>
<link > https://alanorth.github.io/cgspace-notes/2018-04/</link>
<pubDate > Sun, 01 Apr 2018 16:13:54 +0200</pubDate>
<guid > https://alanorth.github.io/cgspace-notes/2018-04/</guid>
<description > < h2 id=" 2018-04-01" > 2018-04-01< /h2>
< ul>
< li> I tried to test something on DSpace Test but noticed that it& rsquo;s down since god knows when< /li>
< li> Catalina logs at least show some memory errors yesterday:< /li>
< /ul> </description>
</item>
<item >
<title > March, 2018</title>
<link > https://alanorth.github.io/cgspace-notes/2018-03/</link>
<pubDate > Fri, 02 Mar 2018 16:07:54 +0200</pubDate>
<guid > https://alanorth.github.io/cgspace-notes/2018-03/</guid>
<description > < h2 id=" 2018-03-02" > 2018-03-02< /h2>
< ul>
< li> Export a CSV of the IITA community metadata for Martin Mueller< /li>
< /ul> </description>
</item>
<item >
<title > February, 2018</title>
<link > https://alanorth.github.io/cgspace-notes/2018-02/</link>
<pubDate > Thu, 01 Feb 2018 16:28:54 +0200</pubDate>
<guid > https://alanorth.github.io/cgspace-notes/2018-02/</guid>
<description > < h2 id=" 2018-02-01" > 2018-02-01< /h2>
< ul>
< li> Peter gave feedback on the < code> dc.rights< /code> proof of concept that I had sent him last week< /li>
< li> We don& rsquo;t need to distinguish between internal and external works, so that makes it just a simple list< /li>
< li> Yesterday I figured out how to monitor DSpace sessions using JMX< /li>
< li> I copied the logic in the < code> jmx_tomcat_dbpools< /code> provided by Ubuntu& rsquo;s < code> munin-plugins-java< /code> package and used the stuff I discovered about JMX < a href=" https://alanorth.github.io/cgspace-notes/cgspace-notes/2018-01/" > in 2018-01< /a> < /li>
< /ul> </description>
</item>
<item >
<title > January, 2018</title>
<link > https://alanorth.github.io/cgspace-notes/2018-01/</link>
<pubDate > Tue, 02 Jan 2018 08:35:54 -0800</pubDate>
<guid > https://alanorth.github.io/cgspace-notes/2018-01/</guid>
<description > < h2 id=" 2018-01-02" > 2018-01-02< /h2>
< ul>
< li> Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time< /li>
< li> I didn& rsquo;t get any load alerts from Linode and the REST and XMLUI logs don& rsquo;t show anything out of the ordinary< /li>
< li> The nginx logs show HTTP 200s until < code> 02/Jan/2018:11:27:17 +0000< /code> when Uptime Robot got an HTTP 500< /li>
< li> In dspace.log around that time I see many errors like & ldquo;Client closed the connection before file download was complete& rdquo;< /li>
< li> And just before that I see this:< /li>
< /ul>
< pre> < code> Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000].
< /code> < /pre> < ul>
< li> Ah hah! So the pool was actually empty!< /li>
< li> I need to increase that, so let& rsquo;s try bumping it up from 50 to 75< /li>
< li> After that one client got an HTTP 499 but then the rest were HTTP 200, so I don& rsquo;t know what the hell Uptime Robot saw< /li>
< li> I notice this error quite a few times in dspace.log:< /li>
< /ul>
< pre> < code> 2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse ' dateIssued_keyword:[1976+TO+1979]' : Encountered & quot; & quot;]& quot; & quot;] & quot;& quot; at line 1, column 32.
< /code> < /pre> < ul>
< li> And there are many of these errors every day for the past month:< /li>
< /ul>
< pre> < code> $ grep -c & quot;Error while searching for sidebar facets& quot; dspace.log.*
dspace.log.2017-11-21:4
dspace.log.2017-11-22:1
dspace.log.2017-11-23:4
dspace.log.2017-11-24:11
dspace.log.2017-11-25:0
dspace.log.2017-11-26:1
dspace.log.2017-11-27:7
dspace.log.2017-11-28:21
dspace.log.2017-11-29:31
dspace.log.2017-11-30:15
dspace.log.2017-12-01:15
dspace.log.2017-12-02:20
dspace.log.2017-12-03:38
dspace.log.2017-12-04:65
dspace.log.2017-12-05:43
dspace.log.2017-12-06:72
dspace.log.2017-12-07:27
dspace.log.2017-12-08:15
dspace.log.2017-12-09:29
dspace.log.2017-12-10:35
dspace.log.2017-12-11:20
dspace.log.2017-12-12:44
dspace.log.2017-12-13:36
dspace.log.2017-12-14:59
dspace.log.2017-12-15:104
dspace.log.2017-12-16:53
dspace.log.2017-12-17:66
dspace.log.2017-12-18:83
dspace.log.2017-12-19:101
dspace.log.2017-12-20:74
dspace.log.2017-12-21:55
dspace.log.2017-12-22:66
dspace.log.2017-12-23:50
dspace.log.2017-12-24:85
dspace.log.2017-12-25:62
dspace.log.2017-12-26:49
dspace.log.2017-12-27:30
dspace.log.2017-12-28:54
dspace.log.2017-12-29:68
dspace.log.2017-12-30:89
dspace.log.2017-12-31:53
dspace.log.2018-01-01:45
dspace.log.2018-01-02:34
< /code> < /pre> < ul>
< li> Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let& rsquo;s Encrypt if it& rsquo;s just a handful of domains< /li>
< /ul> </description>
</item>
<item >
<title > December, 2017</title>
<link > https://alanorth.github.io/cgspace-notes/2017-12/</link>
<pubDate > Fri, 01 Dec 2017 13:53:54 +0300</pubDate>
<guid > https://alanorth.github.io/cgspace-notes/2017-12/</guid>
<description > < h2 id=" 2017-12-01" > 2017-12-01< /h2>
< ul>
< li> Uptime Robot noticed that CGSpace went down< /li>
< li> The logs say & ldquo;Timeout waiting for idle object& rdquo;< /li>
< li> PostgreSQL activity says there are 115 connections currently< /li>
< li> The list of connections to XMLUI and REST API for today:< /li>
< /ul> </description>
</item>
<item >
<title > November, 2017</title>
<link > https://alanorth.github.io/cgspace-notes/2017-11/</link>
<pubDate > Thu, 02 Nov 2017 09:37:54 +0200</pubDate>
<guid > https://alanorth.github.io/cgspace-notes/2017-11/</guid>
<description > < h2 id=" 2017-11-01" > 2017-11-01< /h2>
< ul>
< li> The CORE developers responded to say they are looking into their bot not respecting our robots.txt< /li>
< /ul>
< h2 id=" 2017-11-02" > 2017-11-02< /h2>
< ul>
< li> Today there have been no hits by CORE and no alerts from Linode (coincidence?)< /li>
< /ul>
< pre> < code> # grep -c & quot;CORE& quot; /var/log/nginx/access.log
0
< /code> < /pre> < ul>
< li> Generate list of authors on CGSpace for Peter to go through and correct:< /li>
< /ul>
< pre> < code> dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = ' contributor' and qualifier = ' author' ) AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv;
COPY 54701
< /code> < /pre> </description>
</item>
<item >
<title > October, 2017</title>
<link > https://alanorth.github.io/cgspace-notes/2017-10/</link>
<pubDate > Sun, 01 Oct 2017 08:07:54 +0300</pubDate>
<guid > https://alanorth.github.io/cgspace-notes/2017-10/</guid>
<description > < h2 id=" 2017-10-01" > 2017-10-01< /h2>
< ul>
< li> Peter emailed to point out that many items in the < a href=" https://cgspace.cgiar.org/handle/10568/2703" > ILRI archive collection< /a> have multiple handles:< /li>
< /ul>
< pre> < code> http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
< /code> < /pre> < ul>
< li> There appears to be a pattern but I& rsquo;ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine< /li>
< li> Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections< /li>
< /ul> </description>
</item>
<item >
<title > CGIAR Library Migration</title>
<link > https://alanorth.github.io/cgspace-notes/cgiar-library-migration/</link>
<pubDate > Mon, 18 Sep 2017 16:38:35 +0300</pubDate>
<guid > https://alanorth.github.io/cgspace-notes/cgiar-library-migration/</guid>
<description > < p> Rough notes for importing the CGIAR Library content. It was decided that this content would go to a new top-level community called < em> CGIAR System Organization< /em> .< /p> </description>
</item>
</channel>
</rss>