diff --git a/content/posts/2023-12.md b/content/posts/2023-12.md
index 8e055855d..1c0c0d55f 100644
--- a/content/posts/2023-12.md
+++ b/content/posts/2023-12.md
@@ -73,4 +73,40 @@ $ chrt -b 0 ./run.sh -s http://localhost:8081/solr/statistics -a export -o /tmp/
- Export CGSpace to check for missing Initiative collection mappings
- Start a harvest on AReS
+## 2023-12-04
+
+- Send a message to Altmetric support because the item IWMI highlighted last month still doesn't show the attention score for the Handle, even though I tweeted it several times over the past few weeks
+- Spent some time writing a Python script to fix the literal MaxMind City JSON objects in our Solr statistics
+  - There are about 1.6 million of these, so I exported them using solr-import-export-json with the query `city:com*`, but ended up finding many that are missing bundles, container bitstreams, etc.:
+
+```
+city:com* AND -bundleName:[* TO *] AND -containerBitstream:[* TO *] AND -file_id:[* TO *] AND -owningItem:[* TO *] AND -version_id:[* TO *]
+```
+
+- (Note the negation to find fields that are missing)
+- I don't know what I want to do with these yet
+
+## 2023-12-05
+
+- I finished the `fix_maxmind_stats.py` script, fixed the 1.6 million records, and imported them on CGSpace after testing on DSpace 7 Test
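- The core of the fix is just extracting the English city name from the literal MaxMind City object; a minimal sketch (the regex and the sample value are illustrative of the `city:com*` records, not the exact format of every one):

```python
import json
import re

# Literal values look something like:
#   com.maxmind.geoip2.record.City [ {...json...} ]
# so capture the JSON payload between the brackets.
CITY_RE = re.compile(r"^com\.maxmind\.geoip2\.record\.City \[ (.*) \]$")


def fix_city(value: str) -> str:
    match = CITY_RE.match(value)
    if not match:
        return value  # already a plain city name, leave it alone
    city = json.loads(match.group(1))
    return city.get("names", {}).get("en", "")


record = {"city": 'com.maxmind.geoip2.record.City [ {"geoname_id":184745,"names":{"en":"Nairobi"}} ]'}
record["city"] = fix_city(record["city"])
```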
+- Altmetric said there was a glitch regarding the Handle and DOI linking and they successfully re-scraped the item page and linked them
+  - They sent me a list of current production IPs and I noticed that some of them are in our nginx bot network list:
+
+```console
+$ for network in $(csvcut -c network /tmp/ips.csv | sed 1d | sort -u); do grepcidr $network ~/src/git/rmg-ansible-public/roles/dspace/files/nginx/bot-networks.conf; done
+108.128.0.0/13 'bot';
+46.137.0.0/16 'bot';
+52.208.0.0/13 'bot';
+52.48.0.0/13 'bot';
+54.194.0.0/15 'bot';
+54.216.0.0/14 'bot';
+54.220.0.0/15 'bot';
+54.228.0.0/15 'bot';
+63.32.242.35/32 'bot';
+63.32.0.0/14 'bot';
+99.80.0.0/15 'bot'
+```
+
+- I will remove those networks from the bot list for now so that Altmetric doesn't run into unexpected issues while harvesting
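- Python's `ipaddress` module can do the same overlap check without grepcidr; a quick sketch with illustrative network lists (not the real ones):

```python
from ipaddress import ip_network

# Flag any Altmetric network that falls entirely inside one of our
# nginx bot networks. Both lists here are made up for illustration.
bot_networks = [ip_network(n) for n in ["52.48.0.0/13", "99.80.0.0/15"]]
altmetric_networks = [ip_network(n) for n in ["52.51.0.0/16", "203.0.113.0/24"]]

matches = [
    (net, bot)
    for net in altmetric_networks
    for bot in bot_networks
    if net.subnet_of(bot)
]

for net, bot in matches:
    print(f"{net} is inside bot network {bot}")
```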
+
diff --git a/docs/2015-11/index.html b/docs/2015-11/index.html
index 96c7035da..d79b05e89 100644
--- a/docs/2015-11/index.html
+++ b/docs/2015-11/index.html
@@ -34,7 +34,7 @@ Last week I had increased the limit from 30 to 60, which seemed to help, but now
$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
78
"/>
-
+
diff --git a/docs/2015-12/index.html b/docs/2015-12/index.html
index 24e410259..25fa15100 100644
--- a/docs/2015-12/index.html
+++ b/docs/2015-12/index.html
@@ -36,7 +36,7 @@ Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
"/>
-
+
diff --git a/docs/2016-01/index.html b/docs/2016-01/index.html
index 172e78884..5b193f571 100644
--- a/docs/2016-01/index.html
+++ b/docs/2016-01/index.html
@@ -28,7 +28,7 @@ Move ILRI collection 10568/12503 from 10568/27869 to 10568/27629 using the move_
I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.
Update GitHub wiki for documentation of maintenance tasks.
"/>
-
+
diff --git a/docs/2016-02/index.html b/docs/2016-02/index.html
index dd494a8e8..77586752e 100644
--- a/docs/2016-02/index.html
+++ b/docs/2016-02/index.html
@@ -38,7 +38,7 @@ I noticed we have a very interesting list of countries on CGSpace:
Not only are there 49,000 countries, we have some blanks (25)…
Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE”
"/>
-
+
diff --git a/docs/2016-03/index.html b/docs/2016-03/index.html
index 39cde541e..a426bcd2d 100644
--- a/docs/2016-03/index.html
+++ b/docs/2016-03/index.html
@@ -28,7 +28,7 @@ Looking at issues with author authorities on CGSpace
For some reason we still have the index-lucene-update cron job active on CGSpace, but I’m pretty sure we don’t need it as of the latest few versions of Atmire’s Listings and Reports module
Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server
"/>
-
+
diff --git a/docs/2016-04/index.html b/docs/2016-04/index.html
index 5c78c3271..9ac5b829c 100644
--- a/docs/2016-04/index.html
+++ b/docs/2016-04/index.html
@@ -32,7 +32,7 @@ After running DSpace for over five years I’ve never needed to look in any
This will save us a few gigs of backup space we’re paying for on S3
Also, I noticed the checker log has some errors we should pay attention to:
"/>
-
+
diff --git a/docs/2016-05/index.html b/docs/2016-05/index.html
index 8f423d25c..297fc8c91 100644
--- a/docs/2016-05/index.html
+++ b/docs/2016-05/index.html
@@ -34,7 +34,7 @@ There are 3,000 IPs accessing the REST API in a 24-hour period!
# awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l
3168
"/>
-
+
diff --git a/docs/2016-06/index.html b/docs/2016-06/index.html
index 93a1b1ad7..f14b3fc8c 100644
--- a/docs/2016-06/index.html
+++ b/docs/2016-06/index.html
@@ -34,7 +34,7 @@ This is their publications set: http://ebrary.ifpri.org/oai/oai.php?verb=ListRec
You can see the others by using the OAI ListSets verb: http://ebrary.ifpri.org/oai/oai.php?verb=ListSets
Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in dc.identifier.fund to cg.identifier.cpwfproject and then the rest to dc.description.sponsorship
"/>
-
+
diff --git a/docs/2016-07/index.html b/docs/2016-07/index.html
index b3d5afda2..b813e2b90 100644
--- a/docs/2016-07/index.html
+++ b/docs/2016-07/index.html
@@ -44,7 +44,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
In this case the select query was showing 95 results before the update
"/>
-
+
diff --git a/docs/2016-08/index.html b/docs/2016-08/index.html
index 9c96ade76..2bbe8274c 100644
--- a/docs/2016-08/index.html
+++ b/docs/2016-08/index.html
@@ -42,7 +42,7 @@ $ git checkout -b 55new 5_x-prod
$ git reset --hard ilri/5_x-prod
$ git rebase -i dspace-5.5
"/>
-
+
diff --git a/docs/2016-09/index.html b/docs/2016-09/index.html
index d77811116..f6e6cb547 100644
--- a/docs/2016-09/index.html
+++ b/docs/2016-09/index.html
@@ -34,7 +34,7 @@ It looks like we might be able to use OUs now, instead of DCs:
$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=org" -D "admigration1@cgiarad.org" -W "(sAMAccountName=admigration1)"
"/>
-
+
diff --git a/docs/2016-10/index.html b/docs/2016-10/index.html
index 7317a8e1b..5b247b717 100644
--- a/docs/2016-10/index.html
+++ b/docs/2016-10/index.html
@@ -42,7 +42,7 @@ I exported a random item’s metadata as CSV, deleted all columns except id
0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
"/>
-
+
diff --git a/docs/2016-11/index.html b/docs/2016-11/index.html
index 72a81c286..9848e88d9 100644
--- a/docs/2016-11/index.html
+++ b/docs/2016-11/index.html
@@ -26,7 +26,7 @@ Add dc.type to the output options for Atmire’s Listings and Reports module
Add dc.type to the output options for Atmire’s Listings and Reports module (#286)
"/>
-
+
diff --git a/docs/2016-12/index.html b/docs/2016-12/index.html
index 95a6e0400..7e3335ebe 100644
--- a/docs/2016-12/index.html
+++ b/docs/2016-12/index.html
@@ -46,7 +46,7 @@ I see thousands of them in the logs for the last few months, so it’s not r
I’ve raised a ticket with Atmire to ask
Another worrying error from dspace.log is:
"/>
-
+
diff --git a/docs/2017-01/index.html b/docs/2017-01/index.html
index e6bfee640..22df62961 100644
--- a/docs/2017-01/index.html
+++ b/docs/2017-01/index.html
@@ -28,7 +28,7 @@ I checked to see if the Solr sharding task that is supposed to run on January 1s
I tested on DSpace Test as well and it doesn’t work there either
I asked on the dspace-tech mailing list because it seems to be broken, and actually now I’m not sure if we’ve ever had the sharding task run successfully over all these years
"/>
-
+
diff --git a/docs/2017-02/index.html b/docs/2017-02/index.html
index 2824b3cab..5ae0ec69e 100644
--- a/docs/2017-02/index.html
+++ b/docs/2017-02/index.html
@@ -50,7 +50,7 @@ DELETE 1
Create issue on GitHub to track the addition of CCAFS Phase II project tags (#301)
Looks like we’ll be using cg.identifier.ccafsprojectpii as the field name
"/>
-
+
diff --git a/docs/2017-03/index.html b/docs/2017-03/index.html
index fdd1f738e..8e9e2112e 100644
--- a/docs/2017-03/index.html
+++ b/docs/2017-03/index.html
@@ -54,7 +54,7 @@ Interestingly, it seems DSpace 4.x’s thumbnails were sRGB, but forcing reg
$ identify ~/Desktop/alc_contrastes_desafios.jpg
/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
"/>
-
+
diff --git a/docs/2017-04/index.html b/docs/2017-04/index.html
index 7130c820c..5f94ce3a9 100644
--- a/docs/2017-04/index.html
+++ b/docs/2017-04/index.html
@@ -40,7 +40,7 @@ Testing the CMYK patch on a collection with 650 items:
$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Thumbnail" -v >& /tmp/filter-media-cmyk.txt
"/>
-
+
diff --git a/docs/2017-05/index.html b/docs/2017-05/index.html
index 9b44d4416..378cddedb 100644
--- a/docs/2017-05/index.html
+++ b/docs/2017-05/index.html
@@ -18,7 +18,7 @@
-
+
diff --git a/docs/2017-06/index.html b/docs/2017-06/index.html
index 44d8a8db4..9468cee7a 100644
--- a/docs/2017-06/index.html
+++ b/docs/2017-06/index.html
@@ -18,7 +18,7 @@
-
+
diff --git a/docs/2017-07/index.html b/docs/2017-07/index.html
index 85b63d1ce..c8a4433f2 100644
--- a/docs/2017-07/index.html
+++ b/docs/2017-07/index.html
@@ -36,7 +36,7 @@ Merge changes for WLE Phase II theme rename (#329)
Looking at extracting the metadata registries from ICARDA’s MEL DSpace database so we can compare fields with CGSpace
We can use PostgreSQL’s extended output format (-x) plus sed to format the output into quasi XML:
"/>
-
+
diff --git a/docs/2017-08/index.html b/docs/2017-08/index.html
index 2309f7c75..bdf4182cc 100644
--- a/docs/2017-08/index.html
+++ b/docs/2017-08/index.html
@@ -60,7 +60,7 @@ This was due to newline characters in the dc.description.abstract column, which
I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using g/^$/d
Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet
"/>
-
+
diff --git a/docs/2017-09/index.html b/docs/2017-09/index.html
index cb766ec54..7f74849bb 100644
--- a/docs/2017-09/index.html
+++ b/docs/2017-09/index.html
@@ -32,7 +32,7 @@ Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two
Ask Sisay to clean up the WLE approvers a bit, as Marianne’s user account is both in the approvers step as well as the group
"/>
-
+
diff --git a/docs/2017-10/index.html b/docs/2017-10/index.html
index 6f957a1ed..97ed7085a 100644
--- a/docs/2017-10/index.html
+++ b/docs/2017-10/index.html
@@ -34,7 +34,7 @@ http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
There appears to be a pattern but I’ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine
Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections
"/>
-
+
diff --git a/docs/2017-11/index.html b/docs/2017-11/index.html
index 2a0f1e3e6..bf72d7e64 100644
--- a/docs/2017-11/index.html
+++ b/docs/2017-11/index.html
@@ -48,7 +48,7 @@ Generate list of authors on CGSpace for Peter to go through and correct:
dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv;
COPY 54701
"/>
-
+
diff --git a/docs/2017-12/index.html b/docs/2017-12/index.html
index 5319388d0..9d4b98ab4 100644
--- a/docs/2017-12/index.html
+++ b/docs/2017-12/index.html
@@ -30,7 +30,7 @@ The logs say “Timeout waiting for idle object”
PostgreSQL activity says there are 115 connections currently
The list of connections to XMLUI and REST API for today:
"/>
-
+
diff --git a/docs/2018-01/index.html b/docs/2018-01/index.html
index 88377078b..4b7d00856 100644
--- a/docs/2018-01/index.html
+++ b/docs/2018-01/index.html
@@ -150,7 +150,7 @@ dspace.log.2018-01-02:34
Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let’s Encrypt if it’s just a handful of domains
"/>
-
+
diff --git a/docs/2018-02/index.html b/docs/2018-02/index.html
index e4e8c3e7c..92ce6aa41 100644
--- a/docs/2018-02/index.html
+++ b/docs/2018-02/index.html
@@ -30,7 +30,7 @@ We don’t need to distinguish between internal and external works, so that
Yesterday I figured out how to monitor DSpace sessions using JMX
I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu’s munin-plugins-java package and used the stuff I discovered about JMX in 2018-01
"/>
-
+
diff --git a/docs/2018-03/index.html b/docs/2018-03/index.html
index ce65a2d4f..b067cb396 100644
--- a/docs/2018-03/index.html
+++ b/docs/2018-03/index.html
@@ -24,7 +24,7 @@ Export a CSV of the IITA community metadata for Martin Mueller
Export a CSV of the IITA community metadata for Martin Mueller
"/>
-
+
diff --git a/docs/2018-04/index.html b/docs/2018-04/index.html
index 96ac6827f..2f986ad2c 100644
--- a/docs/2018-04/index.html
+++ b/docs/2018-04/index.html
@@ -26,7 +26,7 @@ Catalina logs at least show some memory errors yesterday:
I tried to test something on DSpace Test but noticed that it’s down since god knows when
Catalina logs at least show some memory errors yesterday:
"/>
-
+
diff --git a/docs/2018-05/index.html b/docs/2018-05/index.html
index bcf16970f..0f10fa51a 100644
--- a/docs/2018-05/index.html
+++ b/docs/2018-05/index.html
@@ -38,7 +38,7 @@ http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E
Then I reduced the JVM heap size from 6144 back to 5120m
Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the Ansible infrastructure scripts to support hosts choosing which distribution they want to use
"/>
-
+
diff --git a/docs/2018-06/index.html b/docs/2018-06/index.html
index c510cd5d3..a8d8f687c 100644
--- a/docs/2018-06/index.html
+++ b/docs/2018-06/index.html
@@ -58,7 +58,7 @@ real 74m42.646s
user 8m5.056s
sys 2m7.289s
"/>
-
+
diff --git a/docs/2018-07/index.html b/docs/2018-07/index.html
index d2f650cf6..2d31c694e 100644
--- a/docs/2018-07/index.html
+++ b/docs/2018-07/index.html
@@ -36,7 +36,7 @@ During the mvn package stage on the 5.8 branch I kept getting issues with java r
There is insufficient memory for the Java Runtime Environment to continue.
"/>
-
+
diff --git a/docs/2018-08/index.html b/docs/2018-08/index.html
index a5c13aae2..9b53027af 100644
--- a/docs/2018-08/index.html
+++ b/docs/2018-08/index.html
@@ -46,7 +46,7 @@ Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did
The server only has 8GB of RAM so we’ll eventually need to upgrade to a larger one because we’ll start starving the OS, PostgreSQL, and command line batch processes
I ran all system updates on DSpace Test and rebooted it
"/>
-
+
diff --git a/docs/2018-09/index.html b/docs/2018-09/index.html
index 8ac8419d0..c26bf927f 100644
--- a/docs/2018-09/index.html
+++ b/docs/2018-09/index.html
@@ -30,7 +30,7 @@ I’ll update the DSpace role in our Ansible infrastructure playbooks and ru
Also, I’ll re-run the postgresql tasks because the custom PostgreSQL variables are dynamic according to the system’s RAM, and we never re-ran them after migrating to larger Linodes last month
I’m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I’m getting those autowire errors in Tomcat 8.5.30 again:
"/>
-
+
diff --git a/docs/2018-10/index.html b/docs/2018-10/index.html
index adc18dae0..3736e50b8 100644
--- a/docs/2018-10/index.html
+++ b/docs/2018-10/index.html
@@ -26,7 +26,7 @@ I created a GitHub issue to track this #389, because I’m super busy in Nai
Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items
I created a GitHub issue to track this #389, because I’m super busy in Nairobi right now
"/>
-
+
diff --git a/docs/2018-11/index.html b/docs/2018-11/index.html
index 8b1e0ada4..5a13dd47b 100644
--- a/docs/2018-11/index.html
+++ b/docs/2018-11/index.html
@@ -36,7 +36,7 @@ Send a note about my dspace-statistics-api to the dspace-tech mailing list
Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage
Today these are the top 10 IPs:
"/>
-
+
diff --git a/docs/2018-12/index.html b/docs/2018-12/index.html
index 424eeb5cd..54e309207 100644
--- a/docs/2018-12/index.html
+++ b/docs/2018-12/index.html
@@ -36,7 +36,7 @@ Then I ran all system updates and restarted the server
I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another Ghostscript vulnerability last week
"/>
-
+
diff --git a/docs/2019-01/index.html b/docs/2019-01/index.html
index 10f3d728b..5aff809ec 100644
--- a/docs/2019-01/index.html
+++ b/docs/2019-01/index.html
@@ -50,7 +50,7 @@ I don’t see anything interesting in the web server logs around that time t
357 207.46.13.1
903 54.70.40.11
"/>
-
+
diff --git a/docs/2019-02/index.html b/docs/2019-02/index.html
index 33ba9932c..72929c2e8 100644
--- a/docs/2019-02/index.html
+++ b/docs/2019-02/index.html
@@ -72,7 +72,7 @@ real 0m19.873s
user 0m22.203s
sys 0m1.979s
"/>
-
+
diff --git a/docs/2019-03/index.html b/docs/2019-03/index.html
index c78918f93..e0f75d3ca 100644
--- a/docs/2019-03/index.html
+++ b/docs/2019-03/index.html
@@ -46,7 +46,7 @@ Most worryingly, there are encoding errors in the abstracts for eleven items, fo
I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs
"/>
-
+
diff --git a/docs/2019-04/index.html b/docs/2019-04/index.html
index a26b4ec6c..7beecbc2c 100644
--- a/docs/2019-04/index.html
+++ b/docs/2019-04/index.html
@@ -64,7 +64,7 @@ $ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u ds
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
"/>
-
+
diff --git a/docs/2019-05/index.html b/docs/2019-05/index.html
index 0d6dd4895..6dce191b8 100644
--- a/docs/2019-05/index.html
+++ b/docs/2019-05/index.html
@@ -48,7 +48,7 @@ DELETE 1
But after this I tried to delete the item from the XMLUI and it is still present…
"/>
-
+
diff --git a/docs/2019-06/index.html b/docs/2019-06/index.html
index 2e2267f5b..22c3764f7 100644
--- a/docs/2019-06/index.html
+++ b/docs/2019-06/index.html
@@ -34,7 +34,7 @@ Run system updates on CGSpace (linode18) and reboot it
Skype with Marie-Angélique and Abenet about CG Core v2
"/>
-
+
diff --git a/docs/2019-07/index.html b/docs/2019-07/index.html
index 983ee68b0..2e34aa2ac 100644
--- a/docs/2019-07/index.html
+++ b/docs/2019-07/index.html
@@ -38,7 +38,7 @@ CGSpace
Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community
"/>
-
+
diff --git a/docs/2019-08/index.html b/docs/2019-08/index.html
index 3832df8b0..bcbad1e97 100644
--- a/docs/2019-08/index.html
+++ b/docs/2019-08/index.html
@@ -46,7 +46,7 @@ After rebooting, all statistics cores were loaded… wow, that’s luck
Run system updates on DSpace Test (linode19) and reboot it
"/>
-
+
diff --git a/docs/2019-09/index.html b/docs/2019-09/index.html
index 2b4e76394..e331641b6 100644
--- a/docs/2019-09/index.html
+++ b/docs/2019-09/index.html
@@ -72,7 +72,7 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
7249 2a01:7e00::f03c:91ff:fe18:7396
9124 45.5.186.2
"/>
-
+
diff --git a/docs/2019-10/index.html b/docs/2019-10/index.html
index 4eadc4c2a..7afcc13d3 100644
--- a/docs/2019-10/index.html
+++ b/docs/2019-10/index.html
@@ -18,7 +18,7 @@
-
+
diff --git a/docs/2019-11/index.html b/docs/2019-11/index.html
index e0d50ca50..ee6f59595 100644
--- a/docs/2019-11/index.html
+++ b/docs/2019-11/index.html
@@ -58,7 +58,7 @@ Let’s see how many of the REST API requests were for bitstreams (because t
# zcat --force /var/log/nginx/rest.log.*.gz | grep -E "[0-9]{1,2}/Oct/2019" | grep -c -E "/rest/bitstreams"
106781
"/>
-
+
diff --git a/docs/2019-12/index.html b/docs/2019-12/index.html
index f7028f874..da4a4d461 100644
--- a/docs/2019-12/index.html
+++ b/docs/2019-12/index.html
@@ -46,7 +46,7 @@ Make sure all packages are up to date and the package manager is up to date, the
# dpkg -C
# reboot
"/>
-
+
diff --git a/docs/2020-01/index.html b/docs/2020-01/index.html
index 26f37f7e0..23dfac3aa 100644
--- a/docs/2020-01/index.html
+++ b/docs/2020-01/index.html
@@ -56,7 +56,7 @@ I tweeted the CGSpace repository link
"/>
-
+
diff --git a/docs/2020-02/index.html b/docs/2020-02/index.html
index 207e7bf23..f7de27a91 100644
--- a/docs/2020-02/index.html
+++ b/docs/2020-02/index.html
@@ -38,7 +38,7 @@ The code finally builds and runs with a fresh install
"/>
-
+
diff --git a/docs/2020-03/index.html b/docs/2020-03/index.html
index 3c7a7c211..9f18af22b 100644
--- a/docs/2020-03/index.html
+++ b/docs/2020-03/index.html
@@ -42,7 +42,7 @@ You need to download this into the DSpace 6.x source and compile it
"/>
-
+
diff --git a/docs/2020-04/index.html b/docs/2020-04/index.html
index c7c306a87..6391c97cf 100644
--- a/docs/2020-04/index.html
+++ b/docs/2020-04/index.html
@@ -48,7 +48,7 @@ The third item now has a donut with score 1 since I tweeted it last week
On the same note, the one item Abenet pointed out last week now has a donut with score of 104 after I tweeted it last week
"/>
-
+
diff --git a/docs/2020-05/index.html b/docs/2020-05/index.html
index 1ba6c494a..6167beb5c 100644
--- a/docs/2020-05/index.html
+++ b/docs/2020-05/index.html
@@ -34,7 +34,7 @@ I see that CGSpace (linode18) is still using PostgreSQL JDBC driver version 42.2
"/>
-
+
diff --git a/docs/2020-06/index.html b/docs/2020-06/index.html
index 0d5803ddc..7a883b6a2 100644
--- a/docs/2020-06/index.html
+++ b/docs/2020-06/index.html
@@ -36,7 +36,7 @@ I sent Atmire the dspace.log from today and told them to log into the server to
In other news, I checked the statistics API on DSpace 6 and it’s working
I tried to build the OAI registry on the freshly migrated DSpace 6 on DSpace Test and I get an error:
"/>
-
+
diff --git a/docs/2020-07/index.html b/docs/2020-07/index.html
index 83c589736..85c773f6e 100644
--- a/docs/2020-07/index.html
+++ b/docs/2020-07/index.html
@@ -38,7 +38,7 @@ I restarted Tomcat and PostgreSQL and the issue was gone
Since I was restarting Tomcat anyways I decided to redeploy the latest changes from the 5_x-prod branch and I added a note about COVID-19 items to the CGSpace frontpage at Peter’s request
"/>
-
+
diff --git a/docs/2020-08/index.html b/docs/2020-08/index.html
index 3545cc892..e587951e1 100644
--- a/docs/2020-08/index.html
+++ b/docs/2020-08/index.html
@@ -36,7 +36,7 @@ It is class based so I can easily add support for other vocabularies, and the te
"/>
-
+
diff --git a/docs/2020-09/index.html b/docs/2020-09/index.html
index 987d91b8d..69b7e4771 100644
--- a/docs/2020-09/index.html
+++ b/docs/2020-09/index.html
@@ -48,7 +48,7 @@ I filed a bug on OpenRXV: https://github.com/ilri/OpenRXV/issues/39
I filed an issue on OpenRXV to make some minor edits to the admin UI: https://github.com/ilri/OpenRXV/issues/40
"/>
-
+
diff --git a/docs/2020-10/index.html b/docs/2020-10/index.html
index 5c293298e..825db89b0 100644
--- a/docs/2020-10/index.html
+++ b/docs/2020-10/index.html
@@ -44,7 +44,7 @@ During the FlywayDB migration I got an error:
"/>
-
+
diff --git a/docs/2020-11/index.html b/docs/2020-11/index.html
index 4e2394455..eafe9a92f 100644
--- a/docs/2020-11/index.html
+++ b/docs/2020-11/index.html
@@ -32,7 +32,7 @@ So far we’ve spent at least fifty hours to process the statistics and stat
"/>
-
+
diff --git a/docs/2020-12/index.html b/docs/2020-12/index.html
index c5812f389..4208c6317 100644
--- a/docs/2020-12/index.html
+++ b/docs/2020-12/index.html
@@ -36,7 +36,7 @@ I started processing those (about 411,000 records):
"/>
-
+
diff --git a/docs/2021-01/index.html b/docs/2021-01/index.html
index 099da728c..1fcba022b 100644
--- a/docs/2021-01/index.html
+++ b/docs/2021-01/index.html
@@ -50,7 +50,7 @@ For example, this item has 51 views on CGSpace, but 0 on AReS
"/>
-
+
diff --git a/docs/2021-02/index.html b/docs/2021-02/index.html
index 37f27d9ef..367643792 100644
--- a/docs/2021-02/index.html
+++ b/docs/2021-02/index.html
@@ -60,7 +60,7 @@ $ curl -s 'http://localhost:9200/openrxv-items-temp/_count?q=*&pretty
}
}
"/>
-
+
diff --git a/docs/2021-03/index.html b/docs/2021-03/index.html
index 9297feb41..257ffde19 100644
--- a/docs/2021-03/index.html
+++ b/docs/2021-03/index.html
@@ -34,7 +34,7 @@ Also, we found some issues building and running OpenRXV currently due to ecosyst
"/>
-
+
diff --git a/docs/2021-04/index.html b/docs/2021-04/index.html
index 68c2aa191..e685fad26 100644
--- a/docs/2021-04/index.html
+++ b/docs/2021-04/index.html
@@ -44,7 +44,7 @@ Perhaps one of the containers crashed, I should have looked closer but I was in
"/>
-
+
diff --git a/docs/2021-05/index.html b/docs/2021-05/index.html
index ed9a8ab04..3f0487b1a 100644
--- a/docs/2021-05/index.html
+++ b/docs/2021-05/index.html
@@ -36,7 +36,7 @@ I looked at the top user agents and IPs in the Solr statistics for last month an
I will add the RI/1.0 pattern to our DSpace agents overload and purge them from Solr (we had previously seen this agent with 9,000 hits or so in 2020-09), but I think I will leave the Microsoft Word one… as that’s an actual user…
"/>
-
+
diff --git a/docs/2021-06/index.html b/docs/2021-06/index.html
index 58f1484ea..a7487df46 100644
--- a/docs/2021-06/index.html
+++ b/docs/2021-06/index.html
@@ -36,7 +36,7 @@ I simply started it and AReS was running again:
"/>
-
+
diff --git a/docs/2021-07/index.html b/docs/2021-07/index.html
index f16924b92..c9450c313 100644
--- a/docs/2021-07/index.html
+++ b/docs/2021-07/index.html
@@ -30,7 +30,7 @@ Export another list of ALL subjects on CGSpace, including AGROVOC and non-AGROVO
localhost/dspace63= > \COPY (SELECT DISTINCT LOWER(text_value) AS subject, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (119, 120, 127, 122, 128, 125, 135, 203, 208, 210, 215, 123, 236, 242, 187) GROUP BY subject ORDER BY count DESC) to /tmp/2021-07-01-all-subjects.csv WITH CSV HEADER;
COPY 20994
"/>
-
+
diff --git a/docs/2021-08/index.html b/docs/2021-08/index.html
index 2b061589e..1eba76028 100644
--- a/docs/2021-08/index.html
+++ b/docs/2021-08/index.html
@@ -32,7 +32,7 @@ Update Docker images on AReS server (linode20) and reboot the server:
I decided to upgrade linode20 from Ubuntu 18.04 to 20.04
"/>
-
+
diff --git a/docs/2021-09/index.html b/docs/2021-09/index.html
index fc38aa8d1..145d18d16 100644
--- a/docs/2021-09/index.html
+++ b/docs/2021-09/index.html
@@ -48,7 +48,7 @@ The syntax Moayad showed me last month doesn’t seem to honor the search qu
"/>
-
+
diff --git a/docs/2021-10/index.html b/docs/2021-10/index.html
index 990a4fd70..bd657ec44 100644
--- a/docs/2021-10/index.html
+++ b/docs/2021-10/index.html
@@ -46,7 +46,7 @@ $ wc -l /tmp/2021-10-01-affiliations.txt
So we have 1879/7100 (26.46%) matching already
"/>
-
+
diff --git a/docs/2021-11/index.html b/docs/2021-11/index.html
index 4a4e72a65..311e8c3a6 100644
--- a/docs/2021-11/index.html
+++ b/docs/2021-11/index.html
@@ -32,7 +32,7 @@ First I exported all the 2019 stats from CGSpace:
$ ./run.sh -s http://localhost:8081/solr/statistics -f 'time:2019-*' -a export -o statistics-2019.json -k uid
$ zstd statistics-2019.json
"/>
-
+
diff --git a/docs/2021-12/index.html b/docs/2021-12/index.html
index 8d7feecdf..509061023 100644
--- a/docs/2021-12/index.html
+++ b/docs/2021-12/index.html
@@ -40,7 +40,7 @@ Purging 455 hits from WhatsApp in statistics
Total number of bot hits purged: 3679
"/>
-
+
diff --git a/docs/2022-01/index.html b/docs/2022-01/index.html
index 9e64f0146..fa529fdff 100644
--- a/docs/2022-01/index.html
+++ b/docs/2022-01/index.html
@@ -24,7 +24,7 @@ Start a full harvest on AReS
Start a full harvest on AReS
"/>
-
+
diff --git a/docs/2022-02/index.html b/docs/2022-02/index.html
index 8c9a242c9..2fe3f9a5e 100644
--- a/docs/2022-02/index.html
+++ b/docs/2022-02/index.html
@@ -38,7 +38,7 @@ We agreed to try to do more alignment of affiliations/funders with ROR
"/>
-
+
diff --git a/docs/2022-03/index.html b/docs/2022-03/index.html
index 561c5c2cb..6ddd23d7f 100644
--- a/docs/2022-03/index.html
+++ b/docs/2022-03/index.html
@@ -34,7 +34,7 @@ $ ./ilri/check-duplicates.py -i /tmp/tac4.csv -db dspace -u dspace -p 'fuuu&
$ csvcut -c id,filename ~/Downloads/2022-03-01-CGSpace-TAC-ICW-batch4-701-980.csv > /tmp/tac4-filenames.csv
$ csvjoin -c id /tmp/2022-03-01-tac-batch4-701-980.csv /tmp/tac4-filenames.csv > /tmp/2022-03-01-tac-batch4-701-980-filenames.csv
"/>
-
+
diff --git a/docs/2022-04/index.html b/docs/2022-04/index.html
index 9d54bb4d9..d27894387 100644
--- a/docs/2022-04/index.html
+++ b/docs/2022-04/index.html
@@ -18,7 +18,7 @@
-
+
diff --git a/docs/2022-05/index.html b/docs/2022-05/index.html
index d45467ebc..a9f2d4290 100644
--- a/docs/2022-05/index.html
+++ b/docs/2022-05/index.html
@@ -66,7 +66,7 @@ If I query Solr for time:2022-04* AND dns:*msnbot* AND dns:*.msn.com. I see a ha
I purged 93,974 hits from these IPs using my check-spider-ip-hits.sh script
"/>
-
+
diff --git a/docs/2022-06/index.html b/docs/2022-06/index.html
index 07ce813f1..e64cd12c9 100644
--- a/docs/2022-06/index.html
+++ b/docs/2022-06/index.html
@@ -48,7 +48,7 @@ There seem to be many more of these:
"/>
-
+
diff --git a/docs/2022-07/index.html b/docs/2022-07/index.html
index ba8f9591f..647fc8c61 100644
--- a/docs/2022-07/index.html
+++ b/docs/2022-07/index.html
@@ -34,7 +34,7 @@ Also, the trgm functions I’ve used before are case insensitive, but Levens
"/>
-
+
diff --git a/docs/2022-08/index.html b/docs/2022-08/index.html
index 2337c4d01..945355fb4 100644
--- a/docs/2022-08/index.html
+++ b/docs/2022-08/index.html
@@ -24,7 +24,7 @@ Our request to add CC-BY-3.0-IGO to SPDX was approved a few weeks ago
Our request to add CC-BY-3.0-IGO to SPDX was approved a few weeks ago
"/>
-
+
diff --git a/docs/2022-09/index.html b/docs/2022-09/index.html
index 6b7400750..130d1c536 100644
--- a/docs/2022-09/index.html
+++ b/docs/2022-09/index.html
@@ -46,7 +46,7 @@ I also fixed a few bugs and improved the region-matching logic
"/>
-
+
diff --git a/docs/2022-10/index.html b/docs/2022-10/index.html
index 240ed5748..56c7847e8 100644
--- a/docs/2022-10/index.html
+++ b/docs/2022-10/index.html
@@ -36,7 +36,7 @@ I filed an issue to ask about Java 11+ support
"/>
-
+
diff --git a/docs/2022-11/index.html b/docs/2022-11/index.html
index d9cfcc0f8..31c871daf 100644
--- a/docs/2022-11/index.html
+++ b/docs/2022-11/index.html
@@ -44,7 +44,7 @@ I want to make sure they use groups instead of individuals where possible!
I reverted the Cocoon autosave change because it was more of a nuissance that Peter can’t upload CSVs from the web interface and is a very low severity security issue
"/>
-
+
diff --git a/docs/2022-12/index.html b/docs/2022-12/index.html
index 173ef8057..1e354a170 100644
--- a/docs/2022-12/index.html
+++ b/docs/2022-12/index.html
@@ -36,7 +36,7 @@ I exported the CCAFS and IITA communities, extracted just the country and region
Add a few more authors to my CSV with author names and ORCID identifiers and tag 283 items!
Replace “East Asia” with “Eastern Asia” region on CGSpace (UN M.49 region)
"/>
-
+
diff --git a/docs/2023-01/index.html b/docs/2023-01/index.html
index 613d9369d..534f80f0d 100644
--- a/docs/2023-01/index.html
+++ b/docs/2023-01/index.html
@@ -34,7 +34,7 @@ I see we have some new ones that aren’t in our list if I combine with this
"/>
-
+
diff --git a/docs/2023-02/index.html b/docs/2023-02/index.html
index bcb41240d..fe08d3b00 100644
--- a/docs/2023-02/index.html
+++ b/docs/2023-02/index.html
@@ -32,7 +32,7 @@ I want to try to expand my use of their data to journals, publishers, volumes, i
"/>
-
+
diff --git a/docs/2023-03/index.html b/docs/2023-03/index.html
index 373b0a261..aa5cc5777 100644
--- a/docs/2023-03/index.html
+++ b/docs/2023-03/index.html
@@ -28,7 +28,7 @@ Remove cg.subject.wle and cg.identifier.wletheme from CGSpace input form after c
iso-codes 4.13.0 was released, which incorporates my changes to the common names for Iran, Laos, and Syria
I finally got through with porting the input form from DSpace 6 to DSpace 7
"/>
-
+
diff --git a/docs/2023-04/index.html b/docs/2023-04/index.html
index 7cb603d87..e9c368aac 100644
--- a/docs/2023-04/index.html
+++ b/docs/2023-04/index.html
@@ -36,7 +36,7 @@ I also did a check for missing country/region mappings with csv-metadata-quality
Start a harvest on AReS
"/>
-
+
diff --git a/docs/2023-05/index.html b/docs/2023-05/index.html
index f05c69409..f4159c718 100644
--- a/docs/2023-05/index.html
+++ b/docs/2023-05/index.html
@@ -46,7 +46,7 @@ Also I found at least two spelling mistakes, for example “decison support
Work on cleaning, proofing, and uploading twenty-seven records for IFPRI to CGSpace
"/>
-
+
diff --git a/docs/2023-06/index.html b/docs/2023-06/index.html
index cf8edab04..10d754f75 100644
--- a/docs/2023-06/index.html
+++ b/docs/2023-06/index.html
@@ -44,7 +44,7 @@ From what I can see we need to upgrade the MODS schema from 3.1 to 3.7 and then
"/>
-
+
diff --git a/docs/2023-07/index.html b/docs/2023-07/index.html
index 99a83fde0..e61c32725 100644
--- a/docs/2023-07/index.html
+++ b/docs/2023-07/index.html
@@ -18,7 +18,7 @@
-
+
diff --git a/docs/2023-08/index.html b/docs/2023-08/index.html
index 790e50a03..780f3fe58 100644
--- a/docs/2023-08/index.html
+++ b/docs/2023-08/index.html
@@ -34,7 +34,7 @@ I did some minor cleanups myself and applied them to CGSpace
Start working on some batch uploads for IFPRI
"/>
-
+
diff --git a/docs/2023-09/index.html b/docs/2023-09/index.html
index 7bc348ca1..0b098e464 100644
--- a/docs/2023-09/index.html
+++ b/docs/2023-09/index.html
@@ -26,7 +26,7 @@ Start a harvest on AReS
Export CGSpace to check for missing Initiative collection mappings
Start a harvest on AReS
"/>
-
+
diff --git a/docs/2023-10/index.html b/docs/2023-10/index.html
index 024dae24b..5c5bd5195 100644
--- a/docs/2023-10/index.html
+++ b/docs/2023-10/index.html
@@ -36,7 +36,7 @@ We can be on the safe side by using only abstracts for items that are licensed u
"/>
-
+
diff --git a/docs/2023-11/index.html b/docs/2023-11/index.html
index 29aac8ce7..21dc01535 100644
--- a/docs/2023-11/index.html
+++ b/docs/2023-11/index.html
@@ -23,7 +23,7 @@ Start a harvest on AReS
-
+
@@ -42,7 +42,7 @@ I improved the filtering and wrote some Python using pandas to merge my sources
Export CGSpace to check missing Initiative collection mappings
Start a harvest on AReS
"/>
-
+
@@ -54,7 +54,7 @@ Start a harvest on AReS
"url": "https://alanorth.github.io/cgspace-notes/2023-11/",
"wordCount": "1318",
"datePublished": "2023-11-02T12:59:36+03:00",
- "dateModified": "2023-11-23T16:15:13+03:00",
+ "dateModified": "2023-12-02T10:38:09+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
diff --git a/docs/2023-12/index.html b/docs/2023-12/index.html
index 8c464c6fe..96f80cd68 100644
--- a/docs/2023-12/index.html
+++ b/docs/2023-12/index.html
@@ -11,14 +11,14 @@
-
+
-
+
@@ -28,9 +28,9 @@
"@type": "BlogPosting",
"headline": "December, 2023",
"url": "https://alanorth.github.io/cgspace-notes/2023-12/",
- "wordCount": "286",
+ "wordCount": "529",
"datePublished": "2023-12-01T08:48:36+03:00",
- "dateModified": "2023-12-01T08:48:36+03:00",
+ "dateModified": "2023-12-02T10:38:09+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -174,6 +174,44 @@
Export CGSpace to check for missing Initiative collection mappings
Start a harvest on AReS
+
2023-12-04
+
+
Send a message to Altmetric support because the item IWMI highlighted last month still doesn’t show the attention score for the Handle, even though I tweeted it several times weeks ago
+
Spent some time writing a Python script to fix the literal MaxMind City JSON objects in our Solr statistics
+
+
There are about 1.6 million of these, so I exported them using solr-import-export-json with the query city:com*, but ended up finding many that have missing bundles, container bitstreams, etc.:
+
+
+
+
city:com* AND -bundleName:[* TO *] AND -containerBitstream:[* TO *] AND -file_id:[* TO *] AND -owningItem:[* TO *] AND -version_id:[* TO *]
+
+
(Note the negation to find fields that are missing)
+
I don’t know what I want to do with these yet
+
+
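The core of the fix, implied by the `city:com*` query, is turning a literal MaxMind City object string back into a plain city name. A minimal Python sketch, assuming the field holds something like `com.maxmind.geoip2.record.City [ {...json...} ]` (the exact format and key names are assumptions for illustration; the real logic lives in `fix_maxmind_stats.py`):

```python
import json

def extract_city_name(value: str) -> str:
    """Extract a plain city name from a literal MaxMind City object
    string. The surrounding toString() format is an assumption."""
    if not value.startswith("com.maxmind.geoip2.record.City"):
        return value  # already a plain city name
    # Grab the JSON payload between the first '{' and the last '}'
    start = value.find("{")
    end = value.rfind("}")
    if start == -1 or end == -1:
        return value  # nothing parseable, leave the value as-is
    record = json.loads(value[start : end + 1])
    # MaxMind City records keep localized names under "names"
    return record.get("names", {}).get("en", value)
```

After rewriting the `city` field in the exported JSON this way, the documents can be re-imported into Solr.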
2023-12-05
+
+
I finished the fix_maxmind_stats.py script, fixed 1.6 million records, and imported them on CGSpace after testing on DSpace 7 Test
+
Altmetric said there was a glitch in the Handle and DOI linking; they successfully re-scraped the item page and linked them
+
+
They sent me a list of current production IPs and I notice that some of them are in our nginx bot network list:
I will remove those for now so that Altmetric doesn’t have any unexpected issues harvesting
+
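For reference, the same overlap check against the nginx bot network list can be done without `grepcidr` using Python's standard `ipaddress` module. A sketch with sample networks for illustration (in practice the candidate list comes from Altmetric's CSV and the bot list from `bot-networks.conf`):

```python
import ipaddress

def overlapping_networks(candidates, bot_networks):
    """Return candidate networks that overlap any bot network,
    mirroring the grepcidr loop over bot-networks.conf."""
    bots = [ipaddress.ip_network(n) for n in bot_networks]
    return [
        c for c in candidates
        if any(ipaddress.ip_network(c).overlaps(b) for b in bots)
    ]

# Example: 63.32.242.35/32 falls inside the 63.32.0.0/14 bot network
print(overlapping_networks(
    ["63.32.242.35/32", "8.8.8.0/24"],
    ["63.32.0.0/14", "54.216.0.0/14"],
))  # → ['63.32.242.35/32']
```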
diff --git a/docs/404.html b/docs/404.html
index 62f48155b..6c9180274 100644
--- a/docs/404.html
+++ b/docs/404.html
@@ -17,7 +17,7 @@
-
+
diff --git a/docs/categories/index.html b/docs/categories/index.html
index 74ac7f9fc..4be52d64b 100644
--- a/docs/categories/index.html
+++ b/docs/categories/index.html
@@ -10,14 +10,14 @@
-
+
-
+
diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html
index 506c75400..1b9cbaa84 100644
--- a/docs/categories/notes/index.html
+++ b/docs/categories/notes/index.html
@@ -10,14 +10,14 @@
-
+
-
+
diff --git a/docs/categories/notes/index.xml b/docs/categories/notes/index.xml
index 2e46e498e..db244e844 100644
--- a/docs/categories/notes/index.xml
+++ b/docs/categories/notes/index.xml
@@ -20,61 +20,28 @@
https://alanorth.github.io/cgspace-notes/2023-11/
Thu, 02 Nov 2023 12:59:36 +0300https://alanorth.github.io/cgspace-notes/2023-11/
- <h2 id="2023-11-01">2023-11-01</h2>
-<ul>
-<li>Work a bit on the ETL pipeline for the CGIAR Climate Change Synthesis
-<ul>
-<li>I improved the filtering and wrote some Python using pandas to merge my sources more reliably</li>
-</ul>
-</li>
-</ul>
-<h2 id="2023-11-02">2023-11-02</h2>
-<ul>
-<li>Export CGSpace to check missing Initiative collection mappings</li>
-<li>Start a harvest on AReS</li>
-</ul>
+ <h2 id="2023-11-01">2023-11-01</h2>
<ul>
<li>Work a bit on the ETL pipeline for the CGIAR Climate Change Synthesis
<ul>
<li>I improved the filtering and wrote some Python using pandas to merge my sources more reliably</li>
</ul>
</li>
</ul>
<h2 id="2023-11-02">2023-11-02</h2>
<ul>
<li>Export CGSpace to check missing Initiative collection mappings</li>
<li>Start a harvest on AReS</li>
</ul>October, 2023
https://alanorth.github.io/cgspace-notes/2023-10/
Mon, 02 Oct 2023 09:05:36 +0300https://alanorth.github.io/cgspace-notes/2023-10/
- <h2 id="2023-10-02">2023-10-02</h2>
-<ul>
-<li>Export CGSpace to check DOIs against Crossref
-<ul>
-<li>I found that <a href="https://www.crossref.org/documentation/retrieve-metadata/rest-api/rest-api-metadata-license-information/">Crossref’s metadata is in the public domain under the CC0 license</a></li>
-<li>One interesting thing is the abstracts, which are copyrighted by the copyright owner, meaning Crossref cannot waive the copyright under the terms of the CC0 license, because it is not theirs to waive</li>
-<li>We can be on the safe side by using only abstracts for items that are licensed under Creative Commons</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2023-10-02">2023-10-02</h2>
<ul>
<li>Export CGSpace to check DOIs against Crossref
<ul>
<li>I found that <a href="https://www.crossref.org/documentation/retrieve-metadata/rest-api/rest-api-metadata-license-information/">Crossref’s metadata is in the public domain under the CC0 license</a></li>
<li>One interesting thing is the abstracts, which are copyrighted by the copyright owner, meaning Crossref cannot waive the copyright under the terms of the CC0 license, because it is not theirs to waive</li>
<li>We can be on the safe side by using only abstracts for items that are licensed under Creative Commons</li>
</ul>
</li>
</ul>September, 2023
https://alanorth.github.io/cgspace-notes/2023-09/
Sat, 02 Sep 2023 17:29:36 +0300https://alanorth.github.io/cgspace-notes/2023-09/
- <h2 id="2023-09-02">2023-09-02</h2>
-<ul>
-<li>Export CGSpace to check for missing Initiative collection mappings</li>
-<li>Start a harvest on AReS</li>
-</ul>
+ <h2 id="2023-09-02">2023-09-02</h2>
<ul>
<li>Export CGSpace to check for missing Initiative collection mappings</li>
<li>Start a harvest on AReS</li>
</ul>August, 2023
https://alanorth.github.io/cgspace-notes/2023-08/
Thu, 03 Aug 2023 11:18:36 +0300https://alanorth.github.io/cgspace-notes/2023-08/
- <h2 id="2023-08-03">2023-08-03</h2>
-<ul>
-<li>I finally got around to working on Peter’s cleanups for affiliations, authors, and donors from last week
-<ul>
-<li>I did some minor cleanups myself and applied them to CGSpace</li>
-</ul>
-</li>
-<li>Start working on some batch uploads for IFPRI</li>
-</ul>
+ <h2 id="2023-08-03">2023-08-03</h2>
<ul>
<li>I finally got around to working on Peter’s cleanups for affiliations, authors, and donors from last week
<ul>
<li>I did some minor cleanups myself and applied them to CGSpace</li>
</ul>
</li>
<li>Start working on some batch uploads for IFPRI</li>
</ul>July, 2023
@@ -88,249 +55,98 @@
https://alanorth.github.io/cgspace-notes/2023-06/
Fri, 02 Jun 2023 10:29:36 +0300https://alanorth.github.io/cgspace-notes/2023-06/
- <h2 id="2023-06-02">2023-06-02</h2>
-<ul>
-<li>Spend some time testing my <code>post_bitstreams.py</code> script to update thumbnails for items on CGSpace
-<ul>
-<li>Interestingly I found an item with a JFIF thumbnail and another with a WebP thumbnail…</li>
-</ul>
-</li>
-<li>Meeting with Valentina, Stefano, and Sara about MODS metadata in CGSpace
-<ul>
-<li>They have experience with improving the MODS interface in MELSpace’s OAI-PMH for use with AGRIS and were curious if we could do the same in CGSpace</li>
-<li>From what I can see we need to upgrade the MODS schema from 3.1 to 3.7 and then just add a bunch of our fields to the crosswalk</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2023-06-02">2023-06-02</h2>
<ul>
<li>Spend some time testing my <code>post_bitstreams.py</code> script to update thumbnails for items on CGSpace
<ul>
<li>Interestingly I found an item with a JFIF thumbnail and another with a WebP thumbnail…</li>
</ul>
</li>
<li>Meeting with Valentina, Stefano, and Sara about MODS metadata in CGSpace
<ul>
<li>They have experience with improving the MODS interface in MELSpace’s OAI-PMH for use with AGRIS and were curious if we could do the same in CGSpace</li>
<li>From what I can see we need to upgrade the MODS schema from 3.1 to 3.7 and then just add a bunch of our fields to the crosswalk</li>
</ul>
</li>
</ul>May, 2023
https://alanorth.github.io/cgspace-notes/2023-05/
Wed, 03 May 2023 08:53:36 +0300https://alanorth.github.io/cgspace-notes/2023-05/
- <h2 id="2023-05-03">2023-05-03</h2>
-<ul>
-<li>Alliance’s TIP team emailed me to ask about issues authenticating on CGSpace
-<ul>
-<li>It seems their password expired, which is annoying</li>
-</ul>
-</li>
-<li>I continued looking at the CGSpace subjects for the FAO / AGROVOC exercise that I started last week
-<ul>
-<li>There are many of our subjects that would match if they added a “-” like “high yielding varieties” or used singular…</li>
-<li>Also I found at least two spelling mistakes, for example “decison support systems”, which would match if it was spelled correctly</li>
-</ul>
-</li>
-<li>Work on cleaning, proofing, and uploading twenty-seven records for IFPRI to CGSpace</li>
-</ul>
+ <h2 id="2023-05-03">2023-05-03</h2>
<ul>
<li>Alliance’s TIP team emailed me to ask about issues authenticating on CGSpace
<ul>
<li>It seems their password expired, which is annoying</li>
</ul>
</li>
<li>I continued looking at the CGSpace subjects for the FAO / AGROVOC exercise that I started last week
<ul>
<li>There are many of our subjects that would match if they added a “-” like “high yielding varieties” or used singular…</li>
<li>Also I found at least two spelling mistakes, for example “decison support systems”, which would match if it was spelled correctly</li>
</ul>
</li>
<li>Work on cleaning, proofing, and uploading twenty-seven records for IFPRI to CGSpace</li>
</ul>April, 2023
https://alanorth.github.io/cgspace-notes/2023-04/
Sun, 02 Apr 2023 08:19:36 +0300https://alanorth.github.io/cgspace-notes/2023-04/
- <h2 id="2023-04-02">2023-04-02</h2>
-<ul>
-<li>Run all system updates on CGSpace and reboot it</li>
-<li>I exported CGSpace to CSV to check for any missing Initiative collection mappings
-<ul>
-<li>I also did a check for missing country/region mappings with csv-metadata-quality</li>
-</ul>
-</li>
-<li>Start a harvest on AReS</li>
-</ul>
+ <h2 id="2023-04-02">2023-04-02</h2>
<ul>
<li>Run all system updates on CGSpace and reboot it</li>
<li>I exported CGSpace to CSV to check for any missing Initiative collection mappings
<ul>
<li>I also did a check for missing country/region mappings with csv-metadata-quality</li>
</ul>
</li>
<li>Start a harvest on AReS</li>
</ul>March, 2023
https://alanorth.github.io/cgspace-notes/2023-03/
Wed, 01 Mar 2023 07:58:36 +0300https://alanorth.github.io/cgspace-notes/2023-03/
- <h2 id="2023-03-01">2023-03-01</h2>
-<ul>
-<li>Remove <code>cg.subject.wle</code> and <code>cg.identifier.wletheme</code> from CGSpace input form after confirming with IWMI colleagues that they no longer need them (WLE closed in 2021)</li>
-<li><a href="https://salsa.debian.org/iso-codes-team/iso-codes/-/blob/main/CHANGELOG.md#4130-2023-02-28">iso-codes 4.13.0 was released</a>, which incorporates my changes to the common names for Iran, Laos, and Syria</li>
-<li>I finally got through with porting the input form from DSpace 6 to DSpace 7</li>
-</ul>
+ <h2 id="2023-03-01">2023-03-01</h2>
<ul>
<li>Remove <code>cg.subject.wle</code> and <code>cg.identifier.wletheme</code> from CGSpace input form after confirming with IWMI colleagues that they no longer need them (WLE closed in 2021)</li>
<li><a href="https://salsa.debian.org/iso-codes-team/iso-codes/-/blob/main/CHANGELOG.md#4130-2023-02-28">iso-codes 4.13.0 was released</a>, which incorporates my changes to the common names for Iran, Laos, and Syria</li>
<li>I finally got through with porting the input form from DSpace 6 to DSpace 7</li>
</ul>February, 2023
https://alanorth.github.io/cgspace-notes/2023-02/
Wed, 01 Feb 2023 10:57:36 +0300https://alanorth.github.io/cgspace-notes/2023-02/
- <h2 id="2023-02-01">2023-02-01</h2>
-<ul>
-<li>Export CGSpace to cross check the DOI metadata with Crossref
-<ul>
-<li>I want to try to expand my use of their data to journals, publishers, volumes, issues, etc…</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2023-02-01">2023-02-01</h2>
<ul>
<li>Export CGSpace to cross check the DOI metadata with Crossref
<ul>
<li>I want to try to expand my use of their data to journals, publishers, volumes, issues, etc…</li>
</ul>
</li>
</ul>January, 2023
https://alanorth.github.io/cgspace-notes/2023-01/
Sun, 01 Jan 2023 08:44:36 +0300https://alanorth.github.io/cgspace-notes/2023-01/
- <h2 id="2023-01-01">2023-01-01</h2>
-<ul>
-<li>Apply some more ORCID identifiers to items on CGSpace using my <code>2022-09-22-add-orcids.csv</code> file
-<ul>
-<li>I want to update all ORCID names and refresh them in the database</li>
-<li>I see we have some new ones that aren’t in our list if I combine with this file:</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2023-01-01">2023-01-01</h2>
<ul>
<li>Apply some more ORCID identifiers to items on CGSpace using my <code>2022-09-22-add-orcids.csv</code> file
<ul>
<li>I want to update all ORCID names and refresh them in the database</li>
<li>I see we have some new ones that aren’t in our list if I combine with this file:</li>
</ul>
</li>
</ul>December, 2022
https://alanorth.github.io/cgspace-notes/2022-12/
Thu, 01 Dec 2022 08:52:36 +0300https://alanorth.github.io/cgspace-notes/2022-12/
- <h2 id="2022-12-01">2022-12-01</h2>
-<ul>
-<li>Fix some incorrect regions on CGSpace
-<ul>
-<li>I exported the CCAFS and IITA communities, extracted just the country and region columns, then ran them through csv-metadata-quality to fix the regions</li>
-</ul>
-</li>
-<li>Add a few more authors to my CSV with author names and ORCID identifiers and tag 283 items!</li>
-<li>Replace “East Asia” with “Eastern Asia” region on CGSpace (UN M.49 region)</li>
-</ul>
+ <h2 id="2022-12-01">2022-12-01</h2>
<ul>
<li>Fix some incorrect regions on CGSpace
<ul>
<li>I exported the CCAFS and IITA communities, extracted just the country and region columns, then ran them through csv-metadata-quality to fix the regions</li>
</ul>
</li>
<li>Add a few more authors to my CSV with author names and ORCID identifiers and tag 283 items!</li>
<li>Replace “East Asia” with “Eastern Asia” region on CGSpace (UN M.49 region)</li>
</ul>November, 2022
https://alanorth.github.io/cgspace-notes/2022-11/
Tue, 01 Nov 2022 09:11:36 +0300https://alanorth.github.io/cgspace-notes/2022-11/
- <h2 id="2022-11-01">2022-11-01</h2>
-<ul>
-<li>Last night I re-synced DSpace 7 Test from CGSpace
-<ul>
-<li>I also updated all my local <code>7_x-dev</code> branches on the latest upstreams</li>
-</ul>
-</li>
-<li>I spent some time updating the authorizations in Alliance collections
-<ul>
-<li>I want to make sure they use groups instead of individuals where possible!</li>
-</ul>
-</li>
-<li>I reverted the Cocoon autosave change because it was more of a nuisance that Peter can&rsquo;t upload CSVs from the web interface and is a very low severity security issue</li>
-</ul>
+ <h2 id="2022-11-01">2022-11-01</h2>
<ul>
<li>Last night I re-synced DSpace 7 Test from CGSpace
<ul>
<li>I also updated all my local <code>7_x-dev</code> branches on the latest upstreams</li>
</ul>
</li>
<li>I spent some time updating the authorizations in Alliance collections
<ul>
<li>I want to make sure they use groups instead of individuals where possible!</li>
</ul>
</li>
<li>I reverted the Cocoon autosave change because it was more of a nuisance that Peter can&rsquo;t upload CSVs from the web interface and is a very low severity security issue</li>
</ul>October, 2022
https://alanorth.github.io/cgspace-notes/2022-10/
Sat, 01 Oct 2022 19:45:36 +0300https://alanorth.github.io/cgspace-notes/2022-10/
- <h2 id="2022-10-01">2022-10-01</h2>
-<ul>
-<li>Start a harvest on AReS last night</li>
-<li>Yesterday I realized how to use <a href="https://im4java.sourceforge.net/docs/dev-guide.html">GraphicsMagick with im4java</a> and I want to re-visit some of my thumbnail tests
-<ul>
-<li>I&rsquo;m also interested in libvips support via JVips, though last time I checked it was only for Java 8</li>
-<li>I filed <a href="https://github.com/criteo/JVips/issues/141">an issue to ask about Java 11+ support</a></li>
-</ul>
-</li>
-</ul>
+ <h2 id="2022-10-01">2022-10-01</h2>
<ul>
<li>Start a harvest on AReS last night</li>
<li>Yesterday I realized how to use <a href="https://im4java.sourceforge.net/docs/dev-guide.html">GraphicsMagick with im4java</a> and I want to re-visit some of my thumbnail tests
<ul>
<li>I&rsquo;m also interested in libvips support via JVips, though last time I checked it was only for Java 8</li>
<li>I filed <a href="https://github.com/criteo/JVips/issues/141">an issue to ask about Java 11+ support</a></li>
</ul>
</li>
</ul>September, 2022
https://alanorth.github.io/cgspace-notes/2022-09/
Thu, 01 Sep 2022 09:41:36 +0300https://alanorth.github.io/cgspace-notes/2022-09/
- <h2 id="2022-09-01">2022-09-01</h2>
-<ul>
-<li>A bit of work on the “Mapping CG Core–CGSpace–MEL–MARLO Types” spreadsheet</li>
-<li>I tested an item submission on DSpace Test with the Cocoon <code>org.apache.cocoon.uploads.autosave=false</code> change
-<ul>
-<li>The submission works as expected</li>
-</ul>
-</li>
-<li>Start debugging some region-related issues with csv-metadata-quality
-<ul>
-<li>I created a new test file <code>test-geography.csv</code> with some different scenarios</li>
-<li>I also fixed a few bugs and improved the region-matching logic</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2022-09-01">2022-09-01</h2>
<ul>
<li>A bit of work on the “Mapping CG Core–CGSpace–MEL–MARLO Types” spreadsheet</li>
<li>I tested an item submission on DSpace Test with the Cocoon <code>org.apache.cocoon.uploads.autosave=false</code> change
<ul>
<li>The submission works as expected</li>
</ul>
</li>
<li>Start debugging some region-related issues with csv-metadata-quality
<ul>
<li>I created a new test file <code>test-geography.csv</code> with some different scenarios</li>
<li>I also fixed a few bugs and improved the region-matching logic</li>
</ul>
</li>
</ul>August, 2022
https://alanorth.github.io/cgspace-notes/2022-08/
Mon, 01 Aug 2022 10:22:36 +0300https://alanorth.github.io/cgspace-notes/2022-08/
- <h2 id="2022-08-01">2022-08-01</h2>
-<ul>
-<li>Our request to add <a href="https://github.com/spdx/license-list-XML/issues/1525">CC-BY-3.0-IGO to SPDX</a> was approved a few weeks ago</li>
-</ul>
+ <h2 id="2022-08-01">2022-08-01</h2>
<ul>
<li>Our request to add <a href="https://github.com/spdx/license-list-XML/issues/1525">CC-BY-3.0-IGO to SPDX</a> was approved a few weeks ago</li>
</ul>July, 2022
https://alanorth.github.io/cgspace-notes/2022-07/
Sat, 02 Jul 2022 14:07:36 +0300https://alanorth.github.io/cgspace-notes/2022-07/
- <h2 id="2022-07-02">2022-07-02</h2>
-<ul>
-<li>I learned how to use the Levenshtein functions in PostgreSQL
-<ul>
-<li>The thing is that there is a limit of 255 characters for these functions in PostgreSQL so you need to truncate the strings before comparing</li>
-<li>Also, the trgm functions I’ve used before are case insensitive, but Levenshtein is not, so you need to make sure to lower case both strings first</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2022-07-02">2022-07-02</h2>
<ul>
<li>I learned how to use the Levenshtein functions in PostgreSQL
<ul>
<li>The thing is that there is a limit of 255 characters for these functions in PostgreSQL so you need to truncate the strings before comparing</li>
<li>Also, the trgm functions I’ve used before are case insensitive, but Levenshtein is not, so you need to make sure to lower case both strings first</li>
</ul>
</li>
</ul>June, 2022
https://alanorth.github.io/cgspace-notes/2022-06/
Mon, 06 Jun 2022 09:01:36 +0300https://alanorth.github.io/cgspace-notes/2022-06/
- <h2 id="2022-06-06">2022-06-06</h2>
-<ul>
-<li>Look at the Solr statistics on CGSpace
-<ul>
-<li>I see 167,000 hits from a bunch of Microsoft IPs with reverse DNS “msnbot-” using the Solr query <code>dns:*msnbot* AND dns:*.msn.com</code></li>
-<li>I purged these first so I could see the other “real” IPs in the Solr facets</li>
-</ul>
-</li>
-<li>I see 47,500 hits from 80.248.237.167 on a data center ISP in Sweden, using a normal user agent</li>
-<li>I see 13,000 hits from 163.237.216.11 on a data center ISP in Australia, using a normal user agent</li>
-<li>I see 7,300 hits from 208.185.238.57 from Britannica, using a normal user agent
-<ul>
-<li>There seem to be many more of these:</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2022-06-06">2022-06-06</h2>
<ul>
<li>Look at the Solr statistics on CGSpace
<ul>
<li>I see 167,000 hits from a bunch of Microsoft IPs with reverse DNS “msnbot-” using the Solr query <code>dns:*msnbot* AND dns:*.msn.com</code></li>
<li>I purged these first so I could see the other “real” IPs in the Solr facets</li>
</ul>
</li>
<li>I see 47,500 hits from 80.248.237.167 on a data center ISP in Sweden, using a normal user agent</li>
<li>I see 13,000 hits from 163.237.216.11 on a data center ISP in Australia, using a normal user agent</li>
<li>I see 7,300 hits from 208.185.238.57 from Britannica, using a normal user agent
<ul>
<li>There seem to be many more of these:</li>
</ul>
</li>
</ul>May, 2022
https://alanorth.github.io/cgspace-notes/2022-05/
Wed, 04 May 2022 09:13:39 +0300https://alanorth.github.io/cgspace-notes/2022-05/
- <h2 id="2022-05-04">2022-05-04</h2>
-<ul>
-<li>I found a few more IPs making requests using the shady Chrome 44 user agent in the last few days so I will add them to the block list too:
-<ul>
-<li>18.207.136.176</li>
-<li>185.189.36.248</li>
-<li>50.118.223.78</li>
-<li>52.70.76.123</li>
-<li>3.236.10.11</li>
-</ul>
-</li>
-<li>Looking at the Solr statistics for 2022-04
-<ul>
-<li>52.191.137.59 is Microsoft, but they are using a normal user agent and making tens of thousands of requests</li>
-<li>64.39.98.62 is owned by Qualys, and all their requests are probing for /etc/passwd etc</li>
-<li>185.192.69.15 is in the Netherlands and is using a normal user agent, but making excessive automated HTTP requests to paths forbidden in robots.txt</li>
-<li>157.55.39.159 is owned by Microsoft and identifies as bingbot so I don’t know why its requests were logged in Solr</li>
-<li>52.233.67.176 is owned by Microsoft and uses a normal user agent, but making excessive automated HTTP requests</li>
-<li>157.55.39.144 is owned by Microsoft and uses a normal user agent, but making excessive automated HTTP requests</li>
-<li>207.46.13.177 is owned by Microsoft and identifies as bingbot so I don’t know why its requests were logged in Solr</li>
-<li>If I query Solr for <code>time:2022-04* AND dns:*msnbot* AND dns:*.msn.com.</code> I see a handful of IPs that made 41,000 requests</li>
-</ul>
-</li>
-<li>I purged 93,974 hits from these IPs using my <code>check-spider-ip-hits.sh</code> script</li>
-</ul>
+ <h2 id="2022-05-04">2022-05-04</h2>
<ul>
<li>I found a few more IPs making requests using the shady Chrome 44 user agent in the last few days so I will add them to the block list too:
<ul>
<li>18.207.136.176</li>
<li>185.189.36.248</li>
<li>50.118.223.78</li>
<li>52.70.76.123</li>
<li>3.236.10.11</li>
</ul>
</li>
<li>Looking at the Solr statistics for 2022-04
<ul>
<li>52.191.137.59 is Microsoft, but they are using a normal user agent and making tens of thousands of requests</li>
<li>64.39.98.62 is owned by Qualys, and all their requests are probing for /etc/passwd etc</li>
<li>185.192.69.15 is in the Netherlands and is using a normal user agent, but making excessive automated HTTP requests to paths forbidden in robots.txt</li>
<li>157.55.39.159 is owned by Microsoft and identifies as bingbot so I don’t know why its requests were logged in Solr</li>
<li>52.233.67.176 is owned by Microsoft and uses a normal user agent, but making excessive automated HTTP requests</li>
<li>157.55.39.144 is owned by Microsoft and uses a normal user agent, but making excessive automated HTTP requests</li>
<li>207.46.13.177 is owned by Microsoft and identifies as bingbot so I don’t know why its requests were logged in Solr</li>
<li>If I query Solr for <code>time:2022-04* AND dns:*msnbot* AND dns:*.msn.com.</code> I see a handful of IPs that made 41,000 requests</li>
</ul>
</li>
<li>I purged 93,974 hits from these IPs using my <code>check-spider-ip-hits.sh</code> script</li>
</ul>April, 2022
@@ -344,286 +160,119 @@
https://alanorth.github.io/cgspace-notes/2022-03/
Tue, 01 Mar 2022 16:46:54 +0300https://alanorth.github.io/cgspace-notes/2022-03/
- <h2 id="2022-03-01">2022-03-01</h2>
-<ul>
-<li>Send Gaia the last batch of potential duplicates for items 701 to 980:</li>
-</ul>
-<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ csvcut -c id,dc.title,dcterms.issued,dcterms.type ~/Downloads/2022-03-01-CGSpace-TAC-ICW-batch4-701-980.csv > /tmp/tac4.csv
-</span></span><span style="display:flex;"><span>$ ./ilri/check-duplicates.py -i /tmp/tac4.csv -db dspace -u dspace -p <span style="color:#e6db74">'fuuu'</span> -o /tmp/2022-03-01-tac-batch4-701-980.csv
-</span></span><span style="display:flex;"><span>$ csvcut -c id,filename ~/Downloads/2022-03-01-CGSpace-TAC-ICW-batch4-701-980.csv > /tmp/tac4-filenames.csv
-</span></span><span style="display:flex;"><span>$ csvjoin -c id /tmp/2022-03-01-tac-batch4-701-980.csv /tmp/tac4-filenames.csv > /tmp/2022-03-01-tac-batch4-701-980-filenames.csv
-</span></span></code></pre></div>
+ <h2 id="2022-03-01">2022-03-01</h2>
<ul>
<li>Send Gaia the last batch of potential duplicates for items 701 to 980:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ csvcut -c id,dc.title,dcterms.issued,dcterms.type ~/Downloads/2022-03-01-CGSpace-TAC-ICW-batch4-701-980.csv > /tmp/tac4.csv
</span></span><span style="display:flex;"><span>$ ./ilri/check-duplicates.py -i /tmp/tac4.csv -db dspace -u dspace -p <span style="color:#e6db74">'fuuu'</span> -o /tmp/2022-03-01-tac-batch4-701-980.csv
</span></span><span style="display:flex;"><span>$ csvcut -c id,filename ~/Downloads/2022-03-01-CGSpace-TAC-ICW-batch4-701-980.csv > /tmp/tac4-filenames.csv
</span></span><span style="display:flex;"><span>$ csvjoin -c id /tmp/2022-03-01-tac-batch4-701-980.csv /tmp/tac4-filenames.csv > /tmp/2022-03-01-tac-batch4-701-980-filenames.csv
</span></span></code></pre></div>February, 2022
https://alanorth.github.io/cgspace-notes/2022-02/
Tue, 01 Feb 2022 14:06:54 +0200https://alanorth.github.io/cgspace-notes/2022-02/
- <h2 id="2022-02-01">2022-02-01</h2>
-<ul>
-<li>Meeting with Peter and Abenet about CGSpace in the One CGIAR
-<ul>
-<li>We agreed to buy $5,000 worth of credits from Atmire for future upgrades</li>
-<li>We agreed to move CRPs and non-CGIAR communities off the home page, as well as some other things for the CGIAR System Organization</li>
-<li>We agreed to make a Discovery facet for CGIAR Action Areas above the existing CGIAR Impact Areas one</li>
-<li>We agreed to try to do more alignment of affiliations/funders with ROR</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2022-02-01">2022-02-01</h2>
<ul>
<li>Meeting with Peter and Abenet about CGSpace in the One CGIAR
<ul>
<li>We agreed to buy $5,000 worth of credits from Atmire for future upgrades</li>
<li>We agreed to move CRPs and non-CGIAR communities off the home page, as well as some other things for the CGIAR System Organization</li>
<li>We agreed to make a Discovery facet for CGIAR Action Areas above the existing CGIAR Impact Areas one</li>
<li>We agreed to try to do more alignment of affiliations/funders with ROR</li>
</ul>
</li>
</ul>January, 2022
https://alanorth.github.io/cgspace-notes/2022-01/
Sat, 01 Jan 2022 15:20:54 +0200https://alanorth.github.io/cgspace-notes/2022-01/
- <h2 id="2022-01-01">2022-01-01</h2>
-<ul>
-<li>Start a full harvest on AReS</li>
-</ul>
+ <h2 id="2022-01-01">2022-01-01</h2>
<ul>
<li>Start a full harvest on AReS</li>
</ul>December, 2021
https://alanorth.github.io/cgspace-notes/2021-12/
Wed, 01 Dec 2021 16:07:07 +0200https://alanorth.github.io/cgspace-notes/2021-12/
- <h2 id="2021-12-01">2021-12-01</h2>
-<ul>
-<li>Atmire merged some changes I had submitted to the COUNTER-Robots project</li>
-<li>I updated our local spider user agents and then re-ran the list with my <code>check-spider-hits.sh</code> script on CGSpace:</li>
-</ul>
-<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ ./ilri/check-spider-hits.sh -f /tmp/agents -p
-</span></span><span style="display:flex;"><span>Purging 1989 hits from The Knowledge AI in statistics
-</span></span><span style="display:flex;"><span>Purging 1235 hits from MaCoCu in statistics
-</span></span><span style="display:flex;"><span>Purging 455 hits from WhatsApp in statistics
-</span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
-</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>Total number of bot hits purged: 3679
-</span></span></code></pre></div>
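The purge that <code>check-spider-hits.sh</code> performs amounts to a Solr delete-by-query per user agent. A minimal sketch of that idea, assuming a <code>userAgent</code> field in the statistics core (the script's actual field names and escaping rules may differ):

```python
import json

def agent_query(agent):
    """Build a Solr query matching statistics hits for one user agent.

    Spaces and Solr special characters are backslash-escaped so the
    agent string is matched literally.
    """
    specials = '+-&|!(){}[]^"~*?:\\/'
    return "userAgent:" + "".join(
        "\\" + c if (c in specials or c == " ") else c for c in agent
    )

def delete_payload(agent):
    """JSON body for Solr's update handler to purge matching documents."""
    return json.dumps({"delete": {"query": agent_query(agent)}})

# The real purge would POST this to something like
# /solr/statistics/update?commit=true (endpoint assumed, not from the notes)
payload = delete_payload("The Knowledge AI")
```
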
+ <h2 id="2021-12-01">2021-12-01</h2>
<ul>
<li>Atmire merged some changes I had submitted to the COUNTER-Robots project</li>
<li>I updated our local spider user agents and then re-ran the list with my <code>check-spider-hits.sh</code> script on CGSpace:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ ./ilri/check-spider-hits.sh -f /tmp/agents -p
</span></span><span style="display:flex;"><span>Purging 1989 hits from The Knowledge AI in statistics
</span></span><span style="display:flex;"><span>Purging 1235 hits from MaCoCu in statistics
</span></span><span style="display:flex;"><span>Purging 455 hits from WhatsApp in statistics
</span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>Total number of bot hits purged: 3679
</span></span></code></pre></div>
November, 2021
https://alanorth.github.io/cgspace-notes/2021-11/
Tue, 02 Nov 2021 22:27:07 +0200
https://alanorth.github.io/cgspace-notes/2021-11/
- <h2 id="2021-11-02">2021-11-02</h2>
-<ul>
-<li>I experimented with manually sharding the Solr statistics on DSpace Test</li>
-<li>First I exported all the 2019 stats from CGSpace:</li>
-</ul>
-<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ ./run.sh -s http://localhost:8081/solr/statistics -f <span style="color:#e6db74">'time:2019-*'</span> -a export -o statistics-2019.json -k uid
-</span></span><span style="display:flex;"><span>$ zstd statistics-2019.json
-</span></span></code></pre></div>
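Manual sharding like this boils down to bucketing documents by the year prefix of their <code>time</code> field, which is exactly what the <code>time:2019-*</code> export filter selects. A rough sketch of that partitioning step (sample documents are illustrative; solr-import-export-json internals are not shown):

```python
from collections import defaultdict

def partition_by_year(docs):
    """Group statistics documents into per-year buckets based on the
    'time' field, mirroring per-year exports like 'time:2019-*'."""
    shards = defaultdict(list)
    for doc in docs:
        shards[doc["time"][:4]].append(doc)
    return dict(shards)

docs = [
    {"uid": "a", "time": "2019-03-01T00:00:00Z"},
    {"uid": "b", "time": "2019-11-09T12:30:00Z"},
    {"uid": "c", "time": "2020-01-02T08:00:00Z"},
]
shards = partition_by_year(docs)
```
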
+ <h2 id="2021-11-02">2021-11-02</h2>
<ul>
<li>I experimented with manually sharding the Solr statistics on DSpace Test</li>
<li>First I exported all the 2019 stats from CGSpace:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ ./run.sh -s http://localhost:8081/solr/statistics -f <span style="color:#e6db74">'time:2019-*'</span> -a export -o statistics-2019.json -k uid
</span></span><span style="display:flex;"><span>$ zstd statistics-2019.json
</span></span></code></pre></div>
October, 2021
https://alanorth.github.io/cgspace-notes/2021-10/
Fri, 01 Oct 2021 11:14:07 +0300
https://alanorth.github.io/cgspace-notes/2021-10/
- <h2 id="2021-10-01">2021-10-01</h2>
-<ul>
-<li>Export all affiliations on CGSpace and run them against the latest ROR data dump:</li>
-</ul>
-<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>localhost/dspace63= > \COPY (SELECT DISTINCT text_value as "cg.contributor.affiliation", count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 211 GROUP BY text_value ORDER BY count DESC) to /tmp/2021-10-01-affiliations.csv WITH CSV HEADER;
-</span></span><span style="display:flex;"><span>$ csvcut -c <span style="color:#ae81ff">1</span> /tmp/2021-10-01-affiliations.csv | sed 1d > /tmp/2021-10-01-affiliations.txt
-</span></span><span style="display:flex;"><span>$ ./ilri/ror-lookup.py -i /tmp/2021-10-01-affiliations.txt -r 2021-09-23-ror-data.json -o /tmp/2021-10-01-affiliations-matching.csv
-</span></span><span style="display:flex;"><span>$ csvgrep -c matched -m true /tmp/2021-10-01-affiliations-matching.csv | sed 1d | wc -l
-</span></span><span style="display:flex;"><span>1879
-</span></span><span style="display:flex;"><span>$ wc -l /tmp/2021-10-01-affiliations.txt
-</span></span><span style="display:flex;"><span>7100 /tmp/2021-10-01-affiliations.txt
-</span></span></code></pre></div><ul>
-<li>So we have 1879/7100 (26.46%) matching already</li>
-</ul>
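The matching that <code>ror-lookup.py</code> does can be sketched as a case-insensitive set lookup against the names and aliases in the ROR data dump, then a tally like the <code>csvgrep</code>/<code>wc -l</code> step. The sample ROR records below are fabricated for illustration, and the real script may normalize strings further:

```python
def load_ror_names(ror_records):
    """Collect primary names and aliases from ROR dump records,
    lowercased for case-insensitive matching."""
    names = set()
    for rec in ror_records:
        names.add(rec["name"].lower())
        for alias in rec.get("aliases", []):
            names.add(alias.lower())
    return names

def match_affiliations(affiliations, names):
    """Return (matched, total), like counting matched=true rows in the CSV."""
    matched = sum(1 for a in affiliations if a.lower() in names)
    return matched, len(affiliations)

ror = [
    {"name": "International Livestock Research Institute", "aliases": ["ILRI"]},
    {"name": "International Water Management Institute", "aliases": []},
]
names = load_ror_names(ror)
matched, total = match_affiliations(
    ["ILRI", "International Water Management Institute", "Some Unknown Org"], names
)
```
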
+ <h2 id="2021-10-01">2021-10-01</h2>
<ul>
<li>Export all affiliations on CGSpace and run them against the latest ROR data dump:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>localhost/dspace63= > \COPY (SELECT DISTINCT text_value as "cg.contributor.affiliation", count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 211 GROUP BY text_value ORDER BY count DESC) to /tmp/2021-10-01-affiliations.csv WITH CSV HEADER;
</span></span><span style="display:flex;"><span>$ csvcut -c <span style="color:#ae81ff">1</span> /tmp/2021-10-01-affiliations.csv | sed 1d > /tmp/2021-10-01-affiliations.txt
</span></span><span style="display:flex;"><span>$ ./ilri/ror-lookup.py -i /tmp/2021-10-01-affiliations.txt -r 2021-09-23-ror-data.json -o /tmp/2021-10-01-affiliations-matching.csv
</span></span><span style="display:flex;"><span>$ csvgrep -c matched -m true /tmp/2021-10-01-affiliations-matching.csv | sed 1d | wc -l
</span></span><span style="display:flex;"><span>1879
</span></span><span style="display:flex;"><span>$ wc -l /tmp/2021-10-01-affiliations.txt
</span></span><span style="display:flex;"><span>7100 /tmp/2021-10-01-affiliations.txt
</span></span></code></pre></div><ul>
<li>So we have 1879/7100 (26.46%) matching already</li>
</ul>
September, 2021
https://alanorth.github.io/cgspace-notes/2021-09/
Wed, 01 Sep 2021 09:14:07 +0300
https://alanorth.github.io/cgspace-notes/2021-09/
- <h2 id="2021-09-02">2021-09-02</h2>
-<ul>
-<li>Troubleshooting the missing Altmetric scores on AReS
-<ul>
-<li>Turns out that I didn’t actually fix them last month because the check for <code>content.altmetric</code> still exists, and I can’t access the DOIs using <code>_h.source.DOI</code> for some reason</li>
-<li>I can access all other kinds of item metadata using the Elasticsearch label, but not DOI!!!</li>
-<li>I will change <code>DOI</code> to <code>tomato</code> in the repository setup and start a re-harvest… I need to see if this is some kind of reserved word or something…</li>
-<li>Even as <code>tomato</code> I can’t access that field as <code>_h.source.tomato</code> in Angular, but it does work as a filter source… sigh</li>
-</ul>
-</li>
-<li>I’m having problems using the OpenRXV API
-<ul>
-<li>The syntax Moayad showed me last month doesn’t seem to honor the search query properly…</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2021-09-02">2021-09-02</h2>
<ul>
<li>Troubleshooting the missing Altmetric scores on AReS
<ul>
<li>Turns out that I didn’t actually fix them last month because the check for <code>content.altmetric</code> still exists, and I can’t access the DOIs using <code>_h.source.DOI</code> for some reason</li>
<li>I can access all other kinds of item metadata using the Elasticsearch label, but not DOI!!!</li>
<li>I will change <code>DOI</code> to <code>tomato</code> in the repository setup and start a re-harvest… I need to see if this is some kind of reserved word or something…</li>
<li>Even as <code>tomato</code> I can’t access that field as <code>_h.source.tomato</code> in Angular, but it does work as a filter source… sigh</li>
</ul>
</li>
<li>I’m having problems using the OpenRXV API
<ul>
<li>The syntax Moayad showed me last month doesn’t seem to honor the search query properly…</li>
</ul>
</li>
</ul>
August, 2021
https://alanorth.github.io/cgspace-notes/2021-08/
Sun, 01 Aug 2021 09:01:07 +0300
https://alanorth.github.io/cgspace-notes/2021-08/
- <h2 id="2021-08-01">2021-08-01</h2>
-<ul>
-<li>Update Docker images on AReS server (linode20) and reboot the server:</li>
-</ul>
-<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># docker images | grep -v ^REPO | sed <span style="color:#e6db74">'s/ \+/:/g'</span> | cut -d: -f1,2 | grep -v none | xargs -L1 docker pull
-</span></span></code></pre></div><ul>
-<li>I decided to upgrade linode20 from Ubuntu 18.04 to 20.04</li>
-</ul>
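The <code>docker images | sed | cut</code> pipeline above is just column parsing: skip the header, take the first two columns, drop <code>&lt;none&gt;</code> images, and emit <code>repo:tag</code> references to pull. The same logic in Python, with fabricated sample output:

```python
def images_to_pull(docker_images_output):
    """Parse `docker images` output into 'repo:tag' strings, skipping the
    header line and <none> repositories/tags, like the sed/cut pipeline."""
    refs = []
    for line in docker_images_output.splitlines():
        if line.startswith("REPO") or not line.strip():
            continue
        repo, tag = line.split()[:2]
        if "<none>" in (repo, tag):
            continue
        refs.append(f"{repo}:{tag}")
    return refs

sample = """REPOSITORY          TAG     IMAGE ID      CREATED       SIZE
elasticsearch       7.6.2   f29a1ee41030  2 years ago   791MB
<none>              <none>  2ce3c188c38d  2 years ago   1.2GB
nginx               1.20    c534d0a987ab  1 year ago    133MB
"""
refs = images_to_pull(sample)
```
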
+ <h2 id="2021-08-01">2021-08-01</h2>
<ul>
<li>Update Docker images on AReS server (linode20) and reboot the server:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># docker images | grep -v ^REPO | sed <span style="color:#e6db74">'s/ \+/:/g'</span> | cut -d: -f1,2 | grep -v none | xargs -L1 docker pull
</span></span></code></pre></div><ul>
<li>I decided to upgrade linode20 from Ubuntu 18.04 to 20.04</li>
</ul>
July, 2021
https://alanorth.github.io/cgspace-notes/2021-07/
Thu, 01 Jul 2021 08:53:07 +0300
https://alanorth.github.io/cgspace-notes/2021-07/
- <h2 id="2021-07-01">2021-07-01</h2>
-<ul>
-<li>Export another list of ALL subjects on CGSpace, including AGROVOC and non-AGROVOC for Enrico:</li>
-</ul>
-<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>localhost/dspace63= > \COPY (SELECT DISTINCT LOWER(text_value) AS subject, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (119, 120, 127, 122, 128, 125, 135, 203, 208, 210, 215, 123, 236, 242, 187) GROUP BY subject ORDER BY count DESC) to /tmp/2021-07-01-all-subjects.csv WITH CSV HEADER;
-</span></span><span style="display:flex;"><span>COPY 20994
-</span></span></code></pre></div>
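The SQL's <code>LOWER()</code> / <code>GROUP BY</code> / <code>ORDER BY count DESC</code> can be mimicked in a few lines for spot-checking an export, using a small fabricated sample of subject values:

```python
from collections import Counter

def subject_counts(values):
    """Lowercase and tally subject strings, like
    SELECT LOWER(text_value), count(*) ... GROUP BY ... ORDER BY count DESC."""
    return Counter(v.lower() for v in values).most_common()

rows = ["CLIMATE CHANGE", "climate change", "Livestock", "MAIZE", "maize", "maize"]
ranked = subject_counts(rows)
```
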
+ <h2 id="2021-07-01">2021-07-01</h2>
<ul>
<li>Export another list of ALL subjects on CGSpace, including AGROVOC and non-AGROVOC for Enrico:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>localhost/dspace63= > \COPY (SELECT DISTINCT LOWER(text_value) AS subject, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (119, 120, 127, 122, 128, 125, 135, 203, 208, 210, 215, 123, 236, 242, 187) GROUP BY subject ORDER BY count DESC) to /tmp/2021-07-01-all-subjects.csv WITH CSV HEADER;
</span></span><span style="display:flex;"><span>COPY 20994
</span></span></code></pre></div>
June, 2021
https://alanorth.github.io/cgspace-notes/2021-06/
Tue, 01 Jun 2021 10:51:07 +0300
https://alanorth.github.io/cgspace-notes/2021-06/
- <h2 id="2021-06-01">2021-06-01</h2>
-<ul>
-<li>IWMI notified me that AReS was down with an HTTP 502 error
-<ul>
-<li>Looking at UptimeRobot I see it has been down for 33 hours, but I never got a notification</li>
-<li>I don’t see anything in the Elasticsearch container logs, or the systemd journal on the host, but I notice that the <code>angular_nginx</code> container isn’t running</li>
-<li>I simply started it and AReS was running again:</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2021-06-01">2021-06-01</h2>
<ul>
<li>IWMI notified me that AReS was down with an HTTP 502 error
<ul>
<li>Looking at UptimeRobot I see it has been down for 33 hours, but I never got a notification</li>
<li>I don’t see anything in the Elasticsearch container logs, or the systemd journal on the host, but I notice that the <code>angular_nginx</code> container isn’t running</li>
<li>I simply started it and AReS was running again:</li>
</ul>
</li>
</ul>
May, 2021
https://alanorth.github.io/cgspace-notes/2021-05/
Sun, 02 May 2021 09:50:54 +0300
https://alanorth.github.io/cgspace-notes/2021-05/
- <h2 id="2021-05-01">2021-05-01</h2>
-<ul>
-<li>I looked at the top user agents and IPs in the Solr statistics for last month and I see these user agents:
-<ul>
-<li>“RI/1.0”, 1337</li>
-<li>“Microsoft Office Word 2014”, 941</li>
-</ul>
-</li>
-<li>I will add the RI/1.0 pattern to our DSpace agents overload and purge them from Solr (we had previously seen this agent with 9,000 hits or so in 2020-09), but I think I will leave the Microsoft Word one… as that’s an actual user…</li>
-</ul>
+ <h2 id="2021-05-01">2021-05-01</h2>
<ul>
<li>I looked at the top user agents and IPs in the Solr statistics for last month and I see these user agents:
<ul>
<li>“RI/1.0”, 1337</li>
<li>“Microsoft Office Word 2014”, 941</li>
</ul>
</li>
<li>I will add the RI/1.0 pattern to our DSpace agents overload and purge them from Solr (we had previously seen this agent with 9,000 hits or so in 2020-09), but I think I will leave the Microsoft Word one… as that’s an actual user…</li>
</ul>
April, 2021
https://alanorth.github.io/cgspace-notes/2021-04/
Thu, 01 Apr 2021 09:50:54 +0300
https://alanorth.github.io/cgspace-notes/2021-04/
- <h2 id="2021-04-01">2021-04-01</h2>
-<ul>
-<li>I wrote a script to query Sherpa’s API for our ISSNs: <code>sherpa-issn-lookup.py</code>
-<ul>
-<li>I’m curious to see how the results compare with the results from Crossref yesterday</li>
-</ul>
-</li>
-<li>AReS Explorer had been down since this morning; I didn’t see anything in the systemd journal
-<ul>
-<li>I simply took everything down with docker-compose and then back up, and then it was OK</li>
-<li>Perhaps one of the containers crashed, I should have looked closer but I was in a hurry</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2021-04-01">2021-04-01</h2>
<ul>
<li>I wrote a script to query Sherpa’s API for our ISSNs: <code>sherpa-issn-lookup.py</code>
<ul>
<li>I’m curious to see how the results compare with the results from Crossref yesterday</li>
</ul>
</li>
<li>AReS Explorer had been down since this morning; I didn’t see anything in the systemd journal
<ul>
<li>I simply took everything down with docker-compose and then back up, and then it was OK</li>
<li>Perhaps one of the containers crashed, I should have looked closer but I was in a hurry</li>
</ul>
</li>
</ul>
March, 2021
https://alanorth.github.io/cgspace-notes/2021-03/
Mon, 01 Mar 2021 10:13:54 +0200
https://alanorth.github.io/cgspace-notes/2021-03/
- <h2 id="2021-03-01">2021-03-01</h2>
-<ul>
-<li>Discuss some OpenRXV issues with Abdullah from CodeObia
-<ul>
-<li>He’s trying to work on the DSpace 6+ metadata schema autoimport using the DSpace 6+ REST API</li>
-<li>Also, we found some issues building and running OpenRXV currently due to ecosystem shift in the Node.js dependencies</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2021-03-01">2021-03-01</h2>
<ul>
<li>Discuss some OpenRXV issues with Abdullah from CodeObia
<ul>
<li>He’s trying to work on the DSpace 6+ metadata schema autoimport using the DSpace 6+ REST API</li>
<li>Also, we found some issues building and running OpenRXV currently due to ecosystem shift in the Node.js dependencies</li>
</ul>
</li>
</ul>
CGSpace CG Core v2 Migration
https://alanorth.github.io/cgspace-notes/cgspace-cgcorev2-migration/
Sun, 21 Feb 2021 13:27:35 +0200
https://alanorth.github.io/cgspace-notes/cgspace-cgcorev2-migration/
- <p>Changes to CGSpace metadata fields to align more with DC, QDC, and DCTERMS as well as CG Core v2. Implemented on 2021-02-21.</p>
-<p>With reference to <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2 draft standard</a> by Marie-Angélique as well as <a href="http://www.dublincore.org/specifications/dublin-core/dcmi-terms/">DCMI DCTERMS</a>.</p>
+ <p>Changes to CGSpace metadata fields to align more with DC, QDC, and DCTERMS as well as CG Core v2. Implemented on 2021-02-21.</p>
<p>With reference to <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2 draft standard</a> by Marie-Angélique as well as <a href="http://www.dublincore.org/specifications/dublin-core/dcmi-terms/">DCMI DCTERMS</a>.</p>
February, 2021
https://alanorth.github.io/cgspace-notes/2021-02/
Mon, 01 Feb 2021 10:13:54 +0200
https://alanorth.github.io/cgspace-notes/2021-02/
- <h2 id="2021-02-01">2021-02-01</h2>
-<ul>
-<li>Abenet said that CIP found more duplicate records in their export from AReS
-<ul>
-<li>I re-opened <a href="https://github.com/ilri/OpenRXV/issues/67">the issue</a> on OpenRXV where we had previously noticed this</li>
-<li>The shared link where the duplicates are is here: <a href="https://cgspace.cgiar.org/explorer/shared/heEOz3YBnXdK69bR2ra6">https://cgspace.cgiar.org/explorer/shared/heEOz3YBnXdK69bR2ra6</a></li>
-</ul>
-</li>
-<li>I had a call with CodeObia to discuss the work on OpenRXV</li>
-<li>Check the results of the AReS harvesting from last night:</li>
-</ul>
-<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -s <span style="color:#e6db74">'http://localhost:9200/openrxv-items-temp/_count?q=*&pretty'</span>
-</span></span><span style="display:flex;"><span>{
-</span></span><span style="display:flex;"><span> "count" : 100875,
-</span></span><span style="display:flex;"><span> "_shards" : {
-</span></span><span style="display:flex;"><span> "total" : 1,
-</span></span><span style="display:flex;"><span> "successful" : 1,
-</span></span><span style="display:flex;"><span> "skipped" : 0,
-</span></span><span style="display:flex;"><span> "failed" : 0
-</span></span><span style="display:flex;"><span> }
-</span></span><span style="display:flex;"><span>}
-</span></span></code></pre></div>
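Rather than eyeballing the <code>_count</code> response, the harvest check can be done mechanically: all shards succeeded and the count is in the expected ballpark. A small sketch using the response shown above:

```python
import json

def harvest_ok(count_response, expected_min):
    """Check an Elasticsearch _count response: no failed shards and a
    document count at least as large as a full harvest should produce."""
    body = json.loads(count_response)
    return body["_shards"]["failed"] == 0 and body["count"] >= expected_min

# Same shape as the curl output above
response = json.dumps({
    "count": 100875,
    "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
})
ok = harvest_ok(response, expected_min=100000)
```
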
+ <h2 id="2021-02-01">2021-02-01</h2>
<ul>
<li>Abenet said that CIP found more duplicate records in their export from AReS
<ul>
<li>I re-opened <a href="https://github.com/ilri/OpenRXV/issues/67">the issue</a> on OpenRXV where we had previously noticed this</li>
<li>The shared link where the duplicates are is here: <a href="https://cgspace.cgiar.org/explorer/shared/heEOz3YBnXdK69bR2ra6">https://cgspace.cgiar.org/explorer/shared/heEOz3YBnXdK69bR2ra6</a></li>
</ul>
</li>
<li>I had a call with CodeObia to discuss the work on OpenRXV</li>
<li>Check the results of the AReS harvesting from last night:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -s <span style="color:#e6db74">'http://localhost:9200/openrxv-items-temp/_count?q=*&pretty'</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span> "count" : 100875,
</span></span><span style="display:flex;"><span> "_shards" : {
</span></span><span style="display:flex;"><span> "total" : 1,
</span></span><span style="display:flex;"><span> "successful" : 1,
</span></span><span style="display:flex;"><span> "skipped" : 0,
</span></span><span style="display:flex;"><span> "failed" : 0
</span></span><span style="display:flex;"><span> }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div>
January, 2021
https://alanorth.github.io/cgspace-notes/2021-01/
Sun, 03 Jan 2021 10:13:54 +0200
https://alanorth.github.io/cgspace-notes/2021-01/
- <h2 id="2021-01-03">2021-01-03</h2>
-<ul>
-<li>Peter notified me that some filters on AReS were broken again
-<ul>
-<li>It’s the same issue with the field names getting <code>.keyword</code> appended to the end that I already <a href="https://github.com/ilri/OpenRXV/issues/66">filed an issue on OpenRXV about last month</a></li>
-<li>I fixed the broken filters (careful to not edit any others, lest they break too!)</li>
-</ul>
-</li>
-<li>Fix an issue with start page number for the DSpace REST API and statistics API in OpenRXV
-<ul>
-<li>The start page had been “1” in the UI, but in the backend they were doing some gymnastics to adjust to the zero-based offset/limit/page of the DSpace REST API and the statistics API</li>
-<li>I adjusted it to default to 0 and added a note to the admin screen</li>
-<li>I realized that this issue was actually causing the first page of 100 statistics to be missing…</li>
-<li>For example, <a href="https://cgspace.cgiar.org/handle/10568/66839">this item</a> has 51 views on CGSpace, but 0 on AReS</li>
-</ul>
-</li>
-</ul>
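The start-page bug above is a plain off-by-one-page error against a zero-based API: starting at page 1 means offset 0 is never requested, so the first page of 100 statistics is silently skipped. A minimal illustration (hypothetical helper, not OpenRXV's actual code):

```python
def page_offsets(total, limit, start_page):
    """Offsets requested when paging a zero-based offset/limit API,
    starting from the given page number."""
    pages = (total + limit - 1) // limit  # number of pages, rounding up
    return [page * limit for page in range(start_page, pages)]

# With 250 records and limit 100, starting at page 1 never requests offset 0:
wrong = page_offsets(250, 100, start_page=1)   # [100, 200]
right = page_offsets(250, 100, start_page=0)   # [0, 100, 200]
```
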
+ <h2 id="2021-01-03">2021-01-03</h2>
<ul>
<li>Peter notified me that some filters on AReS were broken again
<ul>
<li>It’s the same issue with the field names getting <code>.keyword</code> appended to the end that I already <a href="https://github.com/ilri/OpenRXV/issues/66">filed an issue on OpenRXV about last month</a></li>
<li>I fixed the broken filters (careful to not edit any others, lest they break too!)</li>
</ul>
</li>
<li>Fix an issue with start page number for the DSpace REST API and statistics API in OpenRXV
<ul>
<li>The start page had been “1” in the UI, but in the backend they were doing some gymnastics to adjust to the zero-based offset/limit/page of the DSpace REST API and the statistics API</li>
<li>I adjusted it to default to 0 and added a note to the admin screen</li>
<li>I realized that this issue was actually causing the first page of 100 statistics to be missing…</li>
<li>For example, <a href="https://cgspace.cgiar.org/handle/10568/66839">this item</a> has 51 views on CGSpace, but 0 on AReS</li>
</ul>
</li>
</ul>
December, 2020
https://alanorth.github.io/cgspace-notes/2020-12/
Tue, 01 Dec 2020 11:32:54 +0200
https://alanorth.github.io/cgspace-notes/2020-12/
- <h2 id="2020-12-01">2020-12-01</h2>
-<ul>
-<li>Atmire responded about the issue with duplicate data in our Solr statistics
-<ul>
-<li>They noticed that some records in the statistics-2015 core haven’t been migrated with the AtomicStatisticsUpdateCLI tool yet and assumed that I haven’t migrated any of the records yet</li>
-<li>That’s strange, as I checked all ten cores and 2015 is the only one with some unmigrated documents, according to the <code>cua_version</code> field</li>
-<li>I started processing those (about 411,000 records):</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2020-12-01">2020-12-01</h2>
<ul>
<li>Atmire responded about the issue with duplicate data in our Solr statistics
<ul>
<li>They noticed that some records in the statistics-2015 core haven’t been migrated with the AtomicStatisticsUpdateCLI tool yet and assumed that I haven’t migrated any of the records yet</li>
<li>That’s strange, as I checked all ten cores and 2015 is the only one with some unmigrated documents, according to the <code>cua_version</code> field</li>
<li>I started processing those (about 411,000 records):</li>
</ul>
</li>
</ul>
CGSpace DSpace 6 Upgrade
@@ -637,252 +286,91 @@
https://alanorth.github.io/cgspace-notes/2020-11/
Sun, 01 Nov 2020 13:11:54 +0200
https://alanorth.github.io/cgspace-notes/2020-11/
- <h2 id="2020-11-01">2020-11-01</h2>
-<ul>
-<li>Continue with processing the statistics-2019 Solr core with the AtomicStatisticsUpdateCLI tool on DSpace Test
-<ul>
-<li>So far we’ve spent at least fifty hours to process the statistics and statistics-2019 core… wow.</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2020-11-01">2020-11-01</h2>
<ul>
<li>Continue with processing the statistics-2019 Solr core with the AtomicStatisticsUpdateCLI tool on DSpace Test
<ul>
<li>So far we’ve spent at least fifty hours to process the statistics and statistics-2019 core… wow.</li>
</ul>
</li>
</ul>
October, 2020
https://alanorth.github.io/cgspace-notes/2020-10/
Tue, 06 Oct 2020 16:55:54 +0300
https://alanorth.github.io/cgspace-notes/2020-10/
- <h2 id="2020-10-06">2020-10-06</h2>
-<ul>
-<li>Add tests for the new <code>/items</code> POST handlers to the DSpace 6.x branch of my <a href="https://github.com/ilri/dspace-statistics-api/tree/v6_x">dspace-statistics-api</a>
-<ul>
-<li>It took a bit of extra work because I had to learn how to mock the responses for when Solr is not available</li>
-<li>Tag and release version 1.3.0 on GitHub: <a href="https://github.com/ilri/dspace-statistics-api/releases/tag/v1.3.0">https://github.com/ilri/dspace-statistics-api/releases/tag/v1.3.0</a></li>
-</ul>
-</li>
-<li>Trying to test the changes Atmire sent last week but I had to re-create my local database from a recent CGSpace dump
-<ul>
-<li>During the FlywayDB migration I got an error:</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2020-10-06">2020-10-06</h2>
<ul>
<li>Add tests for the new <code>/items</code> POST handlers to the DSpace 6.x branch of my <a href="https://github.com/ilri/dspace-statistics-api/tree/v6_x">dspace-statistics-api</a>
<ul>
<li>It took a bit of extra work because I had to learn how to mock the responses for when Solr is not available</li>
<li>Tag and release version 1.3.0 on GitHub: <a href="https://github.com/ilri/dspace-statistics-api/releases/tag/v1.3.0">https://github.com/ilri/dspace-statistics-api/releases/tag/v1.3.0</a></li>
</ul>
</li>
<li>Trying to test the changes Atmire sent last week but I had to re-create my local database from a recent CGSpace dump
<ul>
<li>During the FlywayDB migration I got an error:</li>
</ul>
</li>
</ul>
September, 2020
https://alanorth.github.io/cgspace-notes/2020-09/
Wed, 02 Sep 2020 15:35:54 +0300
https://alanorth.github.io/cgspace-notes/2020-09/
- <h2 id="2020-09-02">2020-09-02</h2>
-<ul>
-<li>Replace Marissa van Epp with Rhys Bucknall in the CCAFS groups on CGSpace because Marissa no longer works at CCAFS</li>
-<li>The AReS Explorer hasn’t updated its index since 2020-08-22 when I last forced it
-<ul>
-<li>I restarted it again now and told Moayad that the automatic indexing isn’t working</li>
-</ul>
-</li>
-<li>Add <code>Alliance of Bioversity International and CIAT</code> to affiliations on CGSpace</li>
-<li>Abenet told me that the general search text on AReS doesn’t get reset when you use the “Reset Filters” button
-<ul>
-<li>I filed a bug on OpenRXV: <a href="https://github.com/ilri/OpenRXV/issues/39">https://github.com/ilri/OpenRXV/issues/39</a></li>
-</ul>
-</li>
-<li>I filed an issue on OpenRXV to make some minor edits to the admin UI: <a href="https://github.com/ilri/OpenRXV/issues/40">https://github.com/ilri/OpenRXV/issues/40</a></li>
-</ul>
+ <h2 id="2020-09-02">2020-09-02</h2>
<ul>
<li>Replace Marissa van Epp with Rhys Bucknall in the CCAFS groups on CGSpace because Marissa no longer works at CCAFS</li>
<li>The AReS Explorer hasn’t updated its index since 2020-08-22 when I last forced it
<ul>
<li>I restarted it again now and told Moayad that the automatic indexing isn’t working</li>
</ul>
</li>
<li>Add <code>Alliance of Bioversity International and CIAT</code> to affiliations on CGSpace</li>
<li>Abenet told me that the general search text on AReS doesn’t get reset when you use the “Reset Filters” button
<ul>
<li>I filed a bug on OpenRXV: <a href="https://github.com/ilri/OpenRXV/issues/39">https://github.com/ilri/OpenRXV/issues/39</a></li>
</ul>
</li>
<li>I filed an issue on OpenRXV to make some minor edits to the admin UI: <a href="https://github.com/ilri/OpenRXV/issues/40">https://github.com/ilri/OpenRXV/issues/40</a></li>
</ul>
August, 2020
https://alanorth.github.io/cgspace-notes/2020-08/
Sun, 02 Aug 2020 15:35:54 +0300
https://alanorth.github.io/cgspace-notes/2020-08/
- <h2 id="2020-08-02">2020-08-02</h2>
-<ul>
-<li>I spent a few days working on a Java-based curation task to tag items with ISO 3166-1 Alpha2 country codes based on their <code>cg.coverage.country</code> text values
-<ul>
-<li>It looks up the names in ISO 3166-1 first, and then in our CGSpace countries mapping (which has five or so of Peter’s preferred “display” country names)</li>
-<li>It implements a “force” mode too that will clear existing country codes and re-tag everything</li>
-<li>It is class based so I can easily add support for other vocabularies, and the technique could even be used for organizations with mappings to ROR and Clarisa…</li>
-</ul>
-</li>
-</ul>
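The curation task's lookup order (ISO 3166-1 first, then the CGSpace display-name mapping) and its "force" mode can be sketched briefly. The task itself is Java; this is a Python sketch with a tiny hypothetical slice of both vocabularies:

```python
# Hypothetical sample data: a slice of ISO 3166-1 and the local mapping
# of preferred "display" country names to alpha2 codes.
ISO_3166_1 = {"Kenya": "KE", "Ethiopia": "ET", "Viet Nam": "VN"}
CGSPACE_MAPPING = {"Vietnam": "VN"}

def country_code(name, force=False, existing=None):
    """Look up an alpha2 code: ISO 3166-1 first, then the local mapping.

    Without force, an existing code is kept as-is; with force it is
    re-derived, mirroring the task's "force" mode described above.
    """
    if existing and not force:
        return existing
    return ISO_3166_1.get(name) or CGSPACE_MAPPING.get(name)

code = country_code("Vietnam")
```
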
+ <h2 id="2020-08-02">2020-08-02</h2>
<ul>
<li>I spent a few days working on a Java-based curation task to tag items with ISO 3166-1 Alpha2 country codes based on their <code>cg.coverage.country</code> text values
<ul>
<li>It looks up the names in ISO 3166-1 first, and then in our CGSpace countries mapping (which has five or so of Peter’s preferred “display” country names)</li>
<li>It implements a “force” mode too that will clear existing country codes and re-tag everything</li>
<li>It is class based so I can easily add support for other vocabularies, and the technique could even be used for organizations with mappings to ROR and Clarisa…</li>
</ul>
</li>
</ul>
July, 2020
https://alanorth.github.io/cgspace-notes/2020-07/
Wed, 01 Jul 2020 10:53:54 +0300
https://alanorth.github.io/cgspace-notes/2020-07/
- <h2 id="2020-07-01">2020-07-01</h2>
-<ul>
-<li>A few users noticed that CGSpace wasn’t loading items today, item pages seem blank
-<ul>
-<li>I looked at the PostgreSQL locks but they don’t seem unusual</li>
-<li>I guess this is the same “blank item page” issue that we had a few times in 2019 that we never solved</li>
-<li>I restarted Tomcat and PostgreSQL and the issue was gone</li>
-</ul>
-</li>
-<li>Since I was restarting Tomcat anyways I decided to redeploy the latest changes from the <code>5_x-prod</code> branch and I added a note about COVID-19 items to the CGSpace frontpage at Peter’s request</li>
-</ul>
+ <h2 id="2020-07-01">2020-07-01</h2>
<ul>
<li>A few users noticed that CGSpace wasn’t loading items today, item pages seem blank
<ul>
<li>I looked at the PostgreSQL locks but they don’t seem unusual</li>
<li>I guess this is the same “blank item page” issue that we had a few times in 2019 that we never solved</li>
<li>I restarted Tomcat and PostgreSQL and the issue was gone</li>
</ul>
</li>
<li>Since I was restarting Tomcat anyways I decided to redeploy the latest changes from the <code>5_x-prod</code> branch and I added a note about COVID-19 items to the CGSpace frontpage at Peter’s request</li>
</ul>
June, 2020
https://alanorth.github.io/cgspace-notes/2020-06/
Mon, 01 Jun 2020 13:55:39 +0300
https://alanorth.github.io/cgspace-notes/2020-06/
- <h2 id="2020-06-01">2020-06-01</h2>
-<ul>
-<li>I tried to run the <code>AtomicStatisticsUpdateCLI</code> CUA migration script on DSpace Test (linode26) again and it is still going very slowly and has tons of errors like I noticed yesterday
-<ul>
-<li>I sent Atmire the dspace.log from today and told them to log into the server to debug the process</li>
-</ul>
-</li>
-<li>In other news, I checked the statistics API on DSpace 6 and it’s working</li>
-<li>I tried to build the OAI registry on the freshly migrated DSpace 6 on DSpace Test and I get an error:</li>
-</ul>
+ <h2 id="2020-06-01">2020-06-01</h2>
<ul>
<li>I tried to run the <code>AtomicStatisticsUpdateCLI</code> CUA migration script on DSpace Test (linode26) again and it is still going very slowly and has tons of errors like I noticed yesterday
<ul>
<li>I sent Atmire the dspace.log from today and told them to log into the server to debug the process</li>
</ul>
</li>
<li>In other news, I checked the statistics API on DSpace 6 and it’s working</li>
<li>I tried to build the OAI registry on the freshly migrated DSpace 6 on DSpace Test and I get an error:</li>
</ul>
May, 2020
https://alanorth.github.io/cgspace-notes/2020-05/
Sat, 02 May 2020 09:52:04 +0300
https://alanorth.github.io/cgspace-notes/2020-05/
<h2 id="2020-05-02">2020-05-02</h2>
<ul>
<li>Peter said that CTA is having problems submitting an item to CGSpace
<ul>
<li>Looking at the PostgreSQL stats it seems to be the same issue that Tezira was having last week, as I see the number of connections in ‘idle in transaction’ and ‘waiting for lock’ state are increasing again</li>
<li>I see that CGSpace (linode18) is still using PostgreSQL JDBC driver version 42.2.11, and there were some bugs related to transactions fixed in 42.2.12 (which I had updated in the Ansible playbooks, but not deployed yet)</li>
</ul>
</li>
</ul>
April, 2020
https://alanorth.github.io/cgspace-notes/2020-04/
Thu, 02 Apr 2020 10:53:24 +0300
<h2 id="2020-04-02">2020-04-02</h2>
<ul>
<li>Maria asked me to update Charles Staver’s ORCID iD in the submission template and on CGSpace, as his name was lower case before, and now he has corrected it
<ul>
<li>I updated the fifty-eight existing items on CGSpace</li>
</ul>
</li>
<li>Looking into the items Udana had asked about last week that were missing Altmetric donuts:
<ul>
<li><a href="https://hdl.handle.net/10568/103225">The first</a> is still missing its DOI, so I added it and <a href="https://twitter.com/mralanorth/status/1245632619661766657">tweeted its handle</a> (after a few hours there was a donut with score 222)</li>
<li><a href="https://hdl.handle.net/10568/106899">The second item</a> now has a donut with score 2 since I <a href="https://twitter.com/mralanorth/status/1243158045540134913">tweeted its handle</a> last week</li>
<li><a href="https://hdl.handle.net/10568/107258">The third item</a> now has a donut with score 1 since I <a href="https://twitter.com/mralanorth/status/1243158786392625153">tweeted it</a> last week</li>
</ul>
</li>
<li>On the same note, the <a href="https://hdl.handle.net/10568/106573">one item</a> Abenet pointed out last week now has a donut with score of 104 after I <a href="https://twitter.com/mralanorth/status/1243163710241345536">tweeted it</a> last week</li>
</ul>
March, 2020
https://alanorth.github.io/cgspace-notes/2020-03/
Mon, 02 Mar 2020 12:31:30 +0200
<h2 id="2020-03-02">2020-03-02</h2>
<ul>
<li>Update <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> for DSpace 6+ UUIDs
<ul>
<li>Tag version 1.2.0 on GitHub</li>
</ul>
</li>
<li>Test migrating legacy Solr statistics to UUIDs with the as-yet-unreleased <a href="https://github.com/DSpace/DSpace/commit/184f2b2153479045fba6239342c63e7f8564b8b6#diff-0350ce2e13b28d5d61252b7a8f50a059">SolrUpgradePre6xStatistics.java</a>
<ul>
<li>You need to download this into the DSpace 6.x source and compile it</li>
</ul>
</li>
</ul>
February, 2020
https://alanorth.github.io/cgspace-notes/2020-02/
Sun, 02 Feb 2020 11:56:30 +0200
<h2 id="2020-02-02">2020-02-02</h2>
<ul>
<li>Continue working on porting CGSpace’s DSpace 5 code to DSpace 6.3 that I started yesterday
<ul>
<li>Sign up for an account with MaxMind so I can get the GeoLite2-City.mmdb database</li>
<li>I still need to wire up the API credentials and cron job into the Ansible infrastructure playbooks</li>
<li>Fix some minor issues in the config and XMLUI themes, like removing Atmire stuff</li>
<li>The code finally builds and runs with a fresh install</li>
</ul>
</li>
</ul>
January, 2020
https://alanorth.github.io/cgspace-notes/2020-01/
Mon, 06 Jan 2020 10:48:30 +0200
<h2 id="2020-01-06">2020-01-06</h2>
<ul>
<li>Open <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=706">a ticket</a> with Atmire to request a quote for the upgrade to DSpace 6</li>
<li>Last week Altmetric responded about the <a href="https://hdl.handle.net/10568/97087">item</a> that had a lower score than its DOI
<ul>
<li>The score is now linked to the DOI</li>
<li>Another <a href="https://hdl.handle.net/10568/91278">item</a> that had the same problem in 2019 has now also linked to the score for its DOI</li>
<li>Another <a href="https://hdl.handle.net/10568/81236">item</a> that had the same problem in 2019 has also been fixed</li>
</ul>
</li>
</ul>
<h2 id="2020-01-07">2020-01-07</h2>
<ul>
<li>Peter Ballantyne highlighted one more WLE <a href="https://hdl.handle.net/10568/101286">item</a> that is missing the Altmetric score that its DOI has
<ul>
<li>The DOI has a score of 259, but the Handle has no score at all</li>
<li>I <a href="https://twitter.com/mralanorth/status/1214471427157626881">tweeted</a> the CGSpace repository link</li>
</ul>
</li>
</ul>
December, 2019
https://alanorth.github.io/cgspace-notes/2019-12/
Sun, 01 Dec 2019 11:22:30 +0200
<h2 id="2019-12-01">2019-12-01</h2>
<ul>
<li>Upgrade CGSpace (linode18) to Ubuntu 18.04:
<ul>
<li>Check any packages that have residual configs and purge them:</li>
<li><code># dpkg -l | grep -E '^rc' | awk '{print $2}' | xargs dpkg -P</code></li>
<li>Make sure all packages are up to date and the package manager is up to date, then reboot:</li>
</ul>
</li>
</ul>
<pre tabindex="0"><code># apt update && apt full-upgrade
# apt-get autoremove && apt-get autoclean
# dpkg -C
# reboot
</code></pre>
November, 2019
https://alanorth.github.io/cgspace-notes/2019-11/
Mon, 04 Nov 2019 12:20:30 +0200
<h2 id="2019-11-04">2019-11-04</h2>
<ul>
<li>Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics
<ul>
<li>I looked in the nginx logs and see 4.6 million in the access logs, and 1.2 million in the API logs:</li>
</ul>
</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*access.log.*.gz | grep -cE "[0-9]{1,2}/Oct/2019"
4671942
# zcat --force /var/log/nginx/{rest,oai,statistics}.log.*.gz | grep -cE "[0-9]{1,2}/Oct/2019"
1277694
</code></pre><ul>
<li>So 4.6 million from XMLUI and another 1.2 million from API requests</li>
<li>Let’s see how many of the REST API requests were for bitstreams (because they are counted in Solr stats):</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/rest.log.*.gz | grep -c -E "[0-9]{1,2}/Oct/2019"
1183456
# zcat --force /var/log/nginx/rest.log.*.gz | grep -E "[0-9]{1,2}/Oct/2019" | grep -c -E "/rest/bitstreams"
106781
</code></pre>
October, 2019
https://alanorth.github.io/cgspace-notes/2019-09/
Sun, 01 Sep 2019 10:17:51 +0300
<h2 id="2019-09-01">2019-09-01</h2>
<ul>
<li>Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning</li>
<li>Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
440 17.58.101.255
441 157.55.39.101
485 207.46.13.43
728 169.60.128.125
730 207.46.13.108
758 157.55.39.9
808 66.160.140.179
814 207.46.13.212
2472 163.172.71.23
6092 3.94.211.189
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
33 2a01:7e00::f03c:91ff:fe16:fcb
57 3.83.192.124
57 3.87.77.25
57 54.82.1.8
822 2a01:9cc0:47:1:1a:4:0:2
1223 45.5.184.72
1633 172.104.229.92
5112 205.186.128.185
7249 2a01:7e00::f03c:91ff:fe18:7396
9124 45.5.186.2
</code></pre>
August, 2019
https://alanorth.github.io/cgspace-notes/2019-08/
Sat, 03 Aug 2019 12:39:51 +0300
<h2 id="2019-08-03">2019-08-03</h2>
<ul>
<li>Look at Bioversity’s latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name…</li>
</ul>
<h2 id="2019-08-04">2019-08-04</h2>
<ul>
<li>Deploy ORCID identifier updates requested by Bioversity to CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it
<ul>
<li>Before updating it I checked Solr and verified that all statistics cores were loaded properly…</li>
<li>After rebooting, all statistics cores were loaded… wow, that’s lucky.</li>
</ul>
</li>
<li>Run system updates on DSpace Test (linode19) and reboot it</li>
</ul>
July, 2019
https://alanorth.github.io/cgspace-notes/2019-07/
Mon, 01 Jul 2019 12:13:51 +0300
<h2 id="2019-07-01">2019-07-01</h2>
<ul>
<li>Create an “AfricaRice books and book chapters” collection on CGSpace for AfricaRice</li>
<li>Last month Sisay asked why the following “most popular” statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace:
<ul>
<li><a href="https://dspacetest.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&time_filter_end_date=01%2F12%2F2018">DSpace Test</a></li>
<li><a href="https://cgspace.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&time_filter_end_date=01%2F12%2F2018">CGSpace</a></li>
</ul>
</li>
<li>Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community</li>
</ul>
June, 2019
https://alanorth.github.io/cgspace-notes/2019-06/
Sun, 02 Jun 2019 10:57:51 +0300
<h2 id="2019-06-02">2019-06-02</h2>
<ul>
<li>Merge the <a href="https://github.com/ilri/DSpace/pull/425">Solr filterCache</a> and <a href="https://github.com/ilri/DSpace/pull/426">XMLUI ISI journal</a> changes to the <code>5_x-prod</code> branch and deploy on CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it</li>
</ul>
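The merge-and-deploy step above lands reviewed branches on <code>5_x-prod</code> with explicit merge commits. A self-contained sketch using a throwaway repository; only the branch name <code>5_x-prod</code> comes from these notes, and the file and feature-branch names are invented for illustration:

```shell
# Synthetic demonstration: a reviewed change landing on the 5_x-prod
# production branch via a merge commit (all names except 5_x-prod are made up)
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "notes@example.com"
git config user.name "notes"
echo "base config" > solrconfig.xml
git add solrconfig.xml
git commit -qm "base"
git checkout -q -b 5_x-prod                 # production branch
git checkout -q -b solr-filtercache         # hypothetical PR branch
echo "filterCache size=512" >> solrconfig.xml
git commit -qam "Tune Solr filterCache"
git checkout -q 5_x-prod
git merge -q --no-ff solr-filtercache -m "Merge Solr filterCache changes"
git log --oneline --merges
```

Using <code>--no-ff</code> keeps a merge commit on the production branch even when a fast-forward would be possible, so each landed change stays visible in the history.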
<h2 id="2019-06-03">2019-06-03</h2>
<ul>
<li>Skype with Marie-Angélique and Abenet about <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2</a></li>
</ul>
May, 2019
https://alanorth.github.io/cgspace-notes/2019-05/
Wed, 01 May 2019 07:37:43 +0300
<h2 id="2019-05-01">2019-05-01</h2>
<ul>
<li>Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace</li>
<li>A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items
<ul>
<li>Apparently if the item is in the <code>workflowitem</code> table it is submitted to a workflow</li>
<li>And if it is in the <code>workspaceitem</code> table it is in the pre-submitted state</li>
</ul>
</li>
<li>The item seems to be in a pre-submitted state, so I tried to delete it from there:</li>
</ul>
<pre tabindex="0"><code>dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
DELETE 1
</code></pre><ul>
<li>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present…</li>
</ul>
April, 2019
https://alanorth.github.io/cgspace-notes/2019-04/
Mon, 01 Apr 2019 09:00:43 +0300
<h2 id="2019-04-01">2019-04-01</h2>
<ul>
<li>Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc
<ul>
<li>They asked if we had plans to enable RDF support in CGSpace</li>
</ul>
</li>
<li>There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today
<ul>
<li>I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!</li>
</ul>
</li>
</ul>
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18.196.196.108|18.195.78.144|18.195.218.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5
4432 200
</code></pre><ul>
<li>In the last two weeks there have been 47,000 downloads of this <em>same exact PDF</em> by these three IP addresses</li>
<li>Apply country and region corrections and deletions on DSpace Test and CGSpace:</li>
</ul>
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
</code></pre>
March, 2019
https://alanorth.github.io/cgspace-notes/2019-03/
Fri, 01 Mar 2019 12:16:30 +0100
<h2 id="2019-03-01">2019-03-01</h2>
<ul>
<li>I checked IITA’s 259 Feb 14 records from last month for duplicates using Atmire’s Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good</li>
<li>I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc…</li>
<li>Looking at the other half of Udana’s WLE records from 2018-11
<ul>
<li>I finished the ones for Restoring Degraded Landscapes (RDL), but these are for Variability, Risks and Competing Uses (VRC)</li>
<li>I did the usual cleanups for whitespace, added regions where they made sense for certain countries, cleaned up the DOI link formats, added rights information based on the publications page for a few items</li>
<li>Most worryingly, there are encoding errors in the abstracts for eleven items, for example:</li>
<li>68.15% � 9.45 instead of 68.15% ± 9.45</li>
<li>2003�2013 instead of 2003–2013</li>
</ul>
</li>
<li>I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs</li>
</ul>
February, 2019
https://alanorth.github.io/cgspace-notes/2019-02/
Fri, 01 Feb 2019 21:37:30 +0200
<h2 id="2019-02-01">2019-02-01</h2>
<ul>
<li>Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!</li>
<li>The top IPs before, during, and after this latest alert tonight were:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "01/Feb/2019:(17|18|19|20|21)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
245 207.46.13.5
332 54.70.40.11
385 5.143.231.38
405 207.46.13.173
405 207.46.13.75
1117 66.249.66.219
1121 35.237.175.180
1546 5.9.6.51
2474 45.5.186.2
5490 85.25.237.71
</code></pre><ul>
<li><code>85.25.237.71</code> is the “Linguee Bot” that I first saw last month</li>
<li>The Solr statistics for the past few months have been very high and I was wondering if the web server logs also showed an increase</li>
<li>There were just over 3 million accesses in the nginx logs last month:</li>
</ul>
<pre tabindex="0"><code># time zcat --force /var/log/nginx/* | grep -cE "[0-9]{1,2}/Jan/2019"
3018243
real 0m19.873s
user 0m22.203s
sys 0m1.979s
</code></pre>
January, 2019
https://alanorth.github.io/cgspace-notes/2019-01/
Wed, 02 Jan 2019 09:48:30 +0200
<h2 id="2019-01-02">2019-01-02</h2>
<ul>
<li>Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning</li>
<li>I don’t see anything interesting in the web server logs around that time though:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
92 40.77.167.4
99 210.7.29.100
120 38.126.157.45
177 35.237.175.180
177 40.77.167.32
216 66.249.75.219
225 18.203.76.93
261 46.101.86.248
357 207.46.13.1
903 54.70.40.11
</code></pre>
December, 2018
https://alanorth.github.io/cgspace-notes/2018-12/
Sun, 02 Dec 2018 02:09:30 +0200
<h2 id="2018-12-01">2018-12-01</h2>
<ul>
<li>Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK</li>
<li>I manually installed OpenJDK, then removed Oracle JDK, then re-ran the <a href="http://github.com/ilri/rmg-ansible-public">Ansible playbook</a> to update all configuration files, etc</li>
<li>Then I ran all system updates and restarted the server</li>
</ul>
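After a JDK swap like the one described above, it is worth confirming which JVM is actually active; a minimal check (illustrative, not recorded in the original notes):

```shell
# Print the active JVM version line, if any, so we can confirm that
# OpenJDK (rather than Oracle JDK) is now the java on PATH
if command -v java >/dev/null 2>&1; then
    java -version 2>&1 | head -n 1
else
    echo "java not found on PATH"
fi
```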
<h2 id="2018-12-02">2018-12-02</h2>
<ul>
<li>I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another <a href="https://usn.ubuntu.com/3831-1/">Ghostscript vulnerability last week</a></li>
</ul>
November, 2018
https://alanorth.github.io/cgspace-notes/2018-11/
Thu, 01 Nov 2018 16:41:30 +0200
<h2 id="2018-11-01">2018-11-01</h2>
<ul>
<li>Finalize AReS Phase I and Phase II ToRs</li>
<li>Send a note about my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to the dspace-tech mailing list</li>
</ul>
<h2 id="2018-11-03">2018-11-03</h2>
<ul>
<li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li>
<li>Today these are the top 10 IPs:</li>
</ul>
October, 2018
https://alanorth.github.io/cgspace-notes/2018-10/
Mon, 01 Oct 2018 22:31:54 +0300
https://alanorth.github.io/cgspace-notes/2018-10/
- <h2 id="2018-10-01">2018-10-01</h2>
-<ul>
-<li>Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items</li>
-<li>I created a GitHub issue to track this <a href="https://github.com/ilri/DSpace/issues/389">#389</a>, because I’m super busy in Nairobi right now</li>
-</ul>
+ <h2 id="2018-10-01">2018-10-01</h2>
<ul>
<li>Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items</li>
<li>I created a GitHub issue to track this <a href="https://github.com/ilri/DSpace/issues/389">#389</a>, because I’m super busy in Nairobi right now</li>
</ul>
September, 2018
https://alanorth.github.io/cgspace-notes/2018-09/
Sun, 02 Sep 2018 09:55:54 +0300
https://alanorth.github.io/cgspace-notes/2018-09/
- <h2 id="2018-09-02">2018-09-02</h2>
-<ul>
-<li>New <a href="https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.5">PostgreSQL JDBC driver version 42.2.5</a></li>
-<li>I’ll update the DSpace role in our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a> and run the updated playbooks on CGSpace and DSpace Test</li>
-<li>Also, I’ll re-run the <code>postgresql</code> tasks because the custom PostgreSQL variables are dynamic according to the system’s RAM, and we never re-ran them after migrating to larger Linodes last month</li>
-<li>I’m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I’m getting those autowire errors in Tomcat 8.5.30 again:</li>
-</ul>
+ <h2 id="2018-09-02">2018-09-02</h2>
<ul>
<li>New <a href="https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.5">PostgreSQL JDBC driver version 42.2.5</a></li>
<li>I’ll update the DSpace role in our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a> and run the updated playbooks on CGSpace and DSpace Test</li>
<li>Also, I’ll re-run the <code>postgresql</code> tasks because the custom PostgreSQL variables are dynamic according to the system’s RAM, and we never re-ran them after migrating to larger Linodes last month</li>
<li>I’m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I’m getting those autowire errors in Tomcat 8.5.30 again:</li>
</ul>
August, 2018
https://alanorth.github.io/cgspace-notes/2018-08/
Wed, 01 Aug 2018 11:52:54 +0300
https://alanorth.github.io/cgspace-notes/2018-08/
- <h2 id="2018-08-01">2018-08-01</h2>
-<ul>
-<li>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</li>
-</ul>
-<pre tabindex="0"><code>[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child
-[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB
-[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
-</code></pre><ul>
-<li>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</li>
-<li>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat’s</li>
-<li>I’m not sure why Tomcat didn’t crash with an OutOfMemoryError…</li>
-<li>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</li>
-<li>The server only has 8GB of RAM so we’ll eventually need to upgrade to a larger one because we’ll start starving the OS, PostgreSQL, and command line batch processes</li>
-<li>I ran all system updates on DSpace Test and rebooted it</li>
-</ul>
+ <h2 id="2018-08-01">2018-08-01</h2>
<ul>
<li>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</li>
</ul>
<pre tabindex="0"><code>[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child
[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB
[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
</code></pre><ul>
<li>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</li>
<li>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat’s</li>
<li>I’m not sure why Tomcat didn’t crash with an OutOfMemoryError…</li>
<li>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</li>
<li>The server only has 8GB of RAM so we’ll eventually need to upgrade to a larger one because we’ll start starving the OS, PostgreSQL, and command line batch processes</li>
<li>I ran all system updates on DSpace Test and rebooted it</li>
</ul>
July, 2018
https://alanorth.github.io/cgspace-notes/2018-07/
Sun, 01 Jul 2018 12:56:54 +0300
https://alanorth.github.io/cgspace-notes/2018-07/
- <h2 id="2018-07-01">2018-07-01</h2>
-<ul>
-<li>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</li>
-</ul>
-<pre tabindex="0"><code>$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
-</code></pre><ul>
-<li>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</li>
-</ul>
-<pre tabindex="0"><code>There is insufficient memory for the Java Runtime Environment to continue.
-</code></pre>
+ <h2 id="2018-07-01">2018-07-01</h2>
<ul>
<li>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</li>
</ul>
<pre tabindex="0"><code>$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
</code></pre><ul>
<li>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</li>
</ul>
<pre tabindex="0"><code>There is insufficient memory for the Java Runtime Environment to continue.
</code></pre>
June, 2018
https://alanorth.github.io/cgspace-notes/2018-06/
Mon, 04 Jun 2018 19:49:54 -0700
https://alanorth.github.io/cgspace-notes/2018-06/
- <h2 id="2018-06-04">2018-06-04</h2>
-<ul>
-<li>Test the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">DSpace 5.8 module upgrades from Atmire</a> (<a href="https://github.com/ilri/DSpace/pull/378">#378</a>)
-<ul>
-<li>There seems to be a problem with the CUA and L&R versions in <code>pom.xml</code> because they are using SNAPSHOT and it doesn’t build</li>
-</ul>
-</li>
-<li>I added the new CCAFS Phase II Project Tag <code>PII-FP1_PACCA2</code> and merged it into the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/379">#379</a>)</li>
-<li>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</li>
-</ul>
-<pre tabindex="0"><code>$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3 -n
-</code></pre><ul>
-<li>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="https://alanorth.github.io/cgspace-notes/cgspace-notes/2018-03/">March, 2018</a></li>
-<li>Time to index ~70,000 items on CGSpace:</li>
-</ul>
-<pre tabindex="0"><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
-
-real 74m42.646s
-user 8m5.056s
-sys 2m7.289s
-</code></pre>
+ <h2 id="2018-06-04">2018-06-04</h2>
<ul>
<li>Test the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">DSpace 5.8 module upgrades from Atmire</a> (<a href="https://github.com/ilri/DSpace/pull/378">#378</a>)
<ul>
<li>There seems to be a problem with the CUA and L&R versions in <code>pom.xml</code> because they are using SNAPSHOT and it doesn’t build</li>
</ul>
</li>
<li>I added the new CCAFS Phase II Project Tag <code>PII-FP1_PACCA2</code> and merged it into the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/379">#379</a>)</li>
<li>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</li>
</ul>
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3 -n
</code></pre><ul>
<li>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="https://alanorth.github.io/cgspace-notes/cgspace-notes/2018-03/">March, 2018</a></li>
<li>Time to index ~70,000 items on CGSpace:</li>
</ul>
<pre tabindex="0"><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
real 74m42.646s
user 8m5.056s
sys 2m7.289s
</code></pre>
May, 2018
https://alanorth.github.io/cgspace-notes/2018-05/
Tue, 01 May 2018 16:43:54 +0300
https://alanorth.github.io/cgspace-notes/2018-05/
- <h2 id="2018-05-01">2018-05-01</h2>
-<ul>
-<li>I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface:
-<ul>
-<li>http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E</li>
-<li>http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E</li>
-</ul>
-</li>
-<li>Then I reduced the JVM heap size from 6144 back to 5120m</li>
-<li>Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> to support hosts choosing which distribution they want to use</li>
-</ul>
+ <h2 id="2018-05-01">2018-05-01</h2>
<ul>
<li>I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface:
<ul>
<li>http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E</li>
<li>http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E</li>
</ul>
</li>
<li>Then I reduced the JVM heap size from 6144 back to 5120m</li>
<li>Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> to support hosts choosing which distribution they want to use</li>
</ul>
April, 2018
https://alanorth.github.io/cgspace-notes/2018-04/
Sun, 01 Apr 2018 16:13:54 +0200
https://alanorth.github.io/cgspace-notes/2018-04/
- <h2 id="2018-04-01">2018-04-01</h2>
-<ul>
-<li>I tried to test something on DSpace Test but noticed that it’s down since god knows when</li>
-<li>Catalina logs at least show some memory errors yesterday:</li>
-</ul>
+ <h2 id="2018-04-01">2018-04-01</h2>
<ul>
<li>I tried to test something on DSpace Test but noticed that it’s down since god knows when</li>
<li>Catalina logs at least show some memory errors yesterday:</li>
</ul>
March, 2018
https://alanorth.github.io/cgspace-notes/2018-03/
Fri, 02 Mar 2018 16:07:54 +0200
https://alanorth.github.io/cgspace-notes/2018-03/
- <h2 id="2018-03-02">2018-03-02</h2>
-<ul>
-<li>Export a CSV of the IITA community metadata for Martin Mueller</li>
-</ul>
+ <h2 id="2018-03-02">2018-03-02</h2>
<ul>
<li>Export a CSV of the IITA community metadata for Martin Mueller</li>
</ul>
February, 2018
https://alanorth.github.io/cgspace-notes/2018-02/
Thu, 01 Feb 2018 16:28:54 +0200
https://alanorth.github.io/cgspace-notes/2018-02/
- <h2 id="2018-02-01">2018-02-01</h2>
-<ul>
-<li>Peter gave feedback on the <code>dc.rights</code> proof of concept that I had sent him last week</li>
-<li>We don’t need to distinguish between internal and external works, so that makes it just a simple list</li>
-<li>Yesterday I figured out how to monitor DSpace sessions using JMX</li>
-<li>I copied the logic in the <code>jmx_tomcat_dbpools</code> provided by Ubuntu’s <code>munin-plugins-java</code> package and used the stuff I discovered about JMX <a href="https://alanorth.github.io/cgspace-notes/cgspace-notes/2018-01/">in 2018-01</a></li>
-</ul>
+ <h2 id="2018-02-01">2018-02-01</h2>
<ul>
<li>Peter gave feedback on the <code>dc.rights</code> proof of concept that I had sent him last week</li>
<li>We don’t need to distinguish between internal and external works, so that makes it just a simple list</li>
<li>Yesterday I figured out how to monitor DSpace sessions using JMX</li>
<li>I copied the logic in the <code>jmx_tomcat_dbpools</code> provided by Ubuntu’s <code>munin-plugins-java</code> package and used the stuff I discovered about JMX <a href="https://alanorth.github.io/cgspace-notes/cgspace-notes/2018-01/">in 2018-01</a></li>
</ul>
January, 2018
https://alanorth.github.io/cgspace-notes/2018-01/
Tue, 02 Jan 2018 08:35:54 -0800
https://alanorth.github.io/cgspace-notes/2018-01/
- <h2 id="2018-01-02">2018-01-02</h2>
-<ul>
-<li>Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time</li>
-<li>I didn’t get any load alerts from Linode and the REST and XMLUI logs don’t show anything out of the ordinary</li>
-<li>The nginx logs show HTTP 200s until <code>02/Jan/2018:11:27:17 +0000</code> when Uptime Robot got an HTTP 500</li>
-<li>In dspace.log around that time I see many errors like “Client closed the connection before file download was complete”</li>
-<li>And just before that I see this:</li>
-</ul>
-<pre tabindex="0"><code>Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000].
-</code></pre><ul>
-<li>Ah hah! So the pool was actually empty!</li>
-<li>I need to increase that, let’s try to bump it up from 50 to 75</li>
-<li>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don’t know what the hell Uptime Robot saw</li>
-<li>I notice this error quite a few times in dspace.log:</li>
-</ul>
-<pre tabindex="0"><code>2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
-org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered " "]" "] "" at line 1, column 32.
-</code></pre><ul>
-<li>And there are many of these errors every day for the past month:</li>
-</ul>
-<pre tabindex="0"><code>$ grep -c "Error while searching for sidebar facets" dspace.log.*
-dspace.log.2017-11-21:4
-dspace.log.2017-11-22:1
-dspace.log.2017-11-23:4
-dspace.log.2017-11-24:11
-dspace.log.2017-11-25:0
-dspace.log.2017-11-26:1
-dspace.log.2017-11-27:7
-dspace.log.2017-11-28:21
-dspace.log.2017-11-29:31
-dspace.log.2017-11-30:15
-dspace.log.2017-12-01:15
-dspace.log.2017-12-02:20
-dspace.log.2017-12-03:38
-dspace.log.2017-12-04:65
-dspace.log.2017-12-05:43
-dspace.log.2017-12-06:72
-dspace.log.2017-12-07:27
-dspace.log.2017-12-08:15
-dspace.log.2017-12-09:29
-dspace.log.2017-12-10:35
-dspace.log.2017-12-11:20
-dspace.log.2017-12-12:44
-dspace.log.2017-12-13:36
-dspace.log.2017-12-14:59
-dspace.log.2017-12-15:104
-dspace.log.2017-12-16:53
-dspace.log.2017-12-17:66
-dspace.log.2017-12-18:83
-dspace.log.2017-12-19:101
-dspace.log.2017-12-20:74
-dspace.log.2017-12-21:55
-dspace.log.2017-12-22:66
-dspace.log.2017-12-23:50
-dspace.log.2017-12-24:85
-dspace.log.2017-12-25:62
-dspace.log.2017-12-26:49
-dspace.log.2017-12-27:30
-dspace.log.2017-12-28:54
-dspace.log.2017-12-29:68
-dspace.log.2017-12-30:89
-dspace.log.2017-12-31:53
-dspace.log.2018-01-01:45
-dspace.log.2018-01-02:34
-</code></pre><ul>
-<li>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let’s Encrypt if it’s just a handful of domains</li>
-</ul>
+ <h2 id="2018-01-02">2018-01-02</h2>
<ul>
<li>Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time</li>
<li>I didn’t get any load alerts from Linode and the REST and XMLUI logs don’t show anything out of the ordinary</li>
<li>The nginx logs show HTTP 200s until <code>02/Jan/2018:11:27:17 +0000</code> when Uptime Robot got an HTTP 500</li>
<li>In dspace.log around that time I see many errors like “Client closed the connection before file download was complete”</li>
<li>And just before that I see this:</li>
</ul>
<pre tabindex="0"><code>Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000].
</code></pre><ul>
<li>Ah hah! So the pool was actually empty!</li>
<li>I need to increase that, let’s try to bump it up from 50 to 75</li>
<li>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don’t know what the hell Uptime Robot saw</li>
<li>I notice this error quite a few times in dspace.log:</li>
</ul>
<pre tabindex="0"><code>2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered " "]" "] "" at line 1, column 32.
</code></pre><ul>
<li>And there are many of these errors every day for the past month:</li>
</ul>
<pre tabindex="0"><code>$ grep -c "Error while searching for sidebar facets" dspace.log.*
dspace.log.2017-11-21:4
dspace.log.2017-11-22:1
dspace.log.2017-11-23:4
dspace.log.2017-11-24:11
dspace.log.2017-11-25:0
dspace.log.2017-11-26:1
dspace.log.2017-11-27:7
dspace.log.2017-11-28:21
dspace.log.2017-11-29:31
dspace.log.2017-11-30:15
dspace.log.2017-12-01:15
dspace.log.2017-12-02:20
dspace.log.2017-12-03:38
dspace.log.2017-12-04:65
dspace.log.2017-12-05:43
dspace.log.2017-12-06:72
dspace.log.2017-12-07:27
dspace.log.2017-12-08:15
dspace.log.2017-12-09:29
dspace.log.2017-12-10:35
dspace.log.2017-12-11:20
dspace.log.2017-12-12:44
dspace.log.2017-12-13:36
dspace.log.2017-12-14:59
dspace.log.2017-12-15:104
dspace.log.2017-12-16:53
dspace.log.2017-12-17:66
dspace.log.2017-12-18:83
dspace.log.2017-12-19:101
dspace.log.2017-12-20:74
dspace.log.2017-12-21:55
dspace.log.2017-12-22:66
dspace.log.2017-12-23:50
dspace.log.2017-12-24:85
dspace.log.2017-12-25:62
dspace.log.2017-12-26:49
dspace.log.2017-12-27:30
dspace.log.2017-12-28:54
dspace.log.2017-12-29:68
dspace.log.2017-12-30:89
dspace.log.2017-12-31:53
dspace.log.2018-01-01:45
dspace.log.2018-01-02:34
</code></pre><ul>
<li>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let’s Encrypt if it’s just a handful of domains</li>
</ul>
December, 2017
https://alanorth.github.io/cgspace-notes/2017-12/
Fri, 01 Dec 2017 13:53:54 +0300
https://alanorth.github.io/cgspace-notes/2017-12/
- <h2 id="2017-12-01">2017-12-01</h2>
-<ul>
-<li>Uptime Robot noticed that CGSpace went down</li>
-<li>The logs say “Timeout waiting for idle object”</li>
-<li>PostgreSQL activity says there are 115 connections currently</li>
-<li>The list of connections to XMLUI and REST API for today:</li>
-</ul>
+ <h2 id="2017-12-01">2017-12-01</h2>
<ul>
<li>Uptime Robot noticed that CGSpace went down</li>
<li>The logs say “Timeout waiting for idle object”</li>
<li>PostgreSQL activity says there are 115 connections currently</li>
<li>The list of connections to XMLUI and REST API for today:</li>
</ul>
November, 2017
https://alanorth.github.io/cgspace-notes/2017-11/
Thu, 02 Nov 2017 09:37:54 +0200
https://alanorth.github.io/cgspace-notes/2017-11/
- <h2 id="2017-11-01">2017-11-01</h2>
-<ul>
-<li>The CORE developers responded to say they are looking into their bot not respecting our robots.txt</li>
-</ul>
-<h2 id="2017-11-02">2017-11-02</h2>
-<ul>
-<li>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</li>
-</ul>
-<pre tabindex="0"><code># grep -c "CORE" /var/log/nginx/access.log
-0
-</code></pre><ul>
-<li>Generate list of authors on CGSpace for Peter to go through and correct:</li>
-</ul>
-<pre tabindex="0"><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv;
-COPY 54701
-</code></pre>
+ <h2 id="2017-11-01">2017-11-01</h2>
<ul>
<li>The CORE developers responded to say they are looking into their bot not respecting our robots.txt</li>
</ul>
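For context, a crawler like CORE is normally steered with a robots.txt stanza along these lines; the user-agent token and paths here are illustrative assumptions, not what CGSpace actually deployed:

```text
User-agent: CORE
Crawl-delay: 10
Disallow: /discover
```

`Crawl-delay` is a non-standard extension that not every crawler honors, which is why bots that ignore robots.txt tend to end up rate-limited in nginx instead.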
<h2 id="2017-11-02">2017-11-02</h2>
<ul>
<li>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</li>
</ul>
<pre tabindex="0"><code># grep -c "CORE" /var/log/nginx/access.log
0
</code></pre><ul>
<li>Generate list of authors on CGSpace for Peter to go through and correct:</li>
</ul>
<pre tabindex="0"><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv;
COPY 54701
</code></pre>
October, 2017
https://alanorth.github.io/cgspace-notes/2017-10/
Sun, 01 Oct 2017 08:07:54 +0300
https://alanorth.github.io/cgspace-notes/2017-10/
- <h2 id="2017-10-01">2017-10-01</h2>
-<ul>
-<li>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</li>
-</ul>
-<pre tabindex="0"><code>http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
-</code></pre><ul>
-<li>There appears to be a pattern but I’ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</li>
-<li>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</li>
-</ul>
+ <h2 id="2017-10-01">2017-10-01</h2>
<ul>
<li>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</li>
</ul>
<pre tabindex="0"><code>http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
</code></pre><ul>
<li>There appears to be a pattern but I’ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</li>
<li>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</li>
</ul>
CGIAR Library Migration
diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html
index e041c380a..8e66d3860 100644
--- a/docs/categories/notes/page/2/index.html
+++ b/docs/categories/notes/page/2/index.html
@@ -10,14 +10,14 @@
-
+
-
+
diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html
index dfb860af2..9fae91c3a 100644
--- a/docs/categories/notes/page/3/index.html
+++ b/docs/categories/notes/page/3/index.html
@@ -10,14 +10,14 @@
-
+
-
+
diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html
index b473aff92..357abb67e 100644
--- a/docs/categories/notes/page/4/index.html
+++ b/docs/categories/notes/page/4/index.html
@@ -10,14 +10,14 @@
-
+
-
+
diff --git a/docs/categories/notes/page/5/index.html b/docs/categories/notes/page/5/index.html
index 1be42d10b..eec5713f8 100644
--- a/docs/categories/notes/page/5/index.html
+++ b/docs/categories/notes/page/5/index.html
@@ -10,14 +10,14 @@
-
+
-
+
diff --git a/docs/categories/notes/page/6/index.html b/docs/categories/notes/page/6/index.html
index a7abbfdbd..6de2769e4 100644
--- a/docs/categories/notes/page/6/index.html
+++ b/docs/categories/notes/page/6/index.html
@@ -10,14 +10,14 @@
-
+
-
+
diff --git a/docs/categories/notes/page/7/index.html b/docs/categories/notes/page/7/index.html
index fd012dde5..e12053d7f 100644
--- a/docs/categories/notes/page/7/index.html
+++ b/docs/categories/notes/page/7/index.html
@@ -10,14 +10,14 @@
-
+
-
+
diff --git a/docs/categories/notes/page/8/index.html b/docs/categories/notes/page/8/index.html
index af69bdd89..412a70608 100644
--- a/docs/categories/notes/page/8/index.html
+++ b/docs/categories/notes/page/8/index.html
@@ -10,14 +10,14 @@
-
+
-
+
diff --git a/docs/cgiar-library-migration/index.html b/docs/cgiar-library-migration/index.html
index 7c83f0f96..bf3a53806 100644
--- a/docs/cgiar-library-migration/index.html
+++ b/docs/cgiar-library-migration/index.html
@@ -18,7 +18,7 @@
-
+
diff --git a/docs/cgspace-cgcorev2-migration/index.html b/docs/cgspace-cgcorev2-migration/index.html
index 6b984f21c..3c6263053 100644
--- a/docs/cgspace-cgcorev2-migration/index.html
+++ b/docs/cgspace-cgcorev2-migration/index.html
@@ -18,7 +18,7 @@
-
+
diff --git a/docs/cgspace-dspace6-upgrade/index.html b/docs/cgspace-dspace6-upgrade/index.html
index 74fce05f5..a1b5ee96f 100644
--- a/docs/cgspace-dspace6-upgrade/index.html
+++ b/docs/cgspace-dspace6-upgrade/index.html
@@ -18,7 +18,7 @@
-
+
diff --git a/docs/index.html b/docs/index.html
index 8f49ce385..17ccb2e80 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -10,14 +10,14 @@
-
+
-
+
diff --git a/docs/index.xml b/docs/index.xml
index 84e82a6f7..f6d9a0877 100644
--- a/docs/index.xml
+++ b/docs/index.xml
@@ -20,61 +20,28 @@
https://alanorth.github.io/cgspace-notes/2023-11/
Thu, 02 Nov 2023 12:59:36 +0300
https://alanorth.github.io/cgspace-notes/2023-11/
- <h2 id="2023-11-01">2023-11-01</h2>
-<ul>
-<li>Work a bit on the ETL pipeline for the CGIAR Climate Change Synthesis
-<ul>
-<li>I improved the filtering and wrote some Python using pandas to merge my sources more reliably</li>
-</ul>
-</li>
-</ul>
-<h2 id="2023-11-02">2023-11-02</h2>
-<ul>
-<li>Export CGSpace to check missing Initiative collection mappings</li>
-<li>Start a harvest on AReS</li>
-</ul>
+ <h2 id="2023-11-01">2023-11-01</h2>
<ul>
<li>Work a bit on the ETL pipeline for the CGIAR Climate Change Synthesis
<ul>
<li>I improved the filtering and wrote some Python using pandas to merge my sources more reliably</li>
</ul>
</li>
</ul>
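The merge above was done with pandas, but the underlying idea is a keyed left join; a minimal stdlib sketch of the same idea, with hypothetical field names, looks like this:

```python
def merge_sources(primary, secondary, key="doi"):
    """Left-join secondary records onto primary by a shared key.

    Values from the primary source win on conflict, similar to how
    you might resolve suffixed columns after a pandas merge.
    """
    # Index the secondary source by key, skipping rows without one
    index = {row[key]: row for row in secondary if row.get(key)}
    merged = []
    for row in primary:
        extra = index.get(row.get(key), {})
        merged.append({**extra, **row})  # primary overrides secondary
    return merged

a = [{"doi": "10.1/x", "title": "Example"}]
b = [{"doi": "10.1/x", "year": "2023"}]
print(merge_sources(a, b))
```

In practice `pandas.DataFrame.merge(..., how="left", on="doi")` does the same join with much better handling of duplicate and missing keys.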
<h2 id="2023-11-02">2023-11-02</h2>
<ul>
<li>Export CGSpace to check missing Initiative collection mappings</li>
<li>Start a harvest on AReS</li>
</ul>
October, 2023
https://alanorth.github.io/cgspace-notes/2023-10/
Mon, 02 Oct 2023 09:05:36 +0300
https://alanorth.github.io/cgspace-notes/2023-10/
- <h2 id="2023-10-02">2023-10-02</h2>
-<ul>
-<li>Export CGSpace to check DOIs against Crossref
-<ul>
-<li>I found that <a href="https://www.crossref.org/documentation/retrieve-metadata/rest-api/rest-api-metadata-license-information/">Crossref’s metadata is in the public domain under the CC0 license</a></li>
-<li>One interesting thing is the abstracts, which are copyrighted by the copyright owner, meaning Crossref cannot waive the copyright under the terms of the CC0 license, because it is not theirs to waive</li>
-<li>We can be on the safe side by using only abstracts for items that are licensed under Creative Commons</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2023-10-02">2023-10-02</h2>
<ul>
<li>Export CGSpace to check DOIs against Crossref
<ul>
<li>I found that <a href="https://www.crossref.org/documentation/retrieve-metadata/rest-api/rest-api-metadata-license-information/">Crossref’s metadata is in the public domain under the CC0 license</a></li>
<li>One interesting thing is the abstracts, which are copyrighted by the copyright owner, meaning Crossref cannot waive the copyright under the terms of the CC0 license, because it is not theirs to waive</li>
<li>We can be on the safe side by using only abstracts for items that are licensed under Creative Commons</li>
</ul>
</li>
</ul>
September, 2023
https://alanorth.github.io/cgspace-notes/2023-09/
Sat, 02 Sep 2023 17:29:36 +0300
https://alanorth.github.io/cgspace-notes/2023-09/
- <h2 id="2023-09-02">2023-09-02</h2>
-<ul>
-<li>Export CGSpace to check for missing Initiative collection mappings</li>
-<li>Start a harvest on AReS</li>
-</ul>
+ <h2 id="2023-09-02">2023-09-02</h2>
<ul>
<li>Export CGSpace to check for missing Initiative collection mappings</li>
<li>Start a harvest on AReS</li>
</ul>
August, 2023
https://alanorth.github.io/cgspace-notes/2023-08/
Thu, 03 Aug 2023 11:18:36 +0300
https://alanorth.github.io/cgspace-notes/2023-08/
- <h2 id="2023-08-03">2023-08-03</h2>
-<ul>
-<li>I finally got around to working on Peter’s cleanups for affiliations, authors, and donors from last week
-<ul>
-<li>I did some minor cleanups myself and applied them to CGSpace</li>
-</ul>
-</li>
-<li>Start working on some batch uploads for IFPRI</li>
-</ul>
+ <h2 id="2023-08-03">2023-08-03</h2>
<ul>
<li>I finally got around to working on Peter’s cleanups for affiliations, authors, and donors from last week
<ul>
<li>I did some minor cleanups myself and applied them to CGSpace</li>
</ul>
</li>
<li>Start working on some batch uploads for IFPRI</li>
</ul>
July, 2023
@@ -88,249 +55,98 @@
https://alanorth.github.io/cgspace-notes/2023-06/
Fri, 02 Jun 2023 10:29:36 +0300
https://alanorth.github.io/cgspace-notes/2023-06/
- <h2 id="2023-06-02">2023-06-02</h2>
-<ul>
-<li>Spend some time testing my <code>post_bitstreams.py</code> script to update thumbnails for items on CGSpace
-<ul>
-<li>Interestingly I found an item with a JFIF thumbnail and another with a WebP thumbnail…</li>
-</ul>
-</li>
-<li>Meeting with Valentina, Stefano, and Sara about MODS metadata in CGSpace
-<ul>
-<li>They have experience with improving the MODS interface in MELSpace’s OAI-PMH for use with AGRIS and were curious if we could do the same in CGSpace</li>
-<li>From what I can see we need to upgrade the MODS schema from 3.1 to 3.7 and then just add a bunch of our fields to the crosswalk</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2023-06-02">2023-06-02</h2>
<ul>
<li>Spend some time testing my <code>post_bitstreams.py</code> script to update thumbnails for items on CGSpace
<ul>
<li>Interestingly I found an item with a JFIF thumbnail and another with a WebP thumbnail…</li>
</ul>
</li>
<li>Meeting with Valentina, Stefano, and Sara about MODS metadata in CGSpace
<ul>
<li>They have experience with improving the MODS interface in MELSpace’s OAI-PMH for use with AGRIS and were curious if we could do the same in CGSpace</li>
<li>From what I can see we need to upgrade the MODS schema from 3.1 to 3.7 and then just add a bunch of our fields to the crosswalk</li>
</ul>
</li>
</ul>
May, 2023
https://alanorth.github.io/cgspace-notes/2023-05/
Wed, 03 May 2023 08:53:36 +0300
https://alanorth.github.io/cgspace-notes/2023-05/
<h2 id="2023-05-03">2023-05-03</h2>
<ul>
<li>Alliance’s TIP team emailed me to ask about issues authenticating on CGSpace
<ul>
<li>It seems their password expired, which is annoying</li>
</ul>
</li>
<li>I continued looking at the CGSpace subjects for the FAO / AGROVOC exercise that I started last week
<ul>
<li>There are many of our subjects that would match if they added a “-” like “high yielding varieties” or used singular…</li>
<li>Also I found at least two spelling mistakes, for example “decison support systems”, which would match if it was spelled correctly</li>
</ul>
</li>
<li>Work on cleaning, proofing, and uploading twenty-seven records for IFPRI to CGSpace</li>
</ul>April, 2023
https://alanorth.github.io/cgspace-notes/2023-04/
Sun, 02 Apr 2023 08:19:36 +0300https://alanorth.github.io/cgspace-notes/2023-04/
<h2 id="2023-04-02">2023-04-02</h2>
<ul>
<li>Run all system updates on CGSpace and reboot it</li>
<li>I exported CGSpace to CSV to check for any missing Initiative collection mappings
<ul>
<li>I also did a check for missing country/region mappings with csv-metadata-quality</li>
</ul>
</li>
<li>Start a harvest on AReS</li>
</ul>March, 2023
https://alanorth.github.io/cgspace-notes/2023-03/
Wed, 01 Mar 2023 07:58:36 +0300https://alanorth.github.io/cgspace-notes/2023-03/
<h2 id="2023-03-01">2023-03-01</h2>
<ul>
<li>Remove <code>cg.subject.wle</code> and <code>cg.identifier.wletheme</code> from CGSpace input form after confirming with IWMI colleagues that they no longer need them (WLE closed in 2021)</li>
<li><a href="https://salsa.debian.org/iso-codes-team/iso-codes/-/blob/main/CHANGELOG.md#4130-2023-02-28">iso-codes 4.13.0 was released</a>, which incorporates my changes to the common names for Iran, Laos, and Syria</li>
<li>I finally got through with porting the input form from DSpace 6 to DSpace 7</li>
</ul>February, 2023
https://alanorth.github.io/cgspace-notes/2023-02/
Wed, 01 Feb 2023 10:57:36 +0300https://alanorth.github.io/cgspace-notes/2023-02/
<h2 id="2023-02-01">2023-02-01</h2>
<ul>
<li>Export CGSpace to cross check the DOI metadata with Crossref
<ul>
<li>I want to try to expand my use of their data to journals, publishers, volumes, issues, etc…</li>
</ul>
</li>
</ul>January, 2023
https://alanorth.github.io/cgspace-notes/2023-01/
Sun, 01 Jan 2023 08:44:36 +0300https://alanorth.github.io/cgspace-notes/2023-01/
<h2 id="2023-01-01">2023-01-01</h2>
<ul>
<li>Apply some more ORCID identifiers to items on CGSpace using my <code>2022-09-22-add-orcids.csv</code> file
<ul>
<li>I want to update all ORCID names and refresh them in the database</li>
<li>I see we have some new ones that aren’t in our list if I combine with this file:</li>
</ul>
</li>
</ul>December, 2022
https://alanorth.github.io/cgspace-notes/2022-12/
Thu, 01 Dec 2022 08:52:36 +0300https://alanorth.github.io/cgspace-notes/2022-12/
<h2 id="2022-12-01">2022-12-01</h2>
<ul>
<li>Fix some incorrect regions on CGSpace
<ul>
<li>I exported the CCAFS and IITA communities, extracted just the country and region columns, then ran them through csv-metadata-quality to fix the regions</li>
</ul>
</li>
<li>Add a few more authors to my CSV with author names and ORCID identifiers and tag 283 items!</li>
<li>Replace “East Asia” with “Eastern Asia” region on CGSpace (UN M.49 region)</li>
</ul>November, 2022
https://alanorth.github.io/cgspace-notes/2022-11/
Tue, 01 Nov 2022 09:11:36 +0300https://alanorth.github.io/cgspace-notes/2022-11/
<h2 id="2022-11-01">2022-11-01</h2>
<ul>
<li>Last night I re-synced DSpace 7 Test from CGSpace
<ul>
<li>I also updated all my local <code>7_x-dev</code> branches on the latest upstreams</li>
</ul>
</li>
<li>I spent some time updating the authorizations in Alliance collections
<ul>
<li>I want to make sure they use groups instead of individuals where possible!</li>
</ul>
</li>
<li>I reverted the Cocoon autosave change because it was more of a nuisance, since Peter can&rsquo;t upload CSVs from the web interface, and it&rsquo;s a very low severity security issue</li>
</ul>October, 2022
https://alanorth.github.io/cgspace-notes/2022-10/
Sat, 01 Oct 2022 19:45:36 +0300https://alanorth.github.io/cgspace-notes/2022-10/
<h2 id="2022-10-01">2022-10-01</h2>
<ul>
<li>Start a harvest on AReS last night</li>
<li>Yesterday I realized how to use <a href="https://im4java.sourceforge.net/docs/dev-guide.html">GraphicsMagick with im4java</a> and I want to re-visit some of my thumbnail tests
<ul>
<li>I&rsquo;m also interested in libvips support via JVips, though last time I checked it was only for Java 8</li>
<li>I filed <a href="https://github.com/criteo/JVips/issues/141">an issue to ask about Java 11+ support</a></li>
</ul>
</li>
</ul>September, 2022
https://alanorth.github.io/cgspace-notes/2022-09/
Thu, 01 Sep 2022 09:41:36 +0300https://alanorth.github.io/cgspace-notes/2022-09/
<h2 id="2022-09-01">2022-09-01</h2>
<ul>
<li>A bit of work on the “Mapping CG Core–CGSpace–MEL–MARLO Types” spreadsheet</li>
<li>I tested an item submission on DSpace Test with the Cocoon <code>org.apache.cocoon.uploads.autosave=false</code> change
<ul>
<li>The submission works as expected</li>
</ul>
</li>
<li>Start debugging some region-related issues with csv-metadata-quality
<ul>
<li>I created a new test file <code>test-geography.csv</code> with some different scenarios</li>
<li>I also fixed a few bugs and improved the region-matching logic</li>
</ul>
</li>
</ul>August, 2022
https://alanorth.github.io/cgspace-notes/2022-08/
Mon, 01 Aug 2022 10:22:36 +0300https://alanorth.github.io/cgspace-notes/2022-08/
<h2 id="2022-08-01">2022-08-01</h2>
<ul>
<li>Our request to add <a href="https://github.com/spdx/license-list-XML/issues/1525">CC-BY-3.0-IGO to SPDX</a> was approved a few weeks ago</li>
</ul>July, 2022
https://alanorth.github.io/cgspace-notes/2022-07/
Sat, 02 Jul 2022 14:07:36 +0300https://alanorth.github.io/cgspace-notes/2022-07/
<h2 id="2022-07-02">2022-07-02</h2>
<ul>
<li>I learned how to use the Levenshtein functions in PostgreSQL
<ul>
<li>The thing is that there is a limit of 255 characters for these functions in PostgreSQL so you need to truncate the strings before comparing</li>
<li>Also, the trgm functions I’ve used before are case insensitive, but Levenshtein is not, so you need to make sure to lower case both strings first</li>
</ul>
</li>
</ul>June, 2022
https://alanorth.github.io/cgspace-notes/2022-06/
Mon, 06 Jun 2022 09:01:36 +0300https://alanorth.github.io/cgspace-notes/2022-06/
<h2 id="2022-06-06">2022-06-06</h2>
<ul>
<li>Look at the Solr statistics on CGSpace
<ul>
<li>I see 167,000 hits from a bunch of Microsoft IPs with reverse DNS “msnbot-” using the Solr query <code>dns:*msnbot* AND dns:*.msn.com</code></li>
<li>I purged these first so I could see the other “real” IPs in the Solr facets</li>
</ul>
</li>
<li>I see 47,500 hits from 80.248.237.167 on a data center ISP in Sweden, using a normal user agent</li>
<li>I see 13,000 hits from 163.237.216.11 on a data center ISP in Australia, using a normal user agent</li>
<li>I see 7,300 hits from 208.185.238.57 from Britannica, using a normal user agent
<ul>
<li>There seem to be many more of these:</li>
</ul>
</li>
</ul>May, 2022
https://alanorth.github.io/cgspace-notes/2022-05/
Wed, 04 May 2022 09:13:39 +0300https://alanorth.github.io/cgspace-notes/2022-05/
<h2 id="2022-05-04">2022-05-04</h2>
<ul>
<li>I found a few more IPs making requests using the shady Chrome 44 user agent in the last few days so I will add them to the block list too:
<ul>
<li>18.207.136.176</li>
<li>185.189.36.248</li>
<li>50.118.223.78</li>
<li>52.70.76.123</li>
<li>3.236.10.11</li>
</ul>
</li>
<li>Looking at the Solr statistics for 2022-04
<ul>
<li>52.191.137.59 is Microsoft, but they are using a normal user agent and making tens of thousands of requests</li>
<li>64.39.98.62 is owned by Qualys, and all their requests are probing for /etc/passwd etc</li>
<li>185.192.69.15 is in the Netherlands and is using a normal user agent, but making excessive automated HTTP requests to paths forbidden in robots.txt</li>
<li>157.55.39.159 is owned by Microsoft and identifies as bingbot so I don’t know why its requests were logged in Solr</li>
<li>52.233.67.176 is owned by Microsoft and uses a normal user agent, but making excessive automated HTTP requests</li>
<li>157.55.39.144 is owned by Microsoft and uses a normal user agent, but making excessive automated HTTP requests</li>
<li>207.46.13.177 is owned by Microsoft and identifies as bingbot so I don’t know why its requests were logged in Solr</li>
<li>If I query Solr for <code>time:2022-04* AND dns:*msnbot* AND dns:*.msn.com.</code> I see a handful of IPs that made 41,000 requests</li>
</ul>
</li>
<li>I purged 93,974 hits from these IPs using my <code>check-spider-ip-hits.sh</code> script</li>
</ul>April, 2022
https://alanorth.github.io/cgspace-notes/2022-03/
Tue, 01 Mar 2022 16:46:54 +0300https://alanorth.github.io/cgspace-notes/2022-03/
<h2 id="2022-03-01">2022-03-01</h2>
<ul>
<li>Send Gaia the last batch of potential duplicates for items 701 to 980:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ csvcut -c id,dc.title,dcterms.issued,dcterms.type ~/Downloads/2022-03-01-CGSpace-TAC-ICW-batch4-701-980.csv > /tmp/tac4.csv
</span></span><span style="display:flex;"><span>$ ./ilri/check-duplicates.py -i /tmp/tac4.csv -db dspace -u dspace -p <span style="color:#e6db74">'fuuu'</span> -o /tmp/2022-03-01-tac-batch4-701-980.csv
</span></span><span style="display:flex;"><span>$ csvcut -c id,filename ~/Downloads/2022-03-01-CGSpace-TAC-ICW-batch4-701-980.csv > /tmp/tac4-filenames.csv
</span></span><span style="display:flex;"><span>$ csvjoin -c id /tmp/2022-03-01-tac-batch4-701-980.csv /tmp/tac4-filenames.csv > /tmp/2022-03-01-tac-batch4-701-980-filenames.csv
</span></span></code></pre></div>February, 2022
https://alanorth.github.io/cgspace-notes/2022-02/
Tue, 01 Feb 2022 14:06:54 +0200https://alanorth.github.io/cgspace-notes/2022-02/
<h2 id="2022-02-01">2022-02-01</h2>
<ul>
<li>Meeting with Peter and Abenet about CGSpace in the One CGIAR
<ul>
<li>We agreed to buy $5,000 worth of credits from Atmire for future upgrades</li>
<li>We agreed to move CRPs and non-CGIAR communities off the home page, as well as some other things for the CGIAR System Organization</li>
<li>We agreed to make a Discovery facet for CGIAR Action Areas above the existing CGIAR Impact Areas one</li>
<li>We agreed to try to do more alignment of affiliations/funders with ROR</li>
</ul>
</li>
</ul>January, 2022
https://alanorth.github.io/cgspace-notes/2022-01/
Sat, 01 Jan 2022 15:20:54 +0200https://alanorth.github.io/cgspace-notes/2022-01/
<h2 id="2022-01-01">2022-01-01</h2>
<ul>
<li>Start a full harvest on AReS</li>
</ul>December, 2021
https://alanorth.github.io/cgspace-notes/2021-12/
Wed, 01 Dec 2021 16:07:07 +0200https://alanorth.github.io/cgspace-notes/2021-12/
<h2 id="2021-12-01">2021-12-01</h2>
<ul>
<li>Atmire merged some changes I had submitted to the COUNTER-Robots project</li>
<li>I updated our local spider user agents and then re-ran the list with my <code>check-spider-hits.sh</code> script on CGSpace:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ ./ilri/check-spider-hits.sh -f /tmp/agents -p
</span></span><span style="display:flex;"><span>Purging 1989 hits from The Knowledge AI in statistics
</span></span><span style="display:flex;"><span>Purging 1235 hits from MaCoCu in statistics
</span></span><span style="display:flex;"><span>Purging 455 hits from WhatsApp in statistics
</span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>Total number of bot hits purged: 3679
</span></span></code></pre></div>November, 2021
https://alanorth.github.io/cgspace-notes/2021-11/
Tue, 02 Nov 2021 22:27:07 +0200https://alanorth.github.io/cgspace-notes/2021-11/
<h2 id="2021-11-02">2021-11-02</h2>
<ul>
<li>I experimented with manually sharding the Solr statistics on DSpace Test</li>
<li>First I exported all the 2019 stats from CGSpace:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ ./run.sh -s http://localhost:8081/solr/statistics -f <span style="color:#e6db74">'time:2019-*'</span> -a export -o statistics-2019.json -k uid
</span></span><span style="display:flex;"><span>$ zstd statistics-2019.json
</span></span></code></pre></div>October, 2021
https://alanorth.github.io/cgspace-notes/2021-10/
Fri, 01 Oct 2021 11:14:07 +0300https://alanorth.github.io/cgspace-notes/2021-10/
<h2 id="2021-10-01">2021-10-01</h2>
<ul>
<li>Export all affiliations on CGSpace and run them against the latest RoR data dump:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>localhost/dspace63= > \COPY (SELECT DISTINCT text_value as "cg.contributor.affiliation", count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 211 GROUP BY text_value ORDER BY count DESC) to /tmp/2021-10-01-affiliations.csv WITH CSV HEADER;
</span></span><span style="display:flex;"><span>$ csvcut -c <span style="color:#ae81ff">1</span> /tmp/2021-10-01-affiliations.csv | sed 1d > /tmp/2021-10-01-affiliations.txt
</span></span><span style="display:flex;"><span>$ ./ilri/ror-lookup.py -i /tmp/2021-10-01-affiliations.txt -r 2021-09-23-ror-data.json -o /tmp/2021-10-01-affiliations-matching.csv
</span></span><span style="display:flex;"><span>$ csvgrep -c matched -m true /tmp/2021-10-01-affiliations-matching.csv | sed 1d | wc -l
</span></span><span style="display:flex;"><span>1879
</span></span><span style="display:flex;"><span>$ wc -l /tmp/2021-10-01-affiliations.txt
</span></span><span style="display:flex;"><span>7100 /tmp/2021-10-01-affiliations.txt
</span></span></code></pre></div><ul>
<li>So we have 1879/7100 (26.46%) matching already</li>
</ul>September, 2021
https://alanorth.github.io/cgspace-notes/2021-09/
Wed, 01 Sep 2021 09:14:07 +0300https://alanorth.github.io/cgspace-notes/2021-09/
<h2 id="2021-09-02">2021-09-02</h2>
<ul>
<li>Troubleshooting the missing Altmetric scores on AReS
<ul>
<li>Turns out that I didn’t actually fix them last month because the check for <code>content.altmetric</code> still exists, and I can’t access the DOIs using <code>_h.source.DOI</code> for some reason</li>
<li>I can access all other kinds of item metadata using the Elasticsearch label, but not DOI!!!</li>
<li>I will change <code>DOI</code> to <code>tomato</code> in the repository setup and start a re-harvest… I need to see if this is some kind of reserved word or something…</li>
<li>Even as <code>tomato</code> I can’t access that field as <code>_h.source.tomato</code> in Angular, but it does work as a filter source… sigh</li>
</ul>
</li>
<li>I’m having problems using the OpenRXV API
<ul>
<li>The syntax Moayad showed me last month doesn’t seem to honor the search query properly…</li>
</ul>
</li>
</ul>August, 2021
https://alanorth.github.io/cgspace-notes/2021-08/
Sun, 01 Aug 2021 09:01:07 +0300https://alanorth.github.io/cgspace-notes/2021-08/
<h2 id="2021-08-01">2021-08-01</h2>
<ul>
<li>Update Docker images on AReS server (linode20) and reboot the server:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># docker images | grep -v ^REPO | sed <span style="color:#e6db74">'s/ \+/:/g'</span> | cut -d: -f1,2 | grep -v none | xargs -L1 docker pull
</span></span></code></pre></div><ul>
<li>I decided to upgrade linode20 from Ubuntu 18.04 to 20.04</li>
</ul>July, 2021
https://alanorth.github.io/cgspace-notes/2021-07/
Thu, 01 Jul 2021 08:53:07 +0300https://alanorth.github.io/cgspace-notes/2021-07/
<h2 id="2021-07-01">2021-07-01</h2>
<ul>
<li>Export another list of ALL subjects on CGSpace, including AGROVOC and non-AGROVOC for Enrico:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>localhost/dspace63= > \COPY (SELECT DISTINCT LOWER(text_value) AS subject, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (119, 120, 127, 122, 128, 125, 135, 203, 208, 210, 215, 123, 236, 242, 187) GROUP BY subject ORDER BY count DESC) to /tmp/2021-07-01-all-subjects.csv WITH CSV HEADER;
</span></span><span style="display:flex;"><span>COPY 20994
</span></span></code></pre></div>June, 2021
https://alanorth.github.io/cgspace-notes/2021-06/
Tue, 01 Jun 2021 10:51:07 +0300https://alanorth.github.io/cgspace-notes/2021-06/
<h2 id="2021-06-01">2021-06-01</h2>
<ul>
<li>IWMI notified me that AReS was down with an HTTP 502 error
<ul>
<li>Looking at UptimeRobot I see it has been down for 33 hours, but I never got a notification</li>
<li>I don’t see anything in the Elasticsearch container logs, or the systemd journal on the host, but I notice that the <code>angular_nginx</code> container isn’t running</li>
<li>I simply started it and AReS was running again:</li>
</ul>
</li>
</ul>May, 2021
https://alanorth.github.io/cgspace-notes/2021-05/
Sun, 02 May 2021 09:50:54 +0300https://alanorth.github.io/cgspace-notes/2021-05/
- <h2 id="2021-05-01">2021-05-01</h2>
-<ul>
-<li>I looked at the top user agents and IPs in the Solr statistics for last month and I see these user agents:
-<ul>
-<li>“RI/1.0”, 1337</li>
-<li>“Microsoft Office Word 2014”, 941</li>
-</ul>
-</li>
-<li>I will add the RI/1.0 pattern to our DSpace agents overload and purge them from Solr (we had previously seen this agent with 9,000 hits or so in 2020-09), but I think I will leave the Microsoft Word one… as that’s an actual user…</li>
-</ul>
+ <h2 id="2021-05-01">2021-05-01</h2>
<ul>
<li>I looked at the top user agents and IPs in the Solr statistics for last month and I see these user agents:
<ul>
<li>“RI/1.0”, 1337</li>
<li>“Microsoft Office Word 2014”, 941</li>
</ul>
</li>
<li>I will add the RI/1.0 pattern to our DSpace agents overload and purge them from Solr (we had previously seen this agent with 9,000 hits or so in 2020-09), but I think I will leave the Microsoft Word one… as that’s an actual user…</li>
</ul>April, 2021
https://alanorth.github.io/cgspace-notes/2021-04/
Thu, 01 Apr 2021 09:50:54 +0300https://alanorth.github.io/cgspace-notes/2021-04/
- <h2 id="2021-04-01">2021-04-01</h2>
-<ul>
-<li>I wrote a script to query Sherpa’s API for our ISSNs: <code>sherpa-issn-lookup.py</code>
-<ul>
-<li>I’m curious to see how the results compare with the results from Crossref yesterday</li>
-</ul>
-</li>
-<li>AReS Explorer was down since this morning, I didn’t see anything in the systemd journal
-<ul>
-<li>I simply took everything down with docker-compose and then back up, and then it was OK</li>
-<li>Perhaps one of the containers crashed, I should have looked closer but I was in a hurry</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2021-04-01">2021-04-01</h2>
<ul>
<li>I wrote a script to query Sherpa’s API for our ISSNs: <code>sherpa-issn-lookup.py</code>
<ul>
<li>I’m curious to see how the results compare with those from Crossref yesterday</li>
</ul>
</li>
<li>AReS Explorer has been down since this morning; I didn’t see anything in the systemd journal
<ul>
<li>I simply took everything down with docker-compose and then back up, and then it was OK</li>
<li>Perhaps one of the containers crashed, I should have looked closer but I was in a hurry</li>
</ul>
</li>
</ul>March, 2021
https://alanorth.github.io/cgspace-notes/2021-03/
Mon, 01 Mar 2021 10:13:54 +0200https://alanorth.github.io/cgspace-notes/2021-03/
- <h2 id="2021-03-01">2021-03-01</h2>
-<ul>
-<li>Discuss some OpenRXV issues with Abdullah from CodeObia
-<ul>
-<li>He’s trying to work on the DSpace 6+ metadata schema autoimport using the DSpace 6+ REST API</li>
-<li>Also, we found some issues building and running OpenRXV currently due to ecosystem shift in the Node.js dependencies</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2021-03-01">2021-03-01</h2>
<ul>
<li>Discuss some OpenRXV issues with Abdullah from CodeObia
<ul>
<li>He’s trying to work on the DSpace 6+ metadata schema autoimport using the DSpace 6+ REST API</li>
<li>Also, we found some issues building and running OpenRXV currently due to ecosystem shift in the Node.js dependencies</li>
</ul>
</li>
</ul>CGSpace CG Core v2 Migration
https://alanorth.github.io/cgspace-notes/cgspace-cgcorev2-migration/
Sun, 21 Feb 2021 13:27:35 +0200https://alanorth.github.io/cgspace-notes/cgspace-cgcorev2-migration/
- <p>Changes to CGSpace metadata fields to align more with DC, QDC, and DCTERMS as well as CG Core v2. Implemented on 2021-02-21.</p>
-<p>With reference to <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2 draft standard</a> by Marie-Angélique as well as <a href="http://www.dublincore.org/specifications/dublin-core/dcmi-terms/">DCMI DCTERMS</a>.</p>
+ <p>Changes to CGSpace metadata fields to align more with DC, QDC, and DCTERMS as well as CG Core v2. Implemented on 2021-02-21.</p>
<p>With reference to <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2 draft standard</a> by Marie-Angélique as well as <a href="http://www.dublincore.org/specifications/dublin-core/dcmi-terms/">DCMI DCTERMS</a>.</p>February, 2021
https://alanorth.github.io/cgspace-notes/2021-02/
Mon, 01 Feb 2021 10:13:54 +0200https://alanorth.github.io/cgspace-notes/2021-02/
- <h2 id="2021-02-01">2021-02-01</h2>
-<ul>
-<li>Abenet said that CIP found more duplicate records in their export from AReS
-<ul>
-<li>I re-opened <a href="https://github.com/ilri/OpenRXV/issues/67">the issue</a> on OpenRXV where we had previously noticed this</li>
-<li>The shared link where the duplicates are is here: <a href="https://cgspace.cgiar.org/explorer/shared/heEOz3YBnXdK69bR2ra6">https://cgspace.cgiar.org/explorer/shared/heEOz3YBnXdK69bR2ra6</a></li>
-</ul>
-</li>
-<li>I had a call with CodeObia to discuss the work on OpenRXV</li>
-<li>Check the results of the AReS harvesting from last night:</li>
-</ul>
-<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -s <span style="color:#e6db74">'http://localhost:9200/openrxv-items-temp/_count?q=*&pretty'</span>
-</span></span><span style="display:flex;"><span>{
-</span></span><span style="display:flex;"><span> "count" : 100875,
-</span></span><span style="display:flex;"><span> "_shards" : {
-</span></span><span style="display:flex;"><span> "total" : 1,
-</span></span><span style="display:flex;"><span> "successful" : 1,
-</span></span><span style="display:flex;"><span> "skipped" : 0,
-</span></span><span style="display:flex;"><span> "failed" : 0
-</span></span><span style="display:flex;"><span> }
-</span></span><span style="display:flex;"><span>}
-</span></span></code></pre></div>
+ <h2 id="2021-02-01">2021-02-01</h2>
<ul>
<li>Abenet said that CIP found more duplicate records in their export from AReS
<ul>
<li>I re-opened <a href="https://github.com/ilri/OpenRXV/issues/67">the issue</a> on OpenRXV where we had previously noticed this</li>
<li>The shared link where the duplicates are is here: <a href="https://cgspace.cgiar.org/explorer/shared/heEOz3YBnXdK69bR2ra6">https://cgspace.cgiar.org/explorer/shared/heEOz3YBnXdK69bR2ra6</a></li>
</ul>
</li>
<li>I had a call with CodeObia to discuss the work on OpenRXV</li>
<li>Check the results of the AReS harvesting from last night:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -s <span style="color:#e6db74">'http://localhost:9200/openrxv-items-temp/_count?q=*&pretty'</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span> "count" : 100875,
</span></span><span style="display:flex;"><span> "_shards" : {
</span></span><span style="display:flex;"><span> "total" : 1,
</span></span><span style="display:flex;"><span> "successful" : 1,
</span></span><span style="display:flex;"><span> "skipped" : 0,
</span></span><span style="display:flex;"><span> "failed" : 0
</span></span><span style="display:flex;"><span> }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div>January, 2021
https://alanorth.github.io/cgspace-notes/2021-01/
Sun, 03 Jan 2021 10:13:54 +0200https://alanorth.github.io/cgspace-notes/2021-01/
- <h2 id="2021-01-03">2021-01-03</h2>
-<ul>
-<li>Peter notified me that some filters on AReS were broken again
-<ul>
-<li>It’s the same issue with the field names getting <code>.keyword</code> appended to the end that I already <a href="https://github.com/ilri/OpenRXV/issues/66">filed an issue on OpenRXV about last month</a></li>
-<li>I fixed the broken filters (careful to not edit any others, lest they break too!)</li>
-</ul>
-</li>
-<li>Fix an issue with start page number for the DSpace REST API and statistics API in OpenRXV
-<ul>
-<li>The start page had been “1” in the UI, but in the backend they were doing some gymnastics to adjust to the zero-based offset/limit/page of the DSpace REST API and the statistics API</li>
-<li>I adjusted it to default to 0 and added a note to the admin screen</li>
-<li>I realized that this issue was actually causing the first page of 100 statistics to be missing…</li>
-<li>For example, <a href="https://cgspace.cgiar.org/handle/10568/66839">this item</a> has 51 views on CGSpace, but 0 on AReS</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2021-01-03">2021-01-03</h2>
<ul>
<li>Peter notified me that some filters on AReS were broken again
<ul>
<li>It’s the same issue with the field names getting <code>.keyword</code> appended to the end that I already <a href="https://github.com/ilri/OpenRXV/issues/66">filed an issue on OpenRXV about last month</a></li>
<li>I fixed the broken filters (careful to not edit any others, lest they break too!)</li>
</ul>
</li>
<li>Fix an issue with start page number for the DSpace REST API and statistics API in OpenRXV
<ul>
<li>The start page had been “1” in the UI, but in the backend they were doing some gymnastics to adjust to the zero-based offset/limit/page of the DSpace REST API and the statistics API</li>
<li>I adjusted it to default to 0 and added a note to the admin screen</li>
<li>I realized that this issue was actually causing the first page of 100 statistics to be missing…</li>
<li>For example, <a href="https://cgspace.cgiar.org/handle/10568/66839">this item</a> has 51 views on CGSpace, but 0 on AReS</li>
</ul>
</li>
</ul>December, 2020
https://alanorth.github.io/cgspace-notes/2020-12/
Tue, 01 Dec 2020 11:32:54 +0200https://alanorth.github.io/cgspace-notes/2020-12/
- <h2 id="2020-12-01">2020-12-01</h2>
-<ul>
-<li>Atmire responded about the issue with duplicate data in our Solr statistics
-<ul>
-<li>They noticed that some records in the statistics-2015 core haven’t been migrated with the AtomicStatisticsUpdateCLI tool yet and assumed that I haven’t migrated any of the records yet</li>
-<li>That’s strange, as I checked all ten cores and 2015 is the only one with some unmigrated documents, as according to the <code>cua_version</code> field</li>
-<li>I started processing those (about 411,000 records):</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2020-12-01">2020-12-01</h2>
<ul>
<li>Atmire responded about the issue with duplicate data in our Solr statistics
<ul>
<li>They noticed that some records in the statistics-2015 core haven’t been migrated with the AtomicStatisticsUpdateCLI tool yet and assumed that I haven’t migrated any of the records yet</li>
<li>That’s strange, as I checked all ten cores and 2015 is the only one with some unmigrated documents, according to the <code>cua_version</code> field</li>
<li>I started processing those (about 411,000 records):</li>
</ul>
</li>
</ul>CGSpace DSpace 6 Upgrade
@@ -637,252 +286,91 @@
https://alanorth.github.io/cgspace-notes/2020-11/
Sun, 01 Nov 2020 13:11:54 +0200https://alanorth.github.io/cgspace-notes/2020-11/
- <h2 id="2020-11-01">2020-11-01</h2>
-<ul>
-<li>Continue with processing the statistics-2019 Solr core with the AtomicStatisticsUpdateCLI tool on DSpace Test
-<ul>
-<li>So far we’ve spent at least fifty hours to process the statistics and statistics-2019 core… wow.</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2020-11-01">2020-11-01</h2>
<ul>
<li>Continue with processing the statistics-2019 Solr core with the AtomicStatisticsUpdateCLI tool on DSpace Test
<ul>
<li>So far we’ve spent at least fifty hours to process the statistics and statistics-2019 core… wow.</li>
</ul>
</li>
</ul>October, 2020
https://alanorth.github.io/cgspace-notes/2020-10/
Tue, 06 Oct 2020 16:55:54 +0300https://alanorth.github.io/cgspace-notes/2020-10/
- <h2 id="2020-10-06">2020-10-06</h2>
-<ul>
-<li>Add tests for the new <code>/items</code> POST handlers to the DSpace 6.x branch of my <a href="https://github.com/ilri/dspace-statistics-api/tree/v6_x">dspace-statistics-api</a>
-<ul>
-<li>It took a bit of extra work because I had to learn how to mock the responses for when Solr is not available</li>
-<li>Tag and release version 1.3.0 on GitHub: <a href="https://github.com/ilri/dspace-statistics-api/releases/tag/v1.3.0">https://github.com/ilri/dspace-statistics-api/releases/tag/v1.3.0</a></li>
-</ul>
-</li>
-<li>Trying to test the changes Atmire sent last week but I had to re-create my local database from a recent CGSpace dump
-<ul>
-<li>During the FlywayDB migration I got an error:</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2020-10-06">2020-10-06</h2>
<ul>
<li>Add tests for the new <code>/items</code> POST handlers to the DSpace 6.x branch of my <a href="https://github.com/ilri/dspace-statistics-api/tree/v6_x">dspace-statistics-api</a>
<ul>
<li>It took a bit of extra work because I had to learn how to mock the responses for when Solr is not available</li>
<li>Tag and release version 1.3.0 on GitHub: <a href="https://github.com/ilri/dspace-statistics-api/releases/tag/v1.3.0">https://github.com/ilri/dspace-statistics-api/releases/tag/v1.3.0</a></li>
</ul>
</li>
<li>Trying to test the changes Atmire sent last week but I had to re-create my local database from a recent CGSpace dump
<ul>
<li>During the FlywayDB migration I got an error:</li>
</ul>
</li>
</ul>September, 2020
https://alanorth.github.io/cgspace-notes/2020-09/
Wed, 02 Sep 2020 15:35:54 +0300https://alanorth.github.io/cgspace-notes/2020-09/
- <h2 id="2020-09-02">2020-09-02</h2>
-<ul>
-<li>Replace Marissa van Epp for Rhys Bucknall in the CCAFS groups on CGSpace because Marissa no longer works at CCAFS</li>
-<li>The AReS Explorer hasn’t updated its index since 2020-08-22 when I last forced it
-<ul>
-<li>I restarted it again now and told Moayad that the automatic indexing isn’t working</li>
-</ul>
-</li>
-<li>Add <code>Alliance of Bioversity International and CIAT</code> to affiliations on CGSpace</li>
-<li>Abenet told me that the general search text on AReS doesn’t get reset when you use the “Reset Filters” button
-<ul>
-<li>I filed a bug on OpenRXV: <a href="https://github.com/ilri/OpenRXV/issues/39">https://github.com/ilri/OpenRXV/issues/39</a></li>
-</ul>
-</li>
-<li>I filed an issue on OpenRXV to make some minor edits to the admin UI: <a href="https://github.com/ilri/OpenRXV/issues/40">https://github.com/ilri/OpenRXV/issues/40</a></li>
-</ul>
+ <h2 id="2020-09-02">2020-09-02</h2>
<ul>
<li>Replace Marissa van Epp with Rhys Bucknall in the CCAFS groups on CGSpace because Marissa no longer works at CCAFS</li>
<li>The AReS Explorer hasn’t updated its index since 2020-08-22 when I last forced it
<ul>
<li>I restarted it again now and told Moayad that the automatic indexing isn’t working</li>
</ul>
</li>
<li>Add <code>Alliance of Bioversity International and CIAT</code> to affiliations on CGSpace</li>
<li>Abenet told me that the general search text on AReS doesn’t get reset when you use the “Reset Filters” button
<ul>
<li>I filed a bug on OpenRXV: <a href="https://github.com/ilri/OpenRXV/issues/39">https://github.com/ilri/OpenRXV/issues/39</a></li>
</ul>
</li>
<li>I filed an issue on OpenRXV to make some minor edits to the admin UI: <a href="https://github.com/ilri/OpenRXV/issues/40">https://github.com/ilri/OpenRXV/issues/40</a></li>
</ul>August, 2020
https://alanorth.github.io/cgspace-notes/2020-08/
Sun, 02 Aug 2020 15:35:54 +0300https://alanorth.github.io/cgspace-notes/2020-08/
- <h2 id="2020-08-02">2020-08-02</h2>
-<ul>
-<li>I spent a few days working on a Java-based curation task to tag items with ISO 3166-1 Alpha2 country codes based on their <code>cg.coverage.country</code> text values
-<ul>
-<li>It looks up the names in ISO 3166-1 first, and then in our CGSpace countries mapping (which has five or so of Peter’s preferred “display” country names)</li>
-<li>It implements a “force” mode too that will clear existing country codes and re-tag everything</li>
-<li>It is class based so I can easily add support for other vocabularies, and the technique could even be used for organizations with mappings to ROR and Clarisa…</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2020-08-02">2020-08-02</h2>
<ul>
<li>I spent a few days working on a Java-based curation task to tag items with ISO 3166-1 Alpha2 country codes based on their <code>cg.coverage.country</code> text values
<ul>
<li>It looks up the names in ISO 3166-1 first, and then in our CGSpace countries mapping (which has five or so of Peter’s preferred “display” country names)</li>
<li>It implements a “force” mode too that will clear existing country codes and re-tag everything</li>
<li>It is class-based so I can easily add support for other vocabularies, and the technique could even be used for organizations with mappings to ROR and Clarisa…</li>
</ul>
</li>
</ul>July, 2020
https://alanorth.github.io/cgspace-notes/2020-07/
Wed, 01 Jul 2020 10:53:54 +0300https://alanorth.github.io/cgspace-notes/2020-07/
- <h2 id="2020-07-01">2020-07-01</h2>
-<ul>
-<li>A few users noticed that CGSpace wasn’t loading items today, item pages seem blank
-<ul>
-<li>I looked at the PostgreSQL locks but they don’t seem unusual</li>
-<li>I guess this is the same “blank item page” issue that we had a few times in 2019 that we never solved</li>
-<li>I restarted Tomcat and PostgreSQL and the issue was gone</li>
-</ul>
-</li>
-<li>Since I was restarting Tomcat anyways I decided to redeploy the latest changes from the <code>5_x-prod</code> branch and I added a note about COVID-19 items to the CGSpace frontpage at Peter’s request</li>
-</ul>
+ <h2 id="2020-07-01">2020-07-01</h2>
<ul>
<li>A few users noticed that CGSpace wasn’t loading items today; item pages seem blank
<ul>
<li>I looked at the PostgreSQL locks but they don’t seem unusual</li>
<li>I guess this is the same “blank item page” issue that we had a few times in 2019 that we never solved</li>
<li>I restarted Tomcat and PostgreSQL and the issue was gone</li>
</ul>
</li>
<li>Since I was restarting Tomcat anyway I decided to redeploy the latest changes from the <code>5_x-prod</code> branch and I added a note about COVID-19 items to the CGSpace frontpage at Peter’s request</li>
</ul>June, 2020
https://alanorth.github.io/cgspace-notes/2020-06/
Mon, 01 Jun 2020 13:55:39 +0300https://alanorth.github.io/cgspace-notes/2020-06/
- <h2 id="2020-06-01">2020-06-01</h2>
-<ul>
-<li>I tried to run the <code>AtomicStatisticsUpdateCLI</code> CUA migration script on DSpace Test (linode26) again and it is still going very slowly and has tons of errors like I noticed yesterday
-<ul>
-<li>I sent Atmire the dspace.log from today and told them to log into the server to debug the process</li>
-</ul>
-</li>
-<li>In other news, I checked the statistics API on DSpace 6 and it’s working</li>
-<li>I tried to build the OAI registry on the freshly migrated DSpace 6 on DSpace Test and I get an error:</li>
-</ul>
+ <h2 id="2020-06-01">2020-06-01</h2>
<ul>
<li>I tried to run the <code>AtomicStatisticsUpdateCLI</code> CUA migration script on DSpace Test (linode26) again and it is still going very slowly and has tons of errors, as I noticed yesterday
<ul>
<li>I sent Atmire the dspace.log from today and told them to log into the server to debug the process</li>
</ul>
</li>
<li>In other news, I checked the statistics API on DSpace 6 and it’s working</li>
<li>I tried to build the OAI registry on the freshly migrated DSpace 6 on DSpace Test and I get an error:</li>
</ul>May, 2020
https://alanorth.github.io/cgspace-notes/2020-05/
Sat, 02 May 2020 09:52:04 +0300https://alanorth.github.io/cgspace-notes/2020-05/
- <h2 id="2020-05-02">2020-05-02</h2>
-<ul>
-<li>Peter said that CTA is having problems submitting an item to CGSpace
-<ul>
-<li>Looking at the PostgreSQL stats it seems to be the same issue that Tezira was having last week, as I see the number of connections in ‘idle in transaction’ and ‘waiting for lock’ state are increasing again</li>
-<li>I see that CGSpace (linode18) is still using PostgreSQL JDBC driver version 42.2.11, and there were some bugs related to transactions fixed in 42.2.12 (which I had updated in the Ansible playbooks, but not deployed yet)</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2020-05-02">2020-05-02</h2>
<ul>
<li>Peter said that CTA is having problems submitting an item to CGSpace
<ul>
<li>Looking at the PostgreSQL stats it seems to be the same issue that Tezira was having last week, as I see the number of connections in ‘idle in transaction’ and ‘waiting for lock’ state are increasing again</li>
<li>I see that CGSpace (linode18) is still using PostgreSQL JDBC driver version 42.2.11, and there were some bugs related to transactions fixed in 42.2.12 (which I had updated in the Ansible playbooks, but not deployed yet)</li>
</ul>
</li>
</ul>April, 2020
https://alanorth.github.io/cgspace-notes/2020-04/
Thu, 02 Apr 2020 10:53:24 +0300https://alanorth.github.io/cgspace-notes/2020-04/
- <h2 id="2020-04-02">2020-04-02</h2>
-<ul>
-<li>Maria asked me to update Charles Staver’s ORCID iD in the submission template and on CGSpace, as his name was lower case before, and now he has corrected it
-<ul>
-<li>I updated the fifty-eight existing items on CGSpace</li>
-</ul>
-</li>
-<li>Looking into the items Udana had asked about last week that were missing Altmetric donuts:
-<ul>
-<li><a href="https://hdl.handle.net/10568/103225">The first</a> is still missing its DOI, so I added it and <a href="https://twitter.com/mralanorth/status/1245632619661766657">tweeted its handle</a> (after a few hours there was a donut with score 222)</li>
-<li><a href="https://hdl.handle.net/10568/106899">The second item</a> now has a donut with score 2 since I <a href="https://twitter.com/mralanorth/status/1243158045540134913">tweeted its handle</a> last week</li>
-<li><a href="https://hdl.handle.net/10568/107258">The third item</a> now has a donut with score 1 since I <a href="https://twitter.com/mralanorth/status/1243158786392625153">tweeted it</a> last week</li>
-</ul>
-</li>
-<li>On the same note, the <a href="https://hdl.handle.net/10568/106573">one item</a> Abenet pointed out last week now has a donut with score of 104 after I <a href="https://twitter.com/mralanorth/status/1243163710241345536">tweeted it</a> last week</li>
-</ul>
+ <h2 id="2020-04-02">2020-04-02</h2>
<ul>
<li>Maria asked me to update Charles Staver’s ORCID iD in the submission template and on CGSpace, as his name was lower case before, and now he has corrected it
<ul>
<li>I updated the fifty-eight existing items on CGSpace</li>
</ul>
</li>
<li>Looking into the items Udana had asked about last week that were missing Altmetric donuts:
<ul>
<li><a href="https://hdl.handle.net/10568/103225">The first</a> is still missing its DOI, so I added it and <a href="https://twitter.com/mralanorth/status/1245632619661766657">tweeted its handle</a> (after a few hours there was a donut with score 222)</li>
<li><a href="https://hdl.handle.net/10568/106899">The second item</a> now has a donut with score 2 since I <a href="https://twitter.com/mralanorth/status/1243158045540134913">tweeted its handle</a> last week</li>
<li><a href="https://hdl.handle.net/10568/107258">The third item</a> now has a donut with score 1 since I <a href="https://twitter.com/mralanorth/status/1243158786392625153">tweeted it</a> last week</li>
</ul>
</li>
<li>On the same note, the <a href="https://hdl.handle.net/10568/106573">one item</a> Abenet pointed out last week now has a donut with score of 104 after I <a href="https://twitter.com/mralanorth/status/1243163710241345536">tweeted it</a> last week</li>
</ul>March, 2020
https://alanorth.github.io/cgspace-notes/2020-03/
Mon, 02 Mar 2020 12:31:30 +0200https://alanorth.github.io/cgspace-notes/2020-03/
- <h2 id="2020-03-02">2020-03-02</h2>
-<ul>
-<li>Update <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> for DSpace 6+ UUIDs
-<ul>
-<li>Tag version 1.2.0 on GitHub</li>
-</ul>
-</li>
-<li>Test migrating legacy Solr statistics to UUIDs with the as-of-yet unreleased <a href="https://github.com/DSpace/DSpace/commit/184f2b2153479045fba6239342c63e7f8564b8b6#diff-0350ce2e13b28d5d61252b7a8f50a059">SolrUpgradePre6xStatistics.java</a>
-<ul>
-<li>You need to download this into the DSpace 6.x source and compile it</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2020-03-02">2020-03-02</h2>
<ul>
<li>Update <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> for DSpace 6+ UUIDs
<ul>
<li>Tag version 1.2.0 on GitHub</li>
</ul>
</li>
<li>Test migrating legacy Solr statistics to UUIDs with the as-of-yet unreleased <a href="https://github.com/DSpace/DSpace/commit/184f2b2153479045fba6239342c63e7f8564b8b6#diff-0350ce2e13b28d5d61252b7a8f50a059">SolrUpgradePre6xStatistics.java</a>
<ul>
<li>You need to download this into the DSpace 6.x source and compile it</li>
</ul>
</li>
</ul>February, 2020
https://alanorth.github.io/cgspace-notes/2020-02/
Sun, 02 Feb 2020 11:56:30 +0200https://alanorth.github.io/cgspace-notes/2020-02/
- <h2 id="2020-02-02">2020-02-02</h2>
-<ul>
-<li>Continue working on porting CGSpace’s DSpace 5 code to DSpace 6.3 that I started yesterday
-<ul>
-<li>Sign up for an account with MaxMind so I can get the GeoLite2-City.mmdb database</li>
-<li>I still need to wire up the API credentials and cron job into the Ansible infrastructure playbooks</li>
-<li>Fix some minor issues in the config and XMLUI themes, like removing Atmire stuff</li>
-<li>The code finally builds and runs with a fresh install</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2020-02-02">2020-02-02</h2>
<ul>
<li>Continue working on porting CGSpace’s DSpace 5 code to DSpace 6.3 that I started yesterday
<ul>
<li>Sign up for an account with MaxMind so I can get the GeoLite2-City.mmdb database</li>
<li>I still need to wire up the API credentials and cron job into the Ansible infrastructure playbooks</li>
<li>Fix some minor issues in the config and XMLUI themes, like removing Atmire stuff</li>
<li>The code finally builds and runs with a fresh install</li>
</ul>
</li>
</ul>January, 2020
https://alanorth.github.io/cgspace-notes/2020-01/
Mon, 06 Jan 2020 10:48:30 +0200https://alanorth.github.io/cgspace-notes/2020-01/
- <h2 id="2020-01-06">2020-01-06</h2>
-<ul>
-<li>Open <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=706">a ticket</a> with Atmire to request a quote for the upgrade to DSpace 6</li>
-<li>Last week Altmetric responded about the <a href="https://hdl.handle.net/10568/97087">item</a> that had a lower score than than its DOI
-<ul>
-<li>The score is now linked to the DOI</li>
-<li>Another <a href="https://hdl.handle.net/10568/91278">item</a> that had the same problem in 2019 has now also linked to the score for its DOI</li>
-<li>Another <a href="https://hdl.handle.net/10568/81236">item</a> that had the same problem in 2019 has also been fixed</li>
-</ul>
-</li>
-</ul>
-<h2 id="2020-01-07">2020-01-07</h2>
-<ul>
-<li>Peter Ballantyne highlighted one more WLE <a href="https://hdl.handle.net/10568/101286">item</a> that is missing the Altmetric score that its DOI has
-<ul>
-<li>The DOI has a score of 259, but the Handle has no score at all</li>
-<li>I <a href="https://twitter.com/mralanorth/status/1214471427157626881">tweeted</a> the CGSpace repository link</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2020-01-06">2020-01-06</h2>
<ul>
<li>Open <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=706">a ticket</a> with Atmire to request a quote for the upgrade to DSpace 6</li>
<li>Last week Altmetric responded about the <a href="https://hdl.handle.net/10568/97087">item</a> that had a lower score than its DOI
<ul>
<li>The score is now linked to the DOI</li>
<li>Another <a href="https://hdl.handle.net/10568/91278">item</a> that had the same problem in 2019 has now also linked to the score for its DOI</li>
<li>Another <a href="https://hdl.handle.net/10568/81236">item</a> that had the same problem in 2019 has also been fixed</li>
</ul>
</li>
</ul>
<h2 id="2020-01-07">2020-01-07</h2>
<ul>
<li>Peter Ballantyne highlighted one more WLE <a href="https://hdl.handle.net/10568/101286">item</a> that is missing the Altmetric score that its DOI has
<ul>
<li>The DOI has a score of 259, but the Handle has no score at all</li>
<li>I <a href="https://twitter.com/mralanorth/status/1214471427157626881">tweeted</a> the CGSpace repository link</li>
</ul>
</li>
</ul>December, 2019
https://alanorth.github.io/cgspace-notes/2019-12/
Sun, 01 Dec 2019 11:22:30 +0200https://alanorth.github.io/cgspace-notes/2019-12/
- <h2 id="2019-12-01">2019-12-01</h2>
-<ul>
-<li>Upgrade CGSpace (linode18) to Ubuntu 18.04:
-<ul>
-<li>Check any packages that have residual configs and purge them:</li>
-<li><!-- raw HTML omitted --># dpkg -l | grep -E ‘^rc’ | awk ‘{print $2}’ | xargs dpkg -P<!-- raw HTML omitted --></li>
-<li>Make sure all packages are up to date and the package manager is up to date, then reboot:</li>
-</ul>
-</li>
-</ul>
-<pre tabindex="0"><code># apt update && apt full-upgrade
-# apt-get autoremove && apt-get autoclean
-# dpkg -C
-# reboot
-</code></pre>
+ <h2 id="2019-12-01">2019-12-01</h2>
<ul>
<li>Upgrade CGSpace (linode18) to Ubuntu 18.04:
<ul>
<li>Check any packages that have residual configs and purge them:</li>
<li><!-- raw HTML omitted --># dpkg -l | grep -E '^rc' | awk '{print $2}' | xargs dpkg -P<!-- raw HTML omitted --></li>
<li>Make sure all packages are up to date and the package manager is up to date, then reboot:</li>
</ul>
</li>
</ul>
<pre tabindex="0"><code># apt update && apt full-upgrade
# apt-get autoremove && apt-get autoclean
# dpkg -C
# reboot
</code></pre>November, 2019
https://alanorth.github.io/cgspace-notes/2019-11/
Mon, 04 Nov 2019 12:20:30 +0200https://alanorth.github.io/cgspace-notes/2019-11/
- <h2 id="2019-11-04">2019-11-04</h2>
-<ul>
-<li>Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics
-<ul>
-<li>I looked in the nginx logs and see 4.6 million in the access logs, and 1.2 million in the API logs:</li>
-</ul>
-</li>
-</ul>
-<pre tabindex="0"><code># zcat --force /var/log/nginx/*access.log.*.gz | grep -cE "[0-9]{1,2}/Oct/2019"
-4671942
-# zcat --force /var/log/nginx/{rest,oai,statistics}.log.*.gz | grep -cE "[0-9]{1,2}/Oct/2019"
-1277694
-</code></pre><ul>
-<li>So 4.6 million from XMLUI and another 1.2 million from API requests</li>
-<li>Let’s see how many of the REST API requests were for bitstreams (because they are counted in Solr stats):</li>
-</ul>
-<pre tabindex="0"><code># zcat --force /var/log/nginx/rest.log.*.gz | grep -c -E "[0-9]{1,2}/Oct/2019"
-1183456
-# zcat --force /var/log/nginx/rest.log.*.gz | grep -E "[0-9]{1,2}/Oct/2019" | grep -c -E "/rest/bitstreams"
-106781
-</code></pre>
+ <h2 id="2019-11-04">2019-11-04</h2>
<ul>
<li>Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics
<ul>
<li>I looked in the nginx logs and see 4.6 million in the access logs, and 1.2 million in the API logs:</li>
</ul>
</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*access.log.*.gz | grep -cE "[0-9]{1,2}/Oct/2019"
4671942
# zcat --force /var/log/nginx/{rest,oai,statistics}.log.*.gz | grep -cE "[0-9]{1,2}/Oct/2019"
1277694
</code></pre><ul>
<li>So 4.6 million from XMLUI and another 1.2 million from API requests</li>
<li>Let’s see how many of the REST API requests were for bitstreams (because they are counted in Solr stats):</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/rest.log.*.gz | grep -c -E "[0-9]{1,2}/Oct/2019"
1183456
# zcat --force /var/log/nginx/rest.log.*.gz | grep -E "[0-9]{1,2}/Oct/2019" | grep -c -E "/rest/bitstreams"
106781
</code></pre>October, 2019
https://alanorth.github.io/cgspace-notes/2019-09/
Sun, 01 Sep 2019 10:17:51 +0300
https://alanorth.github.io/cgspace-notes/2019-09/
+ <h2 id="2019-09-01">2019-09-01</h2>
<ul>
<li>Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning</li>
<li>Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
440 17.58.101.255
441 157.55.39.101
485 207.46.13.43
728 169.60.128.125
730 207.46.13.108
758 157.55.39.9
808 66.160.140.179
814 207.46.13.212
2472 163.172.71.23
6092 3.94.211.189
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
33 2a01:7e00::f03c:91ff:fe16:fcb
57 3.83.192.124
57 3.87.77.25
57 54.82.1.8
822 2a01:9cc0:47:1:1a:4:0:2
1223 45.5.184.72
1633 172.104.229.92
5112 205.186.128.185
7249 2a01:7e00::f03c:91ff:fe18:7396
9124 45.5.186.2
</code></pre>August, 2019
https://alanorth.github.io/cgspace-notes/2019-08/
Sat, 03 Aug 2019 12:39:51 +0300
https://alanorth.github.io/cgspace-notes/2019-08/
+ <h2 id="2019-08-03">2019-08-03</h2>
<ul>
<li>Look at Bioversity’s latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name…</li>
</ul>
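<p>Header names with stray trailing spaces like these can be trimmed mechanically before import; a minimal sketch in Python (the sample header names below are hypothetical, not taken from Francesco's actual file):</p>
<pre tabindex="0"><code class="language-python">import csv
import io

def strip_header_whitespace(csv_text: str) -> str:
    """Return CSV text with leading/trailing whitespace trimmed from each column header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if rows:
        # Only the header row is touched; data rows are written back unchanged
        rows[0] = [name.strip() for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()

# Example: the trailing space after "dc.title" is removed from the header row only
print(strip_header_whitespace("dc.title ,dc.date.issued\nA title,2019\n"))
</code></pre>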
<h2 id="2019-08-04">2019-08-04</h2>
<ul>
<li>Deploy ORCID identifier updates requested by Bioversity to CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it
<ul>
<li>Before updating it I checked Solr and verified that all statistics cores were loaded properly…</li>
<li>After rebooting, all statistics cores were loaded… wow, that’s lucky.</li>
</ul>
</li>
<li>Run system updates on DSpace Test (linode19) and reboot it</li>
</ul>
July, 2019
https://alanorth.github.io/cgspace-notes/2019-07/
Mon, 01 Jul 2019 12:13:51 +0300
https://alanorth.github.io/cgspace-notes/2019-07/
+ <h2 id="2019-07-01">2019-07-01</h2>
<ul>
<li>Create an “AfricaRice books and book chapters” collection on CGSpace for AfricaRice</li>
<li>Last month Sisay asked why the following “most popular” statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace:
<ul>
<li><a href="https://dspacetest.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&time_filter_end_date=01%2F12%2F2018">DSpace Test</a></li>
<li><a href="https://cgspace.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&time_filter_end_date=01%2F12%2F2018">CGSpace</a></li>
</ul>
</li>
<li>Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community</li>
</ul>
June, 2019
https://alanorth.github.io/cgspace-notes/2019-06/
Sun, 02 Jun 2019 10:57:51 +0300
https://alanorth.github.io/cgspace-notes/2019-06/
+ <h2 id="2019-06-02">2019-06-02</h2>
<ul>
<li>Merge the <a href="https://github.com/ilri/DSpace/pull/425">Solr filterCache</a> and <a href="https://github.com/ilri/DSpace/pull/426">XMLUI ISI journal</a> changes to the <code>5_x-prod</code> branch and deploy on CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it</li>
</ul>
<h2 id="2019-06-03">2019-06-03</h2>
<ul>
<li>Skype with Marie-Angélique and Abenet about <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2</a></li>
</ul>
May, 2019
https://alanorth.github.io/cgspace-notes/2019-05/
Wed, 01 May 2019 07:37:43 +0300
https://alanorth.github.io/cgspace-notes/2019-05/
+ <h2 id="2019-05-01">2019-05-01</h2>
<ul>
<li>Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace</li>
<li>A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items
<ul>
<li>Apparently if the item is in the <code>workflowitem</code> table it is submitted to a workflow</li>
<li>And if it is in the <code>workspaceitem</code> table it is in the pre-submitted state</li>
</ul>
</li>
<li>The item seems to be in a pre-submitted state, so I tried to delete it from there:</li>
</ul>
<pre tabindex="0"><code>dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
DELETE 1
</code></pre><ul>
<li>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present…</li>
</ul>
April, 2019
https://alanorth.github.io/cgspace-notes/2019-04/
Mon, 01 Apr 2019 09:00:43 +0300
https://alanorth.github.io/cgspace-notes/2019-04/
+ <h2 id="2019-04-01">2019-04-01</h2>
<ul>
<li>Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc
<ul>
<li>They asked if we had plans to enable RDF support in CGSpace</li>
</ul>
</li>
<li>There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today
<ul>
<li>I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!</li>
</ul>
</li>
</ul>
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18.196.196.108|18.195.78.144|18.195.218.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5
4432 200
</code></pre><ul>
<li>In the last two weeks there have been 47,000 downloads of this <em>same exact PDF</em> by these three IP addresses</li>
<li>Apply country and region corrections and deletions on DSpace Test and CGSpace:</li>
</ul>
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
</code></pre>March, 2019
https://alanorth.github.io/cgspace-notes/2019-03/
Fri, 01 Mar 2019 12:16:30 +0100
https://alanorth.github.io/cgspace-notes/2019-03/
+ <h2 id="2019-03-01">2019-03-01</h2>
<ul>
<li>I checked IITA’s 259 Feb 14 records from last month for duplicates using Atmire’s Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good</li>
<li>I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc…</li>
<li>Looking at the other half of Udana’s WLE records from 2018-11
<ul>
<li>I finished the ones for Restoring Degraded Landscapes (RDL), but these are for Variability, Risks and Competing Uses (VRC)</li>
<li>I did the usual cleanups for whitespace, added regions where they made sense for certain countries, cleaned up the DOI link formats, added rights information based on the publications page for a few items</li>
<li>Most worryingly, there are encoding errors in the abstracts for eleven items, for example:</li>
<li>68.15% � 9.45 instead of 68.15% ± 9.45</li>
<li>2003�2013 instead of 2003–2013</li>
</ul>
</li>
<li>I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs</li>
</ul>
February, 2019
https://alanorth.github.io/cgspace-notes/2019-02/
Fri, 01 Feb 2019 21:37:30 +0200
https://alanorth.github.io/cgspace-notes/2019-02/
+ <h2 id="2019-02-01">2019-02-01</h2>
<ul>
<li>Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!</li>
<li>The top IPs before, during, and after this latest alert tonight were:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "01/Feb/2019:(17|18|19|20|21)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
245 207.46.13.5
332 54.70.40.11
385 5.143.231.38
405 207.46.13.173
405 207.46.13.75
1117 66.249.66.219
1121 35.237.175.180
1546 5.9.6.51
2474 45.5.186.2
5490 85.25.237.71
</code></pre><ul>
<li><code>85.25.237.71</code> is the “Linguee Bot” that I first saw last month</li>
<li>The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase</li>
<li>There were just over 3 million accesses in the nginx logs last month:</li>
</ul>
<pre tabindex="0"><code># time zcat --force /var/log/nginx/* | grep -cE "[0-9]{1,2}/Jan/2019"
3018243
real 0m19.873s
user 0m22.203s
sys 0m1.979s
</code></pre>January, 2019
https://alanorth.github.io/cgspace-notes/2019-01/
Wed, 02 Jan 2019 09:48:30 +0200
https://alanorth.github.io/cgspace-notes/2019-01/
+ <h2 id="2019-01-02">2019-01-02</h2>
<ul>
<li>Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning</li>
<li>I don’t see anything interesting in the web server logs around that time though:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
92 40.77.167.4
99 210.7.29.100
120 38.126.157.45
177 35.237.175.180
177 40.77.167.32
216 66.249.75.219
225 18.203.76.93
261 46.101.86.248
357 207.46.13.1
903 54.70.40.11
</code></pre>December, 2018
https://alanorth.github.io/cgspace-notes/2018-12/
Sun, 02 Dec 2018 02:09:30 +0200
https://alanorth.github.io/cgspace-notes/2018-12/
+ <h2 id="2018-12-01">2018-12-01</h2>
<ul>
<li>Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK</li>
<li>I manually installed OpenJDK, then removed Oracle JDK, then re-ran the <a href="http://github.com/ilri/rmg-ansible-public">Ansible playbook</a> to update all configuration files, etc</li>
<li>Then I ran all system updates and restarted the server</li>
</ul>
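<p>A quick way to confirm which JDK is active after such a switch is to check the first line of <code>java -version</code> output, since OpenJDK banners begin with <code>openjdk version</code> while Oracle's begin with <code>java version</code>. An illustrative helper (the banner strings are examples, not output captured from CGSpace):</p>
<pre tabindex="0"><code class="language-python">def jdk_flavor(banner: str) -> str:
    """Classify the first line of `java -version` output as OpenJDK or Oracle."""
    first = banner.strip().splitlines()[0].lower() if banner.strip() else ""
    if first.startswith("openjdk"):
        return "openjdk"
    if first.startswith("java version"):  # Oracle JDK prints 'java version "..."'
        return "oracle"
    return "unknown"

print(jdk_flavor('openjdk version "1.8.0_191"'))  # openjdk
print(jdk_flavor('java version "1.8.0_181"'))     # oracle
</code></pre>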
<h2 id="2018-12-02">2018-12-02</h2>
<ul>
<li>I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another <a href="https://usn.ubuntu.com/3831-1/">Ghostscript vulnerability last week</a></li>
</ul>
November, 2018
https://alanorth.github.io/cgspace-notes/2018-11/
Thu, 01 Nov 2018 16:41:30 +0200
https://alanorth.github.io/cgspace-notes/2018-11/
+ <h2 id="2018-11-01">2018-11-01</h2>
<ul>
<li>Finalize AReS Phase I and Phase II ToRs</li>
<li>Send a note about my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to the dspace-tech mailing list</li>
</ul>
<h2 id="2018-11-03">2018-11-03</h2>
<ul>
<li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li>
<li>Today these are the top 10 IPs:</li>
</ul>
October, 2018
https://alanorth.github.io/cgspace-notes/2018-10/
Mon, 01 Oct 2018 22:31:54 +0300
https://alanorth.github.io/cgspace-notes/2018-10/
+ <h2 id="2018-10-01">2018-10-01</h2>
<ul>
<li>Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items</li>
<li>I created a GitHub issue to track this <a href="https://github.com/ilri/DSpace/issues/389">#389</a>, because I’m super busy in Nairobi right now</li>
</ul>
September, 2018
https://alanorth.github.io/cgspace-notes/2018-09/
Sun, 02 Sep 2018 09:55:54 +0300
https://alanorth.github.io/cgspace-notes/2018-09/
+ <h2 id="2018-09-02">2018-09-02</h2>
<ul>
<li>New <a href="https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.5">PostgreSQL JDBC driver version 42.2.5</a></li>
<li>I’ll update the DSpace role in our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a> and run the updated playbooks on CGSpace and DSpace Test</li>
<li>Also, I’ll re-run the <code>postgresql</code> tasks because the custom PostgreSQL variables are dynamic according to the system’s RAM, and we never re-ran them after migrating to larger Linodes last month</li>
<li>I’m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I’m getting those autowire errors in Tomcat 8.5.30 again:</li>
</ul>
August, 2018
https://alanorth.github.io/cgspace-notes/2018-08/
Wed, 01 Aug 2018 11:52:54 +0300
https://alanorth.github.io/cgspace-notes/2018-08/
+ <h2 id="2018-08-01">2018-08-01</h2>
<ul>
<li>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</li>
</ul>
<pre tabindex="0"><code>[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child
[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB
[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
</code></pre><ul>
<li>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</li>
<li>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat’s</li>
<li>I’m not sure why Tomcat didn’t crash with an OutOfMemoryError…</li>
<li>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</li>
<li>The server only has 8GB of RAM so we’ll eventually need to upgrade to a larger one because we’ll start starving the OS, PostgreSQL, and command line batch processes</li>
<li>I ran all system updates on DSpace Test and rebooted it</li>
</ul>
July, 2018
https://alanorth.github.io/cgspace-notes/2018-07/
Sun, 01 Jul 2018 12:56:54 +0300
https://alanorth.github.io/cgspace-notes/2018-07/
+ <h2 id="2018-07-01">2018-07-01</h2>
<ul>
<li>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</li>
</ul>
<pre tabindex="0"><code>$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
</code></pre><ul>
<li>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</li>
</ul>
<pre tabindex="0"><code>There is insufficient memory for the Java Runtime Environment to continue.
</code></pre>June, 2018
https://alanorth.github.io/cgspace-notes/2018-06/
Mon, 04 Jun 2018 19:49:54 -0700
https://alanorth.github.io/cgspace-notes/2018-06/
+ <h2 id="2018-06-04">2018-06-04</h2>
<ul>
<li>Test the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">DSpace 5.8 module upgrades from Atmire</a> (<a href="https://github.com/ilri/DSpace/pull/378">#378</a>)
<ul>
<li>There seems to be a problem with the CUA and L&R versions in <code>pom.xml</code> because they are using SNAPSHOT and it doesn’t build</li>
</ul>
</li>
<li>I added the new CCAFS Phase II Project Tag <code>PII-FP1_PACCA2</code> and merged it into the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/379">#379</a>)</li>
<li>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</li>
</ul>
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3 -n
</code></pre><ul>
<li>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="https://alanorth.github.io/cgspace-notes/cgspace-notes/2018-03/">March, 2018</a></li>
<li>Time to index ~70,000 items on CGSpace:</li>
</ul>
<pre tabindex="0"><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
real 74m42.646s
user 8m5.056s
sys 2m7.289s
</code></pre>May, 2018
https://alanorth.github.io/cgspace-notes/2018-05/
Tue, 01 May 2018 16:43:54 +0300
https://alanorth.github.io/cgspace-notes/2018-05/
+ <h2 id="2018-05-01">2018-05-01</h2>
<ul>
<li>I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface:
<ul>
<li>http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E</li>
<li>http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E</li>
</ul>
</li>
<li>Then I reduced the JVM heap size from 6144 back to 5120m</li>
<li>Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> to support hosts choosing which distribution they want to use</li>
</ul>
April, 2018
https://alanorth.github.io/cgspace-notes/2018-04/
Sun, 01 Apr 2018 16:13:54 +0200
https://alanorth.github.io/cgspace-notes/2018-04/
+ <h2 id="2018-04-01">2018-04-01</h2>
<ul>
<li>I tried to test something on DSpace Test but noticed that it’s down since god knows when</li>
<li>Catalina logs at least show some memory errors yesterday:</li>
</ul>
March, 2018
https://alanorth.github.io/cgspace-notes/2018-03/
Fri, 02 Mar 2018 16:07:54 +0200
https://alanorth.github.io/cgspace-notes/2018-03/
+ <h2 id="2018-03-02">2018-03-02</h2>
<ul>
<li>Export a CSV of the IITA community metadata for Martin Mueller</li>
</ul>February, 2018
https://alanorth.github.io/cgspace-notes/2018-02/
Thu, 01 Feb 2018 16:28:54 +0200https://alanorth.github.io/cgspace-notes/2018-02/
- <h2 id="2018-02-01">2018-02-01</h2>
-<ul>
-<li>Peter gave feedback on the <code>dc.rights</code> proof of concept that I had sent him last week</li>
-<li>We don’t need to distinguish between internal and external works, so that makes it just a simple list</li>
-<li>Yesterday I figured out how to monitor DSpace sessions using JMX</li>
-<li>I copied the logic in the <code>jmx_tomcat_dbpools</code> provided by Ubuntu’s <code>munin-plugins-java</code> package and used the stuff I discovered about JMX <a href="https://alanorth.github.io/cgspace-notes/2018-01/">in 2018-01</a></li>
-</ul>
+ <h2 id="2018-02-01">2018-02-01</h2>
<ul>
<li>Peter gave feedback on the <code>dc.rights</code> proof of concept that I had sent him last week</li>
<li>We don’t need to distinguish between internal and external works, so that makes it just a simple list</li>
<li>Yesterday I figured out how to monitor DSpace sessions using JMX</li>
<li>I copied the logic in the <code>jmx_tomcat_dbpools</code> provided by Ubuntu’s <code>munin-plugins-java</code> package and used the stuff I discovered about JMX <a href="https://alanorth.github.io/cgspace-notes/2018-01/">in 2018-01</a></li>
</ul>January, 2018
https://alanorth.github.io/cgspace-notes/2018-01/
Tue, 02 Jan 2018 08:35:54 -0800https://alanorth.github.io/cgspace-notes/2018-01/
- <h2 id="2018-01-02">2018-01-02</h2>
-<ul>
-<li>Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time</li>
-<li>I didn’t get any load alerts from Linode and the REST and XMLUI logs don’t show anything out of the ordinary</li>
-<li>The nginx logs show HTTP 200s until <code>02/Jan/2018:11:27:17 +0000</code> when Uptime Robot got an HTTP 500</li>
-<li>In dspace.log around that time I see many errors like “Client closed the connection before file download was complete”</li>
-<li>And just before that I see this:</li>
-</ul>
-<pre tabindex="0"><code>Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000].
-</code></pre><ul>
-<li>Ah hah! So the pool was actually empty!</li>
-<li>I need to increase that, let’s try to bump it up from 50 to 75</li>
-<li>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don’t know what the hell Uptime Robot saw</li>
-<li>I notice this error quite a few times in dspace.log:</li>
-</ul>
-<pre tabindex="0"><code>2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
-org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered " "]" "] "" at line 1, column 32.
-</code></pre><ul>
-<li>And there are many of these errors every day for the past month:</li>
-</ul>
-<pre tabindex="0"><code>$ grep -c "Error while searching for sidebar facets" dspace.log.*
-dspace.log.2017-11-21:4
-dspace.log.2017-11-22:1
-dspace.log.2017-11-23:4
-dspace.log.2017-11-24:11
-dspace.log.2017-11-25:0
-dspace.log.2017-11-26:1
-dspace.log.2017-11-27:7
-dspace.log.2017-11-28:21
-dspace.log.2017-11-29:31
-dspace.log.2017-11-30:15
-dspace.log.2017-12-01:15
-dspace.log.2017-12-02:20
-dspace.log.2017-12-03:38
-dspace.log.2017-12-04:65
-dspace.log.2017-12-05:43
-dspace.log.2017-12-06:72
-dspace.log.2017-12-07:27
-dspace.log.2017-12-08:15
-dspace.log.2017-12-09:29
-dspace.log.2017-12-10:35
-dspace.log.2017-12-11:20
-dspace.log.2017-12-12:44
-dspace.log.2017-12-13:36
-dspace.log.2017-12-14:59
-dspace.log.2017-12-15:104
-dspace.log.2017-12-16:53
-dspace.log.2017-12-17:66
-dspace.log.2017-12-18:83
-dspace.log.2017-12-19:101
-dspace.log.2017-12-20:74
-dspace.log.2017-12-21:55
-dspace.log.2017-12-22:66
-dspace.log.2017-12-23:50
-dspace.log.2017-12-24:85
-dspace.log.2017-12-25:62
-dspace.log.2017-12-26:49
-dspace.log.2017-12-27:30
-dspace.log.2017-12-28:54
-dspace.log.2017-12-29:68
-dspace.log.2017-12-30:89
-dspace.log.2017-12-31:53
-dspace.log.2018-01-01:45
-dspace.log.2018-01-02:34
-</code></pre><ul>
-<li>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let’s Encrypt if it’s just a handful of domains</li>
-</ul>
+ <h2 id="2018-01-02">2018-01-02</h2>
<ul>
<li>Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time</li>
<li>I didn’t get any load alerts from Linode and the REST and XMLUI logs don’t show anything out of the ordinary</li>
<li>The nginx logs show HTTP 200s until <code>02/Jan/2018:11:27:17 +0000</code> when Uptime Robot got an HTTP 500</li>
<li>In dspace.log around that time I see many errors like “Client closed the connection before file download was complete”</li>
<li>And just before that I see this:</li>
</ul>
<pre tabindex="0"><code>Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000].
</code></pre><ul>
<li>Ah hah! So the pool was actually empty!</li>
<li>I need to increase that, let’s try to bump it up from 50 to 75</li>
<li>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don’t know what the hell Uptime Robot saw</li>
<li>I notice this error quite a few times in dspace.log:</li>
</ul>
<pre tabindex="0"><code>2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered " "]" "] "" at line 1, column 32.
</code></pre><ul>
<li>And there are many of these errors every day for the past month:</li>
</ul>
<pre tabindex="0"><code>$ grep -c "Error while searching for sidebar facets" dspace.log.*
dspace.log.2017-11-21:4
dspace.log.2017-11-22:1
dspace.log.2017-11-23:4
dspace.log.2017-11-24:11
dspace.log.2017-11-25:0
dspace.log.2017-11-26:1
dspace.log.2017-11-27:7
dspace.log.2017-11-28:21
dspace.log.2017-11-29:31
dspace.log.2017-11-30:15
dspace.log.2017-12-01:15
dspace.log.2017-12-02:20
dspace.log.2017-12-03:38
dspace.log.2017-12-04:65
dspace.log.2017-12-05:43
dspace.log.2017-12-06:72
dspace.log.2017-12-07:27
dspace.log.2017-12-08:15
dspace.log.2017-12-09:29
dspace.log.2017-12-10:35
dspace.log.2017-12-11:20
dspace.log.2017-12-12:44
dspace.log.2017-12-13:36
dspace.log.2017-12-14:59
dspace.log.2017-12-15:104
dspace.log.2017-12-16:53
dspace.log.2017-12-17:66
dspace.log.2017-12-18:83
dspace.log.2017-12-19:101
dspace.log.2017-12-20:74
dspace.log.2017-12-21:55
dspace.log.2017-12-22:66
dspace.log.2017-12-23:50
dspace.log.2017-12-24:85
dspace.log.2017-12-25:62
dspace.log.2017-12-26:49
dspace.log.2017-12-27:30
dspace.log.2017-12-28:54
dspace.log.2017-12-29:68
dspace.log.2017-12-30:89
dspace.log.2017-12-31:53
dspace.log.2018-01-01:45
dspace.log.2018-01-02:34
</code></pre><ul>
<li>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let’s Encrypt if it’s just a handful of domains</li>
</ul>December, 2017
https://alanorth.github.io/cgspace-notes/2017-12/
Fri, 01 Dec 2017 13:53:54 +0300https://alanorth.github.io/cgspace-notes/2017-12/
- <h2 id="2017-12-01">2017-12-01</h2>
-<ul>
-<li>Uptime Robot noticed that CGSpace went down</li>
-<li>The logs say “Timeout waiting for idle object”</li>
-<li>PostgreSQL activity says there are 115 connections currently</li>
-<li>The list of connections to XMLUI and REST API for today:</li>
-</ul>
+ <h2 id="2017-12-01">2017-12-01</h2>
<ul>
<li>Uptime Robot noticed that CGSpace went down</li>
<li>The logs say “Timeout waiting for idle object”</li>
<li>PostgreSQL activity says there are 115 connections currently</li>
<li>The list of connections to XMLUI and REST API for today:</li>
</ul>November, 2017
https://alanorth.github.io/cgspace-notes/2017-11/
Thu, 02 Nov 2017 09:37:54 +0200https://alanorth.github.io/cgspace-notes/2017-11/
- <h2 id="2017-11-01">2017-11-01</h2>
-<ul>
-<li>The CORE developers responded to say they are looking into their bot not respecting our robots.txt</li>
-</ul>
-<h2 id="2017-11-02">2017-11-02</h2>
-<ul>
-<li>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</li>
-</ul>
-<pre tabindex="0"><code># grep -c "CORE" /var/log/nginx/access.log
-0
-</code></pre><ul>
-<li>Generate list of authors on CGSpace for Peter to go through and correct:</li>
-</ul>
-<pre tabindex="0"><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv;
-COPY 54701
-</code></pre>
+ <h2 id="2017-11-01">2017-11-01</h2>
<ul>
<li>The CORE developers responded to say they are looking into their bot not respecting our robots.txt</li>
</ul>
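Whether a given crawler is even allowed to fetch a URL can be checked with Python's standard-library <code>urllib.robotparser</code>; the rules and the <code>CORE</code> user agent below are illustrative placeholders, not CGSpace's actual robots.txt:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules similar in spirit to CGSpace's robots.txt (illustrative only)
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /discover",
    "Disallow: /browse",
])

# A compliant crawler must not fetch the disallowed top-level paths
print(rp.can_fetch("CORE", "https://cgspace.cgiar.org/discover"))  # False

# Matching is by path prefix, so per-handle URLs are NOT covered by a
# top-level "Disallow: /discover" rule
print(rp.can_fetch("CORE", "https://cgspace.cgiar.org/handle/10568/3353/discover"))  # True
```

This prefix-matching behavior is why a top-level Disallow alone cannot keep bots out of the dynamic per-handle <code>/discover</code> and <code>/browse</code> pages.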
<h2 id="2017-11-02">2017-11-02</h2>
<ul>
<li>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</li>
</ul>
<pre tabindex="0"><code># grep -c "CORE" /var/log/nginx/access.log
0
</code></pre><ul>
<li>Generate list of authors on CGSpace for Peter to go through and correct:</li>
</ul>
<pre tabindex="0"><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv;
COPY 54701
</code></pre>October, 2017
https://alanorth.github.io/cgspace-notes/2017-10/
Sun, 01 Oct 2017 08:07:54 +0300https://alanorth.github.io/cgspace-notes/2017-10/
- <h2 id="2017-10-01">2017-10-01</h2>
-<ul>
-<li>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</li>
-</ul>
-<pre tabindex="0"><code>http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
-</code></pre><ul>
-<li>There appears to be a pattern but I’ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</li>
-<li>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</li>
-</ul>
+ <h2 id="2017-10-01">2017-10-01</h2>
<ul>
<li>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</li>
</ul>
<pre tabindex="0"><code>http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
</code></pre><ul>
<li>There appears to be a pattern but I’ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</li>
<li>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</li>
</ul>CGIAR Library Migration
@@ -1414,58 +559,21 @@ COPY 54701
https://alanorth.github.io/cgspace-notes/2017-09/
Thu, 07 Sep 2017 16:54:52 +0700https://alanorth.github.io/cgspace-notes/2017-09/
- <h2 id="2017-09-06">2017-09-06</h2>
-<ul>
-<li>Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours</li>
-</ul>
-<h2 id="2017-09-07">2017-09-07</h2>
-<ul>
-<li>Ask Sisay to clean up the WLE approvers a bit, as Marianne’s user account is both in the approvers step as well as the group</li>
-</ul>
+ <h2 id="2017-09-06">2017-09-06</h2>
<ul>
<li>Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours</li>
</ul>
<h2 id="2017-09-07">2017-09-07</h2>
<ul>
<li>Ask Sisay to clean up the WLE approvers a bit, as Marianne’s user account is both in the approvers step as well as the group</li>
</ul>August, 2017
https://alanorth.github.io/cgspace-notes/2017-08/
Tue, 01 Aug 2017 11:51:52 +0300https://alanorth.github.io/cgspace-notes/2017-08/
- <h2 id="2017-08-01">2017-08-01</h2>
-<ul>
-<li>Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours</li>
-<li>I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)</li>
-<li>The good thing is that, according to <code>dspace.log.2017-08-01</code>, they are all using the same Tomcat session</li>
-<li>This means our Tomcat Crawler Session Valve is working</li>
-<li>But many of the bots are browsing dynamic URLs like:
-<ul>
-<li>/handle/10568/3353/discover</li>
-<li>/handle/10568/16510/browse</li>
-</ul>
-</li>
-<li>The <code>robots.txt</code> only blocks the top-level <code>/discover</code> and <code>/browse</code> URLs… we will need to find a way to forbid them from accessing these!</li>
-<li>Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
-<li>It turns out that we’re already adding the <code>X-Robots-Tag "none"</code> HTTP header, but this only forbids the search engine from <em>indexing</em> the page, not crawling it!</li>
-<li>Also, the bot has to successfully browse the page first so it can receive the HTTP header…</li>
-<li>We might actually have to <em>block</em> these requests with HTTP 403 depending on the user agent</li>
-<li>Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415</li>
-<li>This was due to newline characters in the <code>dc.description.abstract</code> column, which caused OpenRefine to choke when exporting the CSV</li>
-<li>I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using <code>g/^$/d</code></li>
-<li>Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet</li>
-</ul>
+ <h2 id="2017-08-01">2017-08-01</h2>
<ul>
<li>Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours</li>
<li>I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)</li>
<li>The good thing is that, according to <code>dspace.log.2017-08-01</code>, they are all using the same Tomcat session</li>
<li>This means our Tomcat Crawler Session Valve is working</li>
<li>But many of the bots are browsing dynamic URLs like:
<ul>
<li>/handle/10568/3353/discover</li>
<li>/handle/10568/16510/browse</li>
</ul>
</li>
<li>The <code>robots.txt</code> only blocks the top-level <code>/discover</code> and <code>/browse</code> URLs… we will need to find a way to forbid them from accessing these!</li>
<li>Relevant issue from DSpace Jira (semi-resolved in DSpace 6.0): <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
<li>It turns out that we’re already adding the <code>X-Robots-Tag "none"</code> HTTP header, but this only forbids the search engine from <em>indexing</em> the page, not crawling it!</li>
<li>Also, the bot has to successfully browse the page first so it can receive the HTTP header…</li>
<li>We might actually have to <em>block</em> these requests with HTTP 403 depending on the user agent</li>
<li>Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415</li>
<li>This was due to newline characters in the <code>dc.description.abstract</code> column, which caused OpenRefine to choke when exporting the CSV</li>
<li>I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using <code>g/^$/d</code></li>
<li>Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet</li>
</ul>July, 2017
https://alanorth.github.io/cgspace-notes/2017-07/
Sat, 01 Jul 2017 18:03:52 +0300https://alanorth.github.io/cgspace-notes/2017-07/
- <h2 id="2017-07-01">2017-07-01</h2>
-<ul>
-<li>Run system updates and reboot DSpace Test</li>
-</ul>
-<h2 id="2017-07-04">2017-07-04</h2>
-<ul>
-<li>Merge changes for WLE Phase II theme rename (<a href="https://github.com/ilri/DSpace/pull/329">#329</a>)</li>
-<li>Looking at extracting the metadata registries from ICARDA’s MEL DSpace database so we can compare fields with CGSpace</li>
-<li>We can use PostgreSQL’s extended output format (<code>-x</code>) plus <code>sed</code> to format the output into quasi XML:</li>
-</ul>
+ <h2 id="2017-07-01">2017-07-01</h2>
<ul>
<li>Run system updates and reboot DSpace Test</li>
</ul>
<h2 id="2017-07-04">2017-07-04</h2>
<ul>
<li>Merge changes for WLE Phase II theme rename (<a href="https://github.com/ilri/DSpace/pull/329">#329</a>)</li>
<li>Looking at extracting the metadata registries from ICARDA’s MEL DSpace database so we can compare fields with CGSpace</li>
<li>We can use PostgreSQL’s extended output format (<code>-x</code>) plus <code>sed</code> to format the output into quasi XML:</li>
</ul>June, 2017
@@ -1486,299 +594,126 @@ COPY 54701
https://alanorth.github.io/cgspace-notes/2017-04/
Sun, 02 Apr 2017 17:08:52 +0200https://alanorth.github.io/cgspace-notes/2017-04/
- <h2 id="2017-04-02">2017-04-02</h2>
-<ul>
-<li>Merge one change to CCAFS flagships that I had forgotten to remove last month (“MANAGING CLIMATE RISK”): <a href="https://github.com/ilri/DSpace/pull/317">https://github.com/ilri/DSpace/pull/317</a></li>
-<li>Quick proof-of-concept hack to add <code>dc.rights</code> to the input form, including some inline instructions/hints:</li>
-</ul>
-<p><img src="https://alanorth.github.io/cgspace-notes/2017/04/dc-rights.png" alt="dc.rights in the submission form"></p>
-<ul>
-<li>Remove redundant/duplicate text in the DSpace submission license</li>
-<li>Testing the CMYK patch on a collection with 650 items:</li>
-</ul>
-<pre tabindex="0"><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Thumbnail" -v >& /tmp/filter-media-cmyk.txt
-</code></pre>
+ <h2 id="2017-04-02">2017-04-02</h2>
<ul>
<li>Merge one change to CCAFS flagships that I had forgotten to remove last month (“MANAGING CLIMATE RISK”): <a href="https://github.com/ilri/DSpace/pull/317">https://github.com/ilri/DSpace/pull/317</a></li>
<li>Quick proof-of-concept hack to add <code>dc.rights</code> to the input form, including some inline instructions/hints:</li>
</ul>
<p><img src="https://alanorth.github.io/cgspace-notes/2017/04/dc-rights.png" alt="dc.rights in the submission form"></p>
<ul>
<li>Remove redundant/duplicate text in the DSpace submission license</li>
<li>Testing the CMYK patch on a collection with 650 items:</li>
</ul>
<pre tabindex="0"><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Thumbnail" -v >& /tmp/filter-media-cmyk.txt
</code></pre>March, 2017
https://alanorth.github.io/cgspace-notes/2017-03/
Wed, 01 Mar 2017 17:08:52 +0200https://alanorth.github.io/cgspace-notes/2017-03/
- <h2 id="2017-03-01">2017-03-01</h2>
-<ul>
-<li>Run the 279 CIAT author corrections on CGSpace</li>
-</ul>
-<h2 id="2017-03-02">2017-03-02</h2>
-<ul>
-<li>Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace</li>
-<li>CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles</li>
-<li>They might come in at the top level in one “CGIAR System” community, or with several communities</li>
-<li>I need to spend a bit of time looking at the multiple handle support in DSpace and see if new content can be minted in both handles, or just one?</li>
-<li>Need to send Peter and Michael some notes about this in a few days</li>
-<li>Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI</li>
-<li>Filed an issue on DSpace issue tracker for the <code>filter-media</code> bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: <a href="https://jira.duraspace.org/browse/DS-3516">DS-3516</a></li>
-<li>Discovered that the ImageMagick <code>filter-media</code> plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK</li>
-<li>Interestingly, it seems DSpace 4.x’s thumbnails were sRGB, but forcing regeneration using DSpace 5.x’s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999">10568/51999</a>):</li>
-</ul>
-<pre tabindex="0"><code>$ identify ~/Desktop/alc_contrastes_desafios.jpg
-/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
-</code></pre>
+ <h2 id="2017-03-01">2017-03-01</h2>
<ul>
<li>Run the 279 CIAT author corrections on CGSpace</li>
</ul>
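Batch author corrections like these are typically driven by a CSV mapping of old metadata values to corrected ones. A minimal, hypothetical sketch of the idea (the corrections and field values below are made up, and this is not the actual script that was run):

```python
import csv
import io

# Hypothetical correction list: old author value -> corrected value
corrections = {
    "CIAT": "International Center for Tropical Agriculture",
}

def apply_corrections(rows, field="dc.contributor.author"):
    """Replace known-bad values in the given metadata field."""
    for row in rows:
        # DSpace CSV exports separate multiple values with "||"
        values = row[field].split("||")
        row[field] = "||".join(corrections.get(v, v) for v in values)
    return rows

# Example: a DSpace-style metadata export with two authors on one item
exported = io.StringIO('id,dc.contributor.author\n1,"CIAT||Smith, J."\n')
rows = apply_corrections(list(csv.DictReader(exported)))
print(rows[0]["dc.contributor.author"])
```

The corrected CSV would then be re-imported with DSpace's batch metadata editing, which only touches fields whose values changed.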
<h2 id="2017-03-02">2017-03-02</h2>
<ul>
<li>Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace</li>
<li>CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles</li>
<li>They might come in at the top level in one “CGIAR System” community, or with several communities</li>
<li>I need to spend a bit of time looking at the multiple handle support in DSpace and see if new content can be minted in both handles, or just one?</li>
<li>Need to send Peter and Michael some notes about this in a few days</li>
<li>Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI</li>
<li>Filed an issue on DSpace issue tracker for the <code>filter-media</code> bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: <a href="https://jira.duraspace.org/browse/DS-3516">DS-3516</a></li>
<li>Discovered that the ImageMagick <code>filter-media</code> plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK</li>
<li>Interestingly, it seems DSpace 4.x’s thumbnails were sRGB, but forcing regeneration using DSpace 5.x’s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999">10568/51999</a>):</li>
</ul>
<pre tabindex="0"><code>$ identify ~/Desktop/alc_contrastes_desafios.jpg
/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
</code></pre>February, 2017
https://alanorth.github.io/cgspace-notes/2017-02/
Tue, 07 Feb 2017 07:04:52 -0800https://alanorth.github.io/cgspace-notes/2017-02/
- <h2 id="2017-02-07">2017-02-07</h2>
-<ul>
-<li>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</li>
-</ul>
-<pre tabindex="0"><code>dspace=# select * from collection2item where item_id = '80278';
- id | collection_id | item_id
--------+---------------+---------
- 92551 | 313 | 80278
- 92550 | 313 | 80278
- 90774 | 1051 | 80278
-(3 rows)
-dspace=# delete from collection2item where id = 92551 and item_id = 80278;
-DELETE 1
-</code></pre><ul>
-<li>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</li>
-<li>Looks like we’ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</li>
-</ul>
+ <h2 id="2017-02-07">2017-02-07</h2>
<ul>
<li>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</li>
</ul>
<pre tabindex="0"><code>dspace=# select * from collection2item where item_id = '80278';
id | collection_id | item_id
-------+---------------+---------
92551 | 313 | 80278
92550 | 313 | 80278
90774 | 1051 | 80278
(3 rows)
dspace=# delete from collection2item where id = 92551 and item_id = 80278;
DELETE 1
</code></pre><ul>
<li>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</li>
<li>Looks like we’ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</li>
</ul>January, 2017
https://alanorth.github.io/cgspace-notes/2017-01/
Mon, 02 Jan 2017 10:43:00 +0300https://alanorth.github.io/cgspace-notes/2017-01/
- <h2 id="2017-01-02">2017-01-02</h2>
-<ul>
-<li>I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error</li>
-<li>I tested on DSpace Test as well and it doesn’t work there either</li>
-<li>I asked on the dspace-tech mailing list because it seems to be broken, and actually now I’m not sure if we’ve ever had the sharding task run successfully over all these years</li>
-</ul>
+ <h2 id="2017-01-02">2017-01-02</h2>
<ul>
<li>I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error</li>
<li>I tested on DSpace Test as well and it doesn’t work there either</li>
<li>I asked on the dspace-tech mailing list because it seems to be broken, and actually now I’m not sure if we’ve ever had the sharding task run successfully over all these years</li>
</ul>December, 2016
https://alanorth.github.io/cgspace-notes/2016-12/
Fri, 02 Dec 2016 10:43:00 +0300https://alanorth.github.io/cgspace-notes/2016-12/
- <h2 id="2016-12-02">2016-12-02</h2>
-<ul>
-<li>CGSpace was down for five hours in the morning while I was sleeping</li>
-<li>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</li>
-</ul>
-<pre tabindex="0"><code>2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
-2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail="dc.title", transactionID="TX157907838689377964651674089851855413607")
-2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, ObjectType=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail="THUMBNAIL", transactionID="TX157907838689377964651674089851855413607")
-2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, ObjectType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail="-1", transactionID="TX157907838689377964651674089851855413607")
-2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
-</code></pre><ul>
-<li>I see thousands of them in the logs for the last few months, so it’s not related to the DSpace 5.5 upgrade</li>
-<li>I’ve raised a ticket with Atmire to ask</li>
-<li>Another worrying error from dspace.log is:</li>
-</ul>
+ <h2 id="2016-12-02">2016-12-02</h2>
<ul>
<li>CGSpace was down for five hours in the morning while I was sleeping</li>
<li>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</li>
</ul>
<pre tabindex="0"><code>2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail="dc.title", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, ObjectType=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail="THUMBNAIL", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, ObjectType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail="-1", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
</code></pre><ul>
<li>I see thousands of them in the logs for the last few months, so it’s not related to the DSpace 5.5 upgrade</li>
<li>I’ve raised a ticket with Atmire to ask</li>
<li>Another worrying error from dspace.log is:</li>
</ul>November, 2016
https://alanorth.github.io/cgspace-notes/2016-11/
Tue, 01 Nov 2016 09:21:00 +0300https://alanorth.github.io/cgspace-notes/2016-11/
- <h2 id="2016-11-01">2016-11-01</h2>
-<ul>
-<li>Add <code>dc.type</code> to the output options for Atmire’s Listings and Reports module (<a href="https://github.com/ilri/DSpace/pull/286">#286</a>)</li>
-</ul>
-<p><img src="https://alanorth.github.io/cgspace-notes/2016/11/listings-and-reports.png" alt="Listings and Reports with output type"></p>
+ <h2 id="2016-11-01">2016-11-01</h2>
<ul>
<li>Add <code>dc.type</code> to the output options for Atmire’s Listings and Reports module (<a href="https://github.com/ilri/DSpace/pull/286">#286</a>)</li>
</ul>
<p><img src="https://alanorth.github.io/cgspace-notes/2016/11/listings-and-reports.png" alt="Listings and Reports with output type"></p>October, 2016
https://alanorth.github.io/cgspace-notes/2016-10/
Mon, 03 Oct 2016 15:53:00 +0300https://alanorth.github.io/cgspace-notes/2016-10/
- <h2 id="2016-10-03">2016-10-03</h2>
-<ul>
-<li>Testing adding <a href="https://wiki.lyrasis.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing">ORCIDs to a CSV</a> file for a single item to see if the author orders get messed up</li>
-<li>Need to test the following scenarios to see how author order is affected:
-<ul>
-<li>ORCIDs only</li>
-<li>ORCIDs plus normal authors</li>
-</ul>
-</li>
-<li>I exported a random item’s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new coloum called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</li>
-</ul>
-<pre tabindex="0"><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
-</code></pre>
+ <h2 id="2016-10-03">2016-10-03</h2>
<ul>
<li>Testing adding <a href="https://wiki.lyrasis.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing">ORCIDs to a CSV</a> file for a single item to see if the author orders get messed up</li>
<li>Need to test the following scenarios to see how author order is affected:
<ul>
<li>ORCIDs only</li>
<li>ORCIDs plus normal authors</li>
</ul>
</li>
<li>I exported a random item’s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new column called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</li>
</ul>
<pre tabindex="0"><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
</code></pre>September, 2016
https://alanorth.github.io/cgspace-notes/2016-09/
Thu, 01 Sep 2016 15:53:00 +0300https://alanorth.github.io/cgspace-notes/2016-09/
- <h2 id="2016-09-01">2016-09-01</h2>
-<ul>
-<li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li>
-<li>Discuss how the migration of CGIAR’s Active Directory to a flat structure will break our LDAP groups in DSpace</li>
-<li>We had been using <code>DC=ILRI</code> to determine whether a user was ILRI or not</li>
-<li>It looks like we might be able to use OUs now, instead of DCs:</li>
-</ul>
-<pre tabindex="0"><code>$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=org" -D "admigration1@cgiarad.org" -W "(sAMAccountName=admigration1)"
-</code></pre>
+ <h2 id="2016-09-01">2016-09-01</h2>
<ul>
<li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li>
<li>Discuss how the migration of CGIAR’s Active Directory to a flat structure will break our LDAP groups in DSpace</li>
<li>We had been using <code>DC=ILRI</code> to determine whether a user was ILRI or not</li>
<li>It looks like we might be able to use OUs now, instead of DCs:</li>
</ul>
<pre tabindex="0"><code>$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=org" -D "admigration1@cgiarad.org" -W "(sAMAccountName=admigration1)"
</code></pre>August, 2016
https://alanorth.github.io/cgspace-notes/2016-08/
Mon, 01 Aug 2016 15:53:00 +0300https://alanorth.github.io/cgspace-notes/2016-08/
- <h2 id="2016-08-01">2016-08-01</h2>
-<ul>
-<li>Add updated distribution license from Sisay (<a href="https://github.com/ilri/DSpace/issues/259">#259</a>)</li>
-<li>Play with upgrading Mirage 2 dependencies in <code>bower.json</code> because most are several versions of out date</li>
-<li>Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more</li>
-<li>bower stuff is a dead end, waste of time, too many issues</li>
-<li>Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of <code>fonts</code>)</li>
-<li>Start working on DSpace 5.1 → 5.5 port:</li>
-</ul>
-<pre tabindex="0"><code>$ git checkout -b 55new 5_x-prod
-$ git reset --hard ilri/5_x-prod
-$ git rebase -i dspace-5.5
-</code></pre>
+ <h2 id="2016-08-01">2016-08-01</h2>
<ul>
<li>Add updated distribution license from Sisay (<a href="https://github.com/ilri/DSpace/issues/259">#259</a>)</li>
<li>Play with upgrading Mirage 2 dependencies in <code>bower.json</code> because most are several versions out of date</li>
<li>Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more</li>
<li>bower stuff is a dead end, waste of time, too many issues</li>
<li>Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of <code>fonts</code>)</li>
<li>Start working on DSpace 5.1 → 5.5 port:</li>
</ul>
<pre tabindex="0"><code>$ git checkout -b 55new 5_x-prod
$ git reset --hard ilri/5_x-prod
$ git rebase -i dspace-5.5
</code></pre>July, 2016
https://alanorth.github.io/cgspace-notes/2016-07/
Fri, 01 Jul 2016 10:53:00 +0300https://alanorth.github.io/cgspace-notes/2016-07/
- <h2 id="2016-07-01">2016-07-01</h2>
-<ul>
-<li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li>
-<li>I think this query should find and replace all authors that have “,” at the end of their names:</li>
-</ul>
-<pre tabindex="0"><code>dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
-UPDATE 95
-dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
- text_value
-------------
-(0 rows)
-</code></pre><ul>
-<li>In this case the select query was showing 95 results before the update</li>
-</ul>
+ <h2 id="2016-07-01">2016-07-01</h2>
<ul>
<li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li>
<li>I think this query should find and replace all authors that have “,” at the end of their names:</li>
</ul>
<pre tabindex="0"><code>dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
UPDATE 95
dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
text_value
------------
(0 rows)
</code></pre><ul>
<li>In this case the select query was showing 95 results before the update</li>
</ul>June, 2016
https://alanorth.github.io/cgspace-notes/2016-06/
Wed, 01 Jun 2016 10:53:00 +0300https://alanorth.github.io/cgspace-notes/2016-06/
- <h2 id="2016-06-01">2016-06-01</h2>
-<ul>
-<li>Experimenting with IFPRI OAI (we want to harvest their publications)</li>
-<li>After reading the <a href="https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html">ContentDM documentation</a> I found IFPRI’s OAI endpoint: <a href="http://ebrary.ifpri.org/oai/oai.php">http://ebrary.ifpri.org/oai/oai.php</a></li>
-<li>After reading the <a href="https://www.openarchives.org/OAI/openarchivesprotocol.html">OAI documentation</a> and testing with an <a href="http://validator.oaipmh.com/">OAI validator</a> I found out how to get their publications</li>
-<li>This is their publications set: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&from=2016-01-01&set=p15738coll2&metadataPrefix=oai_dc">http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&from=2016-01-01&set=p15738coll2&metadataPrefix=oai_dc</a></li>
-<li>You can see the others by using the OAI <code>ListSets</code> verb: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListSets">http://ebrary.ifpri.org/oai/oai.php?verb=ListSets</a></li>
-<li>Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in <code>dc.identifier.fund</code> to <code>cg.identifier.cpwfproject</code> and then the rest to <code>dc.description.sponsorship</code></li>
-</ul>
+ <h2 id="2016-06-01">2016-06-01</h2>
<ul>
<li>Experimenting with IFPRI OAI (we want to harvest their publications)</li>
<li>After reading the <a href="https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html">ContentDM documentation</a> I found IFPRI’s OAI endpoint: <a href="http://ebrary.ifpri.org/oai/oai.php">http://ebrary.ifpri.org/oai/oai.php</a></li>
<li>After reading the <a href="https://www.openarchives.org/OAI/openarchivesprotocol.html">OAI documentation</a> and testing with an <a href="http://validator.oaipmh.com/">OAI validator</a> I found out how to get their publications</li>
<li>This is their publications set: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&from=2016-01-01&set=p15738coll2&metadataPrefix=oai_dc">http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&from=2016-01-01&set=p15738coll2&metadataPrefix=oai_dc</a></li>
<li>You can see the others by using the OAI <code>ListSets</code> verb: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListSets">http://ebrary.ifpri.org/oai/oai.php?verb=ListSets</a></li>
<li>Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in <code>dc.identifier.fund</code> to <code>cg.identifier.cpwfproject</code> and then the rest to <code>dc.description.sponsorship</code></li>
</ul>May, 2016
https://alanorth.github.io/cgspace-notes/2016-05/
Sun, 01 May 2016 23:06:00 +0300https://alanorth.github.io/cgspace-notes/2016-05/
- <h2 id="2016-05-01">2016-05-01</h2>
-<ul>
-<li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li>
-<li>I have blocked access to the API now</li>
-<li>There are 3,000 IPs accessing the REST API in a 24-hour period!</li>
-</ul>
-<pre tabindex="0"><code># awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l
-3168
-</code></pre>
+ <h2 id="2016-05-01">2016-05-01</h2>
<ul>
<li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li>
<li>I have blocked access to the API now</li>
<li>There are 3,000 IPs accessing the REST API in a 24-hour period!</li>
</ul>
<pre tabindex="0"><code># awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l
3168
</code></pre>April, 2016
https://alanorth.github.io/cgspace-notes/2016-04/
Mon, 04 Apr 2016 11:06:00 +0300https://alanorth.github.io/cgspace-notes/2016-04/
- <h2 id="2016-04-04">2016-04-04</h2>
-<ul>
-<li>Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit</li>
-<li>We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc</li>
-<li>After running DSpace for over five years I’ve never needed to look in any other log file than dspace.log, leave alone one from last year!</li>
-<li>This will save us a few gigs of backup space we’re paying for on S3</li>
-<li>Also, I noticed the <code>checker</code> log has some errors we should pay attention to:</li>
-</ul>
+ <h2 id="2016-04-04">2016-04-04</h2>
<ul>
<li>Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit</li>
<li>We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc</li>
<li>After running DSpace for over five years I’ve never needed to look in any other log file than dspace.log, let alone one from last year!</li>
<li>This will save us a few gigs of backup space we’re paying for on S3</li>
<li>Also, I noticed the <code>checker</code> log has some errors we should pay attention to:</li>
</ul>March, 2016
https://alanorth.github.io/cgspace-notes/2016-03/
Wed, 02 Mar 2016 16:50:00 +0300https://alanorth.github.io/cgspace-notes/2016-03/
- <h2 id="2016-03-02">2016-03-02</h2>
-<ul>
-<li>Looking at issues with author authorities on CGSpace</li>
-<li>For some reason we still have the <code>index-lucene-update</code> cron job active on CGSpace, but I’m pretty sure we don’t need it as of the latest few versions of Atmire’s Listings and Reports module</li>
-<li>Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server</li>
-</ul>
+ <h2 id="2016-03-02">2016-03-02</h2>
<ul>
<li>Looking at issues with author authorities on CGSpace</li>
<li>For some reason we still have the <code>index-lucene-update</code> cron job active on CGSpace, but I’m pretty sure we don’t need it as of the latest few versions of Atmire’s Listings and Reports module</li>
<li>Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server</li>
</ul>February, 2016
https://alanorth.github.io/cgspace-notes/2016-02/
Fri, 05 Feb 2016 13:18:00 +0300https://alanorth.github.io/cgspace-notes/2016-02/
- <h2 id="2016-02-05">2016-02-05</h2>
-<ul>
-<li>Looking at some DAGRIS data for Abenet Yabowork</li>
-<li>Lots of issues with spaces, newlines, etc causing the import to fail</li>
-<li>I noticed we have a very <em>interesting</em> list of countries on CGSpace:</li>
-</ul>
-<p><img src="https://alanorth.github.io/cgspace-notes/cgspace-notes/2016/02/cgspace-countries.png" alt="CGSpace country list"></p>
-<ul>
-<li>Not only are there 49,000 countries, we have some blanks (25)…</li>
-<li>Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE”</li>
-</ul>
+ <h2 id="2016-02-05">2016-02-05</h2>
<ul>
<li>Looking at some DAGRIS data for Abenet Yabowork</li>
<li>Lots of issues with spaces, newlines, etc causing the import to fail</li>
<li>I noticed we have a very <em>interesting</em> list of countries on CGSpace:</li>
</ul>
<p><img src="https://alanorth.github.io/cgspace-notes/cgspace-notes/2016/02/cgspace-countries.png" alt="CGSpace country list"></p>
<ul>
<li>Not only are there 49,000 countries, we have some blanks (25)…</li>
<li>Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE”</li>
</ul>January, 2016
https://alanorth.github.io/cgspace-notes/2016-01/
Wed, 13 Jan 2016 13:18:00 +0300https://alanorth.github.io/cgspace-notes/2016-01/
- <h2 id="2016-01-13">2016-01-13</h2>
-<ul>
-<li>Move ILRI collection <code>10568/12503</code> from <code>10568/27869</code> to <code>10568/27629</code> using the <a href="https://gist.github.com/alanorth/392c4660e8b022d99dfa">move_collections.sh</a> script I wrote last year.</li>
-<li>I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.</li>
-<li>Update GitHub wiki for documentation of <a href="https://github.com/ilri/DSpace/wiki/Maintenance-Tasks">maintenance tasks</a>.</li>
-</ul>
+ <h2 id="2016-01-13">2016-01-13</h2>
<ul>
<li>Move ILRI collection <code>10568/12503</code> from <code>10568/27869</code> to <code>10568/27629</code> using the <a href="https://gist.github.com/alanorth/392c4660e8b022d99dfa">move_collections.sh</a> script I wrote last year.</li>
<li>I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.</li>
<li>Update GitHub wiki for documentation of <a href="https://github.com/ilri/DSpace/wiki/Maintenance-Tasks">maintenance tasks</a>.</li>
</ul>December, 2015
https://alanorth.github.io/cgspace-notes/2015-12/
Wed, 02 Dec 2015 13:18:00 +0300https://alanorth.github.io/cgspace-notes/2015-12/
- <h2 id="2015-12-02">2015-12-02</h2>
-<ul>
-<li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li>
-</ul>
-<pre tabindex="0"><code># cd /home/dspacetest.cgiar.org/log
-# ls -lh dspace.log.2015-11-18*
--rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
--rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
--rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
-</code></pre>
+ <h2 id="2015-12-02">2015-12-02</h2>
<ul>
<li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li>
</ul>
<pre tabindex="0"><code># cd /home/dspacetest.cgiar.org/log
# ls -lh dspace.log.2015-11-18*
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
</code></pre>November, 2015
https://alanorth.github.io/cgspace-notes/2015-11/
Mon, 23 Nov 2015 17:00:57 +0300https://alanorth.github.io/cgspace-notes/2015-11/
- <h2 id="2015-11-22">2015-11-22</h2>
-<ul>
-<li>CGSpace went down</li>
-<li>Looks like DSpace exhausted its PostgreSQL connection pool</li>
-<li>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</li>
-</ul>
-<pre tabindex="0"><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
-78
-</code></pre>
+ <h2 id="2015-11-22">2015-11-22</h2>
<ul>
<li>CGSpace went down</li>
<li>Looks like DSpace exhausted its PostgreSQL connection pool</li>
<li>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</li>
</ul>
<pre tabindex="0"><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
78
</code></pre>
diff --git a/docs/page/10/index.html b/docs/page/10/index.html
index 2cc7413b5..65d44903d 100644
--- a/docs/page/10/index.html
+++ b/docs/page/10/index.html
diff --git a/docs/page/11/index.html b/docs/page/11/index.html
index 9ecebafe3..85f399772 100644
--- a/docs/page/11/index.html
+++ b/docs/page/11/index.html
diff --git a/docs/page/2/index.html b/docs/page/2/index.html
index f43f193d9..916fade2d 100644
--- a/docs/page/2/index.html
+++ b/docs/page/2/index.html
diff --git a/docs/page/3/index.html b/docs/page/3/index.html
index 3cd5b7656..8337ecf21 100644
--- a/docs/page/3/index.html
+++ b/docs/page/3/index.html
diff --git a/docs/page/4/index.html b/docs/page/4/index.html
index 6ebf1b949..766dd1e68 100644
--- a/docs/page/4/index.html
+++ b/docs/page/4/index.html
diff --git a/docs/page/5/index.html b/docs/page/5/index.html
index 671141e5d..a5ece1eb1 100644
--- a/docs/page/5/index.html
+++ b/docs/page/5/index.html
diff --git a/docs/page/6/index.html b/docs/page/6/index.html
index a71032b2d..479a0187c 100644
--- a/docs/page/6/index.html
+++ b/docs/page/6/index.html
diff --git a/docs/page/7/index.html b/docs/page/7/index.html
index 267341711..db4073663 100644
--- a/docs/page/7/index.html
+++ b/docs/page/7/index.html
diff --git a/docs/page/8/index.html b/docs/page/8/index.html
index cc0ee2d33..5532f94cf 100644
--- a/docs/page/8/index.html
+++ b/docs/page/8/index.html
diff --git a/docs/page/9/index.html b/docs/page/9/index.html
index e937bf047..366916fef 100644
--- a/docs/page/9/index.html
+++ b/docs/page/9/index.html
diff --git a/docs/posts/index.html b/docs/posts/index.html
index 75e83ec3e..0b2a1f156 100644
--- a/docs/posts/index.html
+++ b/docs/posts/index.html
diff --git a/docs/posts/index.xml b/docs/posts/index.xml
index 19368ec93..cde31bae5 100644
--- a/docs/posts/index.xml
+++ b/docs/posts/index.xml
@@ -20,61 +20,28 @@
https://alanorth.github.io/cgspace-notes/2023-11/
Thu, 02 Nov 2023 12:59:36 +0300https://alanorth.github.io/cgspace-notes/2023-11/
- <h2 id="2023-11-01">2023-11-01</h2>
-<ul>
-<li>Work a bit on the ETL pipeline for the CGIAR Climate Change Synthesis
-<ul>
-<li>I improved the filtering and wrote some Python using pandas to merge my sources more reliably</li>
-</ul>
-</li>
-</ul>
-<h2 id="2023-11-02">2023-11-02</h2>
-<ul>
-<li>Export CGSpace to check missing Initiative collection mappings</li>
-<li>Start a harvest on AReS</li>
-</ul>
+ <h2 id="2023-11-01">2023-11-01</h2>
<ul>
<li>Work a bit on the ETL pipeline for the CGIAR Climate Change Synthesis
<ul>
<li>I improved the filtering and wrote some Python using pandas to merge my sources more reliably</li>
</ul>
</li>
</ul>
<h2 id="2023-11-02">2023-11-02</h2>
<ul>
<li>Export CGSpace to check missing Initiative collection mappings</li>
<li>Start a harvest on AReS</li>
</ul>October, 2023
https://alanorth.github.io/cgspace-notes/2023-10/
Mon, 02 Oct 2023 09:05:36 +0300https://alanorth.github.io/cgspace-notes/2023-10/
- <h2 id="2023-10-02">2023-10-02</h2>
-<ul>
-<li>Export CGSpace to check DOIs against Crossref
-<ul>
-<li>I found that <a href="https://www.crossref.org/documentation/retrieve-metadata/rest-api/rest-api-metadata-license-information/">Crossref’s metadata is in the public domain under the CC0 license</a></li>
-<li>One interesting thing is the abstracts, which are copyrighted by the copyright owner, meaning Crossref cannot waive the copyright under the terms of the CC0 license, because it is not theirs to waive</li>
-<li>We can be on the safe side by using only abstracts for items that are licensed under Creative Commons</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2023-10-02">2023-10-02</h2>
<ul>
<li>Export CGSpace to check DOIs against Crossref
<ul>
<li>I found that <a href="https://www.crossref.org/documentation/retrieve-metadata/rest-api/rest-api-metadata-license-information/">Crossref’s metadata is in the public domain under the CC0 license</a></li>
<li>One interesting thing is the abstracts, which are copyrighted by the copyright owner, meaning Crossref cannot waive the copyright under the terms of the CC0 license, because it is not theirs to waive</li>
<li>We can be on the safe side by using only abstracts for items that are licensed under Creative Commons</li>
</ul>
</li>
</ul>September, 2023
https://alanorth.github.io/cgspace-notes/2023-09/
Sat, 02 Sep 2023 17:29:36 +0300https://alanorth.github.io/cgspace-notes/2023-09/
- <h2 id="2023-09-02">2023-09-02</h2>
-<ul>
-<li>Export CGSpace to check for missing Initiative collection mappings</li>
-<li>Start a harvest on AReS</li>
-</ul>
+ <h2 id="2023-09-02">2023-09-02</h2>
<ul>
<li>Export CGSpace to check for missing Initiative collection mappings</li>
<li>Start a harvest on AReS</li>
</ul>August, 2023
https://alanorth.github.io/cgspace-notes/2023-08/
Thu, 03 Aug 2023 11:18:36 +0300https://alanorth.github.io/cgspace-notes/2023-08/
- <h2 id="2023-08-03">2023-08-03</h2>
-<ul>
-<li>I finally got around to working on Peter’s cleanups for affiliations, authors, and donors from last week
-<ul>
-<li>I did some minor cleanups myself and applied them to CGSpace</li>
-</ul>
-</li>
-<li>Start working on some batch uploads for IFPRI</li>
-</ul>
+ <h2 id="2023-08-03">2023-08-03</h2>
<ul>
<li>I finally got around to working on Peter’s cleanups for affiliations, authors, and donors from last week
<ul>
<li>I did some minor cleanups myself and applied them to CGSpace</li>
</ul>
</li>
<li>Start working on some batch uploads for IFPRI</li>
</ul>July, 2023
@@ -88,249 +55,98 @@
https://alanorth.github.io/cgspace-notes/2023-06/
Fri, 02 Jun 2023 10:29:36 +0300https://alanorth.github.io/cgspace-notes/2023-06/
- <h2 id="2023-06-02">2023-06-02</h2>
-<ul>
-<li>Spend some time testing my <code>post_bitstreams.py</code> script to update thumbnails for items on CGSpace
-<ul>
-<li>Interestingly I found an item with a JFIF thumbnail and another with a WebP thumbnail…</li>
-</ul>
-</li>
-<li>Meeting with Valentina, Stefano, and Sara about MODS metadata in CGSpace
-<ul>
-<li>They have experience with improving the MODS interface in MELSpace’s OAI-PMH for use with AGRIS and were curious if we could do the same in CGSpace</li>
-<li>From what I can see we need to upgrade the MODS schema from 3.1 to 3.7 and then just add a bunch of our fields to the crosswalk</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2023-06-02">2023-06-02</h2>
<ul>
<li>Spend some time testing my <code>post_bitstreams.py</code> script to update thumbnails for items on CGSpace
<ul>
<li>Interestingly I found an item with a JFIF thumbnail and another with a WebP thumbnail…</li>
</ul>
</li>
<li>Meeting with Valentina, Stefano, and Sara about MODS metadata in CGSpace
<ul>
<li>They have experience with improving the MODS interface in MELSpace’s OAI-PMH for use with AGRIS and were curious if we could do the same in CGSpace</li>
<li>From what I can see we need to upgrade the MODS schema from 3.1 to 3.7 and then just add a bunch of our fields to the crosswalk</li>
</ul>
</li>
</ul>May, 2023
https://alanorth.github.io/cgspace-notes/2023-05/
Wed, 03 May 2023 08:53:36 +0300https://alanorth.github.io/cgspace-notes/2023-05/
- <h2 id="2023-05-03">2023-05-03</h2>
-<ul>
-<li>Alliance’s TIP team emailed me to ask about issues authenticating on CGSpace
-<ul>
-<li>It seems their password expired, which is annoying</li>
-</ul>
-</li>
-<li>I continued looking at the CGSpace subjects for the FAO / AGROVOC exercise that I started last week
-<ul>
-<li>There are many of our subjects that would match if they added a “-” like “high yielding varieties” or used singular…</li>
-<li>Also I found at least two spelling mistakes, for example “decison support systems”, which would match if it was spelled correctly</li>
-</ul>
-</li>
-<li>Work on cleaning, proofing, and uploading twenty-seven records for IFPRI to CGSpace</li>
-</ul>
+ <h2 id="2023-05-03">2023-05-03</h2>
<ul>
<li>Alliance’s TIP team emailed me to ask about issues authenticating on CGSpace
<ul>
<li>It seems their password expired, which is annoying</li>
</ul>
</li>
<li>I continued looking at the CGSpace subjects for the FAO / AGROVOC exercise that I started last week
<ul>
<li>There are many of our subjects that would match if they added a “-” like “high yielding varieties” or used singular…</li>
<li>Also I found at least two spelling mistakes, for example “decison support systems”, which would match if it was spelled correctly</li>
</ul>
</li>
<li>Work on cleaning, proofing, and uploading twenty-seven records for IFPRI to CGSpace</li>
</ul>April, 2023
https://alanorth.github.io/cgspace-notes/2023-04/
Sun, 02 Apr 2023 08:19:36 +0300https://alanorth.github.io/cgspace-notes/2023-04/
- <h2 id="2023-04-02">2023-04-02</h2>
-<ul>
-<li>Run all system updates on CGSpace and reboot it</li>
-<li>I exported CGSpace to CSV to check for any missing Initiative collection mappings
-<ul>
-<li>I also did a check for missing country/region mappings with csv-metadata-quality</li>
-</ul>
-</li>
-<li>Start a harvest on AReS</li>
-</ul>
+ <h2 id="2023-04-02">2023-04-02</h2>
<ul>
<li>Run all system updates on CGSpace and reboot it</li>
<li>I exported CGSpace to CSV to check for any missing Initiative collection mappings
<ul>
<li>I also did a check for missing country/region mappings with csv-metadata-quality</li>
</ul>
</li>
<li>Start a harvest on AReS</li>
</ul>March, 2023
https://alanorth.github.io/cgspace-notes/2023-03/
Wed, 01 Mar 2023 07:58:36 +0300https://alanorth.github.io/cgspace-notes/2023-03/
- <h2 id="2023-03-01">2023-03-01</h2>
-<ul>
-<li>Remove <code>cg.subject.wle</code> and <code>cg.identifier.wletheme</code> from CGSpace input form after confirming with IWMI colleagues that they no longer need them (WLE closed in 2021)</li>
-<li><a href="https://salsa.debian.org/iso-codes-team/iso-codes/-/blob/main/CHANGELOG.md#4130-2023-02-28">iso-codes 4.13.0 was released</a>, which incorporates my changes to the common names for Iran, Laos, and Syria</li>
-<li>I finally got through with porting the input form from DSpace 6 to DSpace 7</li>
-</ul>
+ <h2 id="2023-03-01">2023-03-01</h2>
<ul>
<li>Remove <code>cg.subject.wle</code> and <code>cg.identifier.wletheme</code> from CGSpace input form after confirming with IWMI colleagues that they no longer need them (WLE closed in 2021)</li>
<li><a href="https://salsa.debian.org/iso-codes-team/iso-codes/-/blob/main/CHANGELOG.md#4130-2023-02-28">iso-codes 4.13.0 was released</a>, which incorporates my changes to the common names for Iran, Laos, and Syria</li>
<li>I finally got through with porting the input form from DSpace 6 to DSpace 7</li>
</ul>February, 2023
https://alanorth.github.io/cgspace-notes/2023-02/
Wed, 01 Feb 2023 10:57:36 +0300https://alanorth.github.io/cgspace-notes/2023-02/
- <h2 id="2023-02-01">2023-02-01</h2>
-<ul>
-<li>Export CGSpace to cross check the DOI metadata with Crossref
-<ul>
-<li>I want to try to expand my use of their data to journals, publishers, volumes, issues, etc…</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2023-02-01">2023-02-01</h2>
<ul>
<li>Export CGSpace to cross check the DOI metadata with Crossref
<ul>
<li>I want to try to expand my use of their data to journals, publishers, volumes, issues, etc…</li>
</ul>
</li>
</ul>January, 2023
https://alanorth.github.io/cgspace-notes/2023-01/
Sun, 01 Jan 2023 08:44:36 +0300https://alanorth.github.io/cgspace-notes/2023-01/
- <h2 id="2023-01-01">2023-01-01</h2>
-<ul>
-<li>Apply some more ORCID identifiers to items on CGSpace using my <code>2022-09-22-add-orcids.csv</code> file
-<ul>
-<li>I want to update all ORCID names and refresh them in the database</li>
-<li>I see we have some new ones that aren’t in our list if I combine with this file:</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2023-01-01">2023-01-01</h2>
<ul>
<li>Apply some more ORCID identifiers to items on CGSpace using my <code>2022-09-22-add-orcids.csv</code> file
<ul>
<li>I want to update all ORCID names and refresh them in the database</li>
<li>I see we have some new ones that aren’t in our list if I combine with this file:</li>
</ul>
</li>
</ul>December, 2022
https://alanorth.github.io/cgspace-notes/2022-12/
Thu, 01 Dec 2022 08:52:36 +0300https://alanorth.github.io/cgspace-notes/2022-12/
- <h2 id="2022-12-01">2022-12-01</h2>
-<ul>
-<li>Fix some incorrect regions on CGSpace
-<ul>
-<li>I exported the CCAFS and IITA communities, extracted just the country and region columns, then ran them through csv-metadata-quality to fix the regions</li>
-</ul>
-</li>
-<li>Add a few more authors to my CSV with author names and ORCID identifiers and tag 283 items!</li>
-<li>Replace “East Asia” with “Eastern Asia” region on CGSpace (UN M.49 region)</li>
-</ul>
+ <h2 id="2022-12-01">2022-12-01</h2>
<ul>
<li>Fix some incorrect regions on CGSpace
<ul>
<li>I exported the CCAFS and IITA communities, extracted just the country and region columns, then ran them through csv-metadata-quality to fix the regions</li>
</ul>
</li>
<li>Add a few more authors to my CSV with author names and ORCID identifiers and tag 283 items!</li>
<li>Replace “East Asia” with “Eastern Asia” region on CGSpace (UN M.49 region)</li>
</ul>November, 2022
https://alanorth.github.io/cgspace-notes/2022-11/
Tue, 01 Nov 2022 09:11:36 +0300https://alanorth.github.io/cgspace-notes/2022-11/
- <h2 id="2022-11-01">2022-11-01</h2>
-<ul>
-<li>Last night I re-synced DSpace 7 Test from CGSpace
-<ul>
-<li>I also updated all my local <code>7_x-dev</code> branches on the latest upstreams</li>
-</ul>
-</li>
-<li>I spent some time updating the authorizations in Alliance collections
-<ul>
-<li>I want to make sure they use groups instead of individuals where possible!</li>
-</ul>
-</li>
-<li>I reverted the Cocoon autosave change because it was more of a nuissance that Peter can’t upload CSVs from the web interface and is a very low severity security issue</li>
-</ul>
+ <h2 id="2022-11-01">2022-11-01</h2>
<ul>
<li>Last night I re-synced DSpace 7 Test from CGSpace
<ul>
<li>I also updated all my local <code>7_x-dev</code> branches on the latest upstreams</li>
</ul>
</li>
<li>I spent some time updating the authorizations in Alliance collections
<ul>
<li>I want to make sure they use groups instead of individuals where possible!</li>
</ul>
</li>
<li>I reverted the Cocoon autosave change because it was more of a nuisance that Peter can’t upload CSVs from the web interface than a fix for a very low severity security issue</li>
</ul>October, 2022
https://alanorth.github.io/cgspace-notes/2022-10/
Sat, 01 Oct 2022 19:45:36 +0300https://alanorth.github.io/cgspace-notes/2022-10/
<h2 id="2022-10-01">2022-10-01</h2>
<ul>
<li>Start a harvest on AReS last night</li>
<li>Yesterday I realized how to use <a href="https://im4java.sourceforge.net/docs/dev-guide.html">GraphicsMagick with im4java</a> and I want to re-visit some of my thumbnail tests
<ul>
<li>I’m also interested in libvips support via jVips, though last time I checked it was only for Java 8</li>
<li>I filed <a href="https://github.com/criteo/JVips/issues/141">an issue to ask about Java 11+ support</a></li>
</ul>
</li>
</ul>September, 2022
https://alanorth.github.io/cgspace-notes/2022-09/
Thu, 01 Sep 2022 09:41:36 +0300https://alanorth.github.io/cgspace-notes/2022-09/
<h2 id="2022-09-01">2022-09-01</h2>
<ul>
<li>A bit of work on the “Mapping CG Core–CGSpace–MEL–MARLO Types” spreadsheet</li>
<li>I tested an item submission on DSpace Test with the Cocoon <code>org.apache.cocoon.uploads.autosave=false</code> change
<ul>
<li>The submission works as expected</li>
</ul>
</li>
<li>Start debugging some region-related issues with csv-metadata-quality
<ul>
<li>I created a new test file <code>test-geography.csv</code> with some different scenarios</li>
<li>I also fixed a few bugs and improved the region-matching logic</li>
</ul>
</li>
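<li>The gist of that region check can be sketched like this (the mapping below is a tiny made-up subset for illustration; csv-metadata-quality uses a proper UN M.49 data source):

```python
# Given a row's countries, report UN M.49 regions that are implied by the
# countries but missing from the row's region column.
COUNTRY_TO_REGION = {
    "Kenya": "Eastern Africa",
    "Japan": "Eastern Asia",
    "Peru": "South America",
}

def missing_regions(countries, regions):
    implied = {COUNTRY_TO_REGION[c] for c in countries if c in COUNTRY_TO_REGION}
    return sorted(implied - set(regions))

print(missing_regions(["Kenya", "Japan"], ["Eastern Africa"]))  # ['Eastern Asia']
```
</li>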
</ul>August, 2022
https://alanorth.github.io/cgspace-notes/2022-08/
Mon, 01 Aug 2022 10:22:36 +0300https://alanorth.github.io/cgspace-notes/2022-08/
<h2 id="2022-08-01">2022-08-01</h2>
<ul>
<li>Our request to add <a href="https://github.com/spdx/license-list-XML/issues/1525">CC-BY-3.0-IGO to SPDX</a> was approved a few weeks ago</li>
</ul>July, 2022
https://alanorth.github.io/cgspace-notes/2022-07/
Sat, 02 Jul 2022 14:07:36 +0300https://alanorth.github.io/cgspace-notes/2022-07/
<h2 id="2022-07-02">2022-07-02</h2>
<ul>
<li>I learned how to use the Levenshtein functions in PostgreSQL
<ul>
<li>The catch is that these functions have a 255-character limit in PostgreSQL, so you need to truncate the strings before comparing</li>
<li>Also, the trgm functions I’ve used before are case insensitive, but Levenshtein is not, so you need to make sure to lowercase both strings first</li>
</ul>
</li>
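<li>A minimal pure-Python sketch of both caveats (the helper names here are illustrative, not part of the PostgreSQL fuzzystrmatch extension):

```python
# Sketch of the two PostgreSQL caveats in plain Python: truncate both
# strings to 255 characters (the fuzzystrmatch limit) and lowercase them
# before comparing, since levenshtein() is case sensitive.
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def pg_style_distance(a: str, b: str) -> int:
    # Mirror what you would do in SQL: LEFT(LOWER(a), 255), etc.
    return levenshtein(a[:255].lower(), b[:255].lower())

print(pg_style_distance("Kitten", "sitting"))  # 3
```
</li>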
</ul>June, 2022
https://alanorth.github.io/cgspace-notes/2022-06/
Mon, 06 Jun 2022 09:01:36 +0300https://alanorth.github.io/cgspace-notes/2022-06/
<h2 id="2022-06-06">2022-06-06</h2>
<ul>
<li>Look at the Solr statistics on CGSpace
<ul>
<li>I see 167,000 hits from a bunch of Microsoft IPs with reverse DNS “msnbot-” using the Solr query <code>dns:*msnbot* AND dns:*.msn.com</code></li>
<li>I purged these first so I could see the other “real” IPs in the Solr facets</li>
</ul>
</li>
<li>I see 47,500 hits from 80.248.237.167 on a data center ISP in Sweden, using a normal user agent</li>
<li>I see 13,000 hits from 163.237.216.11 on a data center ISP in Australia, using a normal user agent</li>
<li>I see 7,300 hits from 208.185.238.57 from Britannica, using a normal user agent
<ul>
<li>There seem to be many more of these:</li>
</ul>
</li>
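<li>For reference, a hedged sketch of how such a Solr query could be issued from Python (the host, port, and core name are taken from these notes; the parameters are standard Solr ones; nothing here talks to a live server, it only constructs the request URL):

```python
# Build a Solr query URL for hits whose reverse DNS looks like msnbot,
# faceting on IP so the offending hosts are visible.
from urllib.parse import urlencode

def build_solr_query(base="http://localhost:8081/solr/statistics"):
    params = {
        "q": "dns:*msnbot* AND dns:*.msn.com.",
        "rows": 0,            # we only want counts and facets, not documents
        "facet": "true",
        "facet.field": "ip",  # facet on client IP
    }
    return f"{base}/select?{urlencode(params)}"

url = build_solr_query()
print(url)
```
</li>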
</ul>May, 2022
https://alanorth.github.io/cgspace-notes/2022-05/
Wed, 04 May 2022 09:13:39 +0300https://alanorth.github.io/cgspace-notes/2022-05/
<h2 id="2022-05-04">2022-05-04</h2>
<ul>
<li>I found a few more IPs making requests using the shady Chrome 44 user agent in the last few days so I will add them to the block list too:
<ul>
<li>18.207.136.176</li>
<li>185.189.36.248</li>
<li>50.118.223.78</li>
<li>52.70.76.123</li>
<li>3.236.10.11</li>
</ul>
</li>
<li>Looking at the Solr statistics for 2022-04
<ul>
<li>52.191.137.59 is Microsoft, but they are using a normal user agent and making tens of thousands of requests</li>
<li>64.39.98.62 is owned by Qualys, and all their requests are probing for /etc/passwd etc</li>
<li>185.192.69.15 is in the Netherlands and is using a normal user agent, but making excessive automated HTTP requests to paths forbidden in robots.txt</li>
<li>157.55.39.159 is owned by Microsoft and identifies as bingbot so I don’t know why its requests were logged in Solr</li>
<li>52.233.67.176 is owned by Microsoft and uses a normal user agent, but making excessive automated HTTP requests</li>
<li>157.55.39.144 is owned by Microsoft and uses a normal user agent, but making excessive automated HTTP requests</li>
<li>207.46.13.177 is owned by Microsoft and identifies as bingbot so I don’t know why its requests were logged in Solr</li>
<li>If I query Solr for <code>time:2022-04* AND dns:*msnbot* AND dns:*.msn.com.</code> I see a handful of IPs that made 41,000 requests</li>
</ul>
</li>
<li>I purged 93,974 hits from these IPs using my <code>check-spider-ip-hits.sh</code> script</li>
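<li>Checking whether an IP falls inside a known network can be sketched with Python’s stdlib <code>ipaddress</code> module, similar to what grepcidr does against the nginx bot-networks list (the two networks below are just examples from our list):

```python
# Check whether a client IP falls inside any known bot network.
import ipaddress

BOT_NETWORKS = [ipaddress.ip_network(n) for n in ("52.48.0.0/13", "54.194.0.0/15")]

def is_bot_ip(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BOT_NETWORKS)

print(is_bot_ip("52.51.1.1"))       # inside 52.48.0.0/13, so True
print(is_bot_ip("80.248.237.167"))  # not in any listed network, so False
```
</li>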
</ul>April, 2022
https://alanorth.github.io/cgspace-notes/2022-03/
Tue, 01 Mar 2022 16:46:54 +0300https://alanorth.github.io/cgspace-notes/2022-03/
<h2 id="2022-03-01">2022-03-01</h2>
<ul>
<li>Send Gaia the last batch of potential duplicates for items 701 to 980:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ csvcut -c id,dc.title,dcterms.issued,dcterms.type ~/Downloads/2022-03-01-CGSpace-TAC-ICW-batch4-701-980.csv > /tmp/tac4.csv
</span></span><span style="display:flex;"><span>$ ./ilri/check-duplicates.py -i /tmp/tac4.csv -db dspace -u dspace -p <span style="color:#e6db74">'fuuu'</span> -o /tmp/2022-03-01-tac-batch4-701-980.csv
</span></span><span style="display:flex;"><span>$ csvcut -c id,filename ~/Downloads/2022-03-01-CGSpace-TAC-ICW-batch4-701-980.csv > /tmp/tac4-filenames.csv
</span></span><span style="display:flex;"><span>$ csvjoin -c id /tmp/2022-03-01-tac-batch4-701-980.csv /tmp/tac4-filenames.csv > /tmp/2022-03-01-tac-batch4-701-980-filenames.csv
</span></span></code></pre></div>February, 2022
https://alanorth.github.io/cgspace-notes/2022-02/
Tue, 01 Feb 2022 14:06:54 +0200https://alanorth.github.io/cgspace-notes/2022-02/
<h2 id="2022-02-01">2022-02-01</h2>
<ul>
<li>Meeting with Peter and Abenet about CGSpace in the One CGIAR
<ul>
<li>We agreed to buy $5,000 worth of credits from Atmire for future upgrades</li>
<li>We agreed to move CRPs and non-CGIAR communities off the home page, as well as some other things for the CGIAR System Organization</li>
<li>We agreed to make a Discovery facet for CGIAR Action Areas above the existing CGIAR Impact Areas one</li>
<li>We agreed to try to do more alignment of affiliations/funders with ROR</li>
</ul>
</li>
</ul>January, 2022
https://alanorth.github.io/cgspace-notes/2022-01/
Sat, 01 Jan 2022 15:20:54 +0200https://alanorth.github.io/cgspace-notes/2022-01/
<h2 id="2022-01-01">2022-01-01</h2>
<ul>
<li>Start a full harvest on AReS</li>
</ul>December, 2021
https://alanorth.github.io/cgspace-notes/2021-12/
Wed, 01 Dec 2021 16:07:07 +0200https://alanorth.github.io/cgspace-notes/2021-12/
<h2 id="2021-12-01">2021-12-01</h2>
<ul>
<li>Atmire merged some changes I had submitted to the COUNTER-Robots project</li>
<li>I updated our local spider user agents and then re-ran the list with my <code>check-spider-hits.sh</code> script on CGSpace:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ ./ilri/check-spider-hits.sh -f /tmp/agents -p
</span></span><span style="display:flex;"><span>Purging 1989 hits from The Knowledge AI in statistics
</span></span><span style="display:flex;"><span>Purging 1235 hits from MaCoCu in statistics
</span></span><span style="display:flex;"><span>Purging 455 hits from WhatsApp in statistics
</span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>Total number of bot hits purged: 3679
</span></span></code></pre></div>November, 2021
https://alanorth.github.io/cgspace-notes/2021-11/
Tue, 02 Nov 2021 22:27:07 +0200https://alanorth.github.io/cgspace-notes/2021-11/
<h2 id="2021-11-02">2021-11-02</h2>
<ul>
<li>I experimented with manually sharding the Solr statistics on DSpace Test</li>
<li>First I exported all the 2019 stats from CGSpace:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ ./run.sh -s http://localhost:8081/solr/statistics -f <span style="color:#e6db74">'time:2019-*'</span> -a export -o statistics-2019.json -k uid
</span></span><span style="display:flex;"><span>$ zstd statistics-2019.json
</span></span></code></pre></div>October, 2021
https://alanorth.github.io/cgspace-notes/2021-10/
Fri, 01 Oct 2021 11:14:07 +0300https://alanorth.github.io/cgspace-notes/2021-10/
<h2 id="2021-10-01">2021-10-01</h2>
<ul>
<li>Export all affiliations on CGSpace and run them against the latest RoR data dump:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>localhost/dspace63= > \COPY (SELECT DISTINCT text_value as "cg.contributor.affiliation", count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 211 GROUP BY text_value ORDER BY count DESC) to /tmp/2021-10-01-affiliations.csv WITH CSV HEADER;
</span></span><span style="display:flex;"><span>$ csvcut -c <span style="color:#ae81ff">1</span> /tmp/2021-10-01-affiliations.csv | sed 1d > /tmp/2021-10-01-affiliations.txt
</span></span><span style="display:flex;"><span>$ ./ilri/ror-lookup.py -i /tmp/2021-10-01-affiliations.txt -r 2021-09-23-ror-data.json -o /tmp/2021-10-01-affiliations-matching.csv
</span></span><span style="display:flex;"><span>$ csvgrep -c matched -m true /tmp/2021-10-01-affiliations-matching.csv | sed 1d | wc -l
</span></span><span style="display:flex;"><span>1879
</span></span><span style="display:flex;"><span>$ wc -l /tmp/2021-10-01-affiliations.txt
</span></span><span style="display:flex;"><span>7100 /tmp/2021-10-01-affiliations.txt
</span></span></code></pre></div><ul>
<li>So we have 1879/7100 (26.46%) matching already</li>
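<li>Conceptually the lookup is just a normalized comparison against organization names from the ROR dump, something like this (the two-name “dump” here is made up for illustration):

```python
# Normalize affiliation strings and check them against ROR organization names.
ror_names = {"International Livestock Research Institute", "Wageningen University & Research"}
normalized = {name.lower() for name in ror_names}

affiliations = ["international livestock research institute", "Some Unknown Institute"]

matches = [a for a in affiliations if a.lower() in normalized]
print(f"{len(matches)}/{len(affiliations)} matching")  # 1/2 matching
```
</li>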
</ul>September, 2021
https://alanorth.github.io/cgspace-notes/2021-09/
Wed, 01 Sep 2021 09:14:07 +0300https://alanorth.github.io/cgspace-notes/2021-09/
<h2 id="2021-09-02">2021-09-02</h2>
<ul>
<li>Troubleshooting the missing Altmetric scores on AReS
<ul>
<li>Turns out that I didn’t actually fix them last month because the check for <code>content.altmetric</code> still exists, and I can’t access the DOIs using <code>_h.source.DOI</code> for some reason</li>
<li>I can access all other kinds of item metadata using the Elasticsearch label, but not DOI!!!</li>
<li>I will change <code>DOI</code> to <code>tomato</code> in the repository setup and start a re-harvest… I need to see if this is some kind of reserved word or something…</li>
<li>Even as <code>tomato</code> I can’t access that field as <code>_h.source.tomato</code> in Angular, but it does work as a filter source… sigh</li>
</ul>
</li>
<li>I’m having problems using the OpenRXV API
<ul>
<li>The syntax Moayad showed me last month doesn’t seem to honor the search query properly…</li>
</ul>
</li>
</ul>August, 2021
https://alanorth.github.io/cgspace-notes/2021-08/
Sun, 01 Aug 2021 09:01:07 +0300https://alanorth.github.io/cgspace-notes/2021-08/
<h2 id="2021-08-01">2021-08-01</h2>
<ul>
<li>Update Docker images on AReS server (linode20) and reboot the server:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># docker images | grep -v ^REPO | sed <span style="color:#e6db74">'s/ \+/:/g'</span> | cut -d: -f1,2 | grep -v none | xargs -L1 docker pull
</span></span></code></pre></div><ul>
<li>I decided to upgrade linode20 from Ubuntu 18.04 to 20.04</li>
</ul>July, 2021
https://alanorth.github.io/cgspace-notes/2021-07/
Thu, 01 Jul 2021 08:53:07 +0300https://alanorth.github.io/cgspace-notes/2021-07/
<h2 id="2021-07-01">2021-07-01</h2>
<ul>
<li>Export another list of ALL subjects on CGSpace, including AGROVOC and non-AGROVOC for Enrico:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>localhost/dspace63= > \COPY (SELECT DISTINCT LOWER(text_value) AS subject, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (119, 120, 127, 122, 128, 125, 135, 203, 208, 210, 215, 123, 236, 242, 187) GROUP BY subject ORDER BY count DESC) to /tmp/2021-07-01-all-subjects.csv WITH CSV HEADER;
</span></span><span style="display:flex;"><span>COPY 20994
</span></span></code></pre></div>June, 2021
https://alanorth.github.io/cgspace-notes/2021-06/
Tue, 01 Jun 2021 10:51:07 +0300https://alanorth.github.io/cgspace-notes/2021-06/
<h2 id="2021-06-01">2021-06-01</h2>
<ul>
<li>IWMI notified me that AReS was down with an HTTP 502 error
<ul>
<li>Looking at UptimeRobot I see it has been down for 33 hours, but I never got a notification</li>
<li>I don’t see anything in the Elasticsearch container logs, or the systemd journal on the host, but I notice that the <code>angular_nginx</code> container isn’t running</li>
<li>I simply started it and AReS was running again:</li>
</ul>
</li>
</ul>May, 2021
https://alanorth.github.io/cgspace-notes/2021-05/
Sun, 02 May 2021 09:50:54 +0300https://alanorth.github.io/cgspace-notes/2021-05/
<h2 id="2021-05-01">2021-05-01</h2>
<ul>
<li>I looked at the top user agents and IPs in the Solr statistics for last month and I see these user agents:
<ul>
<li>“RI/1.0”, 1337</li>
<li>“Microsoft Office Word 2014”, 941</li>
</ul>
</li>
<li>I will add the RI/1.0 pattern to our DSpace agents overload and purge them from Solr (we had previously seen this agent with 9,000 hits or so in 2020-09), but I think I will leave the Microsoft Word one… as that’s an actual user…</li>
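<li>A small sketch of the kind of matching involved (the real list is a COUNTER-Robots-style set of regexes; the two patterns here are just examples):

```python
# Match User-Agent strings against spider patterns before purging hits.
import re

SPIDER_PATTERNS = [re.compile(p) for p in (r"RI/1\.0", r"msnbot")]

def is_spider(user_agent: str) -> bool:
    return any(p.search(user_agent) for p in SPIDER_PATTERNS)

print(is_spider("RI/1.0"))                      # True
print(is_spider("Microsoft Office Word 2014"))  # False
```
</li>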
</ul>April, 2021
https://alanorth.github.io/cgspace-notes/2021-04/
Thu, 01 Apr 2021 09:50:54 +0300https://alanorth.github.io/cgspace-notes/2021-04/
<h2 id="2021-04-01">2021-04-01</h2>
<ul>
<li>I wrote a script to query Sherpa’s API for our ISSNs: <code>sherpa-issn-lookup.py</code>
<ul>
<li>I’m curious to see how the results compare with the results from Crossref yesterday</li>
</ul>
</li>
<li>AReS Explorer was down since this morning, I didn’t see anything in the systemd journal
<ul>
<li>I simply took everything down with docker-compose and then back up, and then it was OK</li>
<li>Perhaps one of the containers crashed, I should have looked closer but I was in a hurry</li>
</ul>
</li>
</ul>March, 2021
https://alanorth.github.io/cgspace-notes/2021-03/
Mon, 01 Mar 2021 10:13:54 +0200https://alanorth.github.io/cgspace-notes/2021-03/
<h2 id="2021-03-01">2021-03-01</h2>
<ul>
<li>Discuss some OpenRXV issues with Abdullah from CodeObia
<ul>
<li>He’s trying to work on the DSpace 6+ metadata schema autoimport using the DSpace 6+ REST API</li>
<li>Also, we found some issues building and running OpenRXV currently due to ecosystem shift in the Node.js dependencies</li>
</ul>
</li>
</ul>CGSpace CG Core v2 Migration
https://alanorth.github.io/cgspace-notes/cgspace-cgcorev2-migration/
Sun, 21 Feb 2021 13:27:35 +0200https://alanorth.github.io/cgspace-notes/cgspace-cgcorev2-migration/
<p>Changes to CGSpace metadata fields to align more with DC, QDC, and DCTERMS as well as CG Core v2. Implemented on 2021-02-21.</p>
<p>With reference to <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2 draft standard</a> by Marie-Angélique as well as <a href="http://www.dublincore.org/specifications/dublin-core/dcmi-terms/">DCMI DCTERMS</a>.</p>February, 2021
https://alanorth.github.io/cgspace-notes/2021-02/
Mon, 01 Feb 2021 10:13:54 +0200https://alanorth.github.io/cgspace-notes/2021-02/
<h2 id="2021-02-01">2021-02-01</h2>
<ul>
<li>Abenet said that CIP found more duplicate records in their export from AReS
<ul>
<li>I re-opened <a href="https://github.com/ilri/OpenRXV/issues/67">the issue</a> on OpenRXV where we had previously noticed this</li>
<li>The shared link where the duplicates are is here: <a href="https://cgspace.cgiar.org/explorer/shared/heEOz3YBnXdK69bR2ra6">https://cgspace.cgiar.org/explorer/shared/heEOz3YBnXdK69bR2ra6</a></li>
</ul>
</li>
<li>I had a call with CodeObia to discuss the work on OpenRXV</li>
<li>Check the results of the AReS harvesting from last night:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -s <span style="color:#e6db74">'http://localhost:9200/openrxv-items-temp/_count?q=*&pretty'</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span> "count" : 100875,
</span></span><span style="display:flex;"><span> "_shards" : {
</span></span><span style="display:flex;"><span> "total" : 1,
</span></span><span style="display:flex;"><span> "successful" : 1,
</span></span><span style="display:flex;"><span> "skipped" : 0,
</span></span><span style="display:flex;"><span> "failed" : 0
</span></span><span style="display:flex;"><span> }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div>January, 2021
https://alanorth.github.io/cgspace-notes/2021-01/
Sun, 03 Jan 2021 10:13:54 +0200https://alanorth.github.io/cgspace-notes/2021-01/
- <h2 id="2021-01-03">2021-01-03</h2>
-<ul>
-<li>Peter notified me that some filters on AReS were broken again
-<ul>
-<li>It’s the same issue with the field names getting <code>.keyword</code> appended to the end that I already <a href="https://github.com/ilri/OpenRXV/issues/66">filed an issue on OpenRXV about last month</a></li>
-<li>I fixed the broken filters (careful to not edit any others, lest they break too!)</li>
-</ul>
-</li>
-<li>Fix an issue with start page number for the DSpace REST API and statistics API in OpenRXV
-<ul>
-<li>The start page had been “1” in the UI, but in the backend they were doing some gymnastics to adjust to the zero-based offset/limit/page of the DSpace REST API and the statistics API</li>
-<li>I adjusted it to default to 0 and added a note to the admin screen</li>
-<li>I realized that this issue was actually causing the first page of 100 statistics to be missing…</li>
-<li>For example, <a href="https://cgspace.cgiar.org/handle/10568/66839">this item</a> has 51 views on CGSpace, but 0 on AReS</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2021-01-03">2021-01-03</h2>
<ul>
<li>Peter notified me that some filters on AReS were broken again
<ul>
<li>It’s the same issue with the field names getting <code>.keyword</code> appended to the end that I already <a href="https://github.com/ilri/OpenRXV/issues/66">filed an issue on OpenRXV about last month</a></li>
<li>I fixed the broken filters (careful to not edit any others, lest they break too!)</li>
</ul>
</li>
<li>Fix an issue with start page number for the DSpace REST API and statistics API in OpenRXV
<ul>
<li>The start page had been “1” in the UI, but in the backend they were doing some gymnastics to adjust to the zero-based offset/limit/page of the DSpace REST API and the statistics API</li>
<li>I adjusted it to default to 0 and added a note to the admin screen</li>
<li>I realized that this issue was actually causing the first page of 100 statistics to be missing…</li>
<li>For example, <a href="https://cgspace.cgiar.org/handle/10568/66839">this item</a> has 51 views on CGSpace, but 0 on AReS</li>
</ul>
</li>
</ul>December, 2020
https://alanorth.github.io/cgspace-notes/2020-12/
Tue, 01 Dec 2020 11:32:54 +0200https://alanorth.github.io/cgspace-notes/2020-12/
- <h2 id="2020-12-01">2020-12-01</h2>
-<ul>
-<li>Atmire responded about the issue with duplicate data in our Solr statistics
-<ul>
-<li>They noticed that some records in the statistics-2015 core haven’t been migrated with the AtomicStatisticsUpdateCLI tool yet and assumed that I haven’t migrated any of the records yet</li>
-<li>That’s strange, as I checked all ten cores and 2015 is the only one with some unmigrated documents, according to the <code>cua_version</code> field</li>
-<li>I started processing those (about 411,000 records):</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2020-12-01">2020-12-01</h2>
<ul>
<li>Atmire responded about the issue with duplicate data in our Solr statistics
<ul>
<li>They noticed that some records in the statistics-2015 core haven’t been migrated with the AtomicStatisticsUpdateCLI tool yet and assumed that I haven’t migrated any of the records yet</li>
<li>That’s strange, as I checked all ten cores and 2015 is the only one with some unmigrated documents, according to the <code>cua_version</code> field</li>
<li>I started processing those (about 411,000 records):</li>
</ul>
</li>
</ul>CGSpace DSpace 6 Upgrade
@@ -637,252 +286,91 @@
https://alanorth.github.io/cgspace-notes/2020-11/
Sun, 01 Nov 2020 13:11:54 +0200https://alanorth.github.io/cgspace-notes/2020-11/
- <h2 id="2020-11-01">2020-11-01</h2>
-<ul>
-<li>Continue with processing the statistics-2019 Solr core with the AtomicStatisticsUpdateCLI tool on DSpace Test
-<ul>
-<li>So far we’ve spent at least fifty hours to process the statistics and statistics-2019 core… wow.</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2020-11-01">2020-11-01</h2>
<ul>
<li>Continue with processing the statistics-2019 Solr core with the AtomicStatisticsUpdateCLI tool on DSpace Test
<ul>
<li>So far we’ve spent at least fifty hours to process the statistics and statistics-2019 core… wow.</li>
</ul>
</li>
</ul>October, 2020
https://alanorth.github.io/cgspace-notes/2020-10/
Tue, 06 Oct 2020 16:55:54 +0300https://alanorth.github.io/cgspace-notes/2020-10/
- <h2 id="2020-10-06">2020-10-06</h2>
-<ul>
-<li>Add tests for the new <code>/items</code> POST handlers to the DSpace 6.x branch of my <a href="https://github.com/ilri/dspace-statistics-api/tree/v6_x">dspace-statistics-api</a>
-<ul>
-<li>It took a bit of extra work because I had to learn how to mock the responses for when Solr is not available</li>
-<li>Tag and release version 1.3.0 on GitHub: <a href="https://github.com/ilri/dspace-statistics-api/releases/tag/v1.3.0">https://github.com/ilri/dspace-statistics-api/releases/tag/v1.3.0</a></li>
-</ul>
-</li>
-<li>Trying to test the changes Atmire sent last week but I had to re-create my local database from a recent CGSpace dump
-<ul>
-<li>During the FlywayDB migration I got an error:</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2020-10-06">2020-10-06</h2>
<ul>
<li>Add tests for the new <code>/items</code> POST handlers to the DSpace 6.x branch of my <a href="https://github.com/ilri/dspace-statistics-api/tree/v6_x">dspace-statistics-api</a>
<ul>
<li>It took a bit of extra work because I had to learn how to mock the responses for when Solr is not available</li>
<li>Tag and release version 1.3.0 on GitHub: <a href="https://github.com/ilri/dspace-statistics-api/releases/tag/v1.3.0">https://github.com/ilri/dspace-statistics-api/releases/tag/v1.3.0</a></li>
</ul>
</li>
<li>Trying to test the changes Atmire sent last week but I had to re-create my local database from a recent CGSpace dump
<ul>
<li>During the FlywayDB migration I got an error:</li>
</ul>
</li>
</ul>September, 2020
https://alanorth.github.io/cgspace-notes/2020-09/
Wed, 02 Sep 2020 15:35:54 +0300https://alanorth.github.io/cgspace-notes/2020-09/
- <h2 id="2020-09-02">2020-09-02</h2>
-<ul>
-<li>Replace Marissa van Epp with Rhys Bucknall in the CCAFS groups on CGSpace because Marissa no longer works at CCAFS</li>
-<li>The AReS Explorer hasn’t updated its index since 2020-08-22 when I last forced it
-<ul>
-<li>I restarted it again now and told Moayad that the automatic indexing isn’t working</li>
-</ul>
-</li>
-<li>Add <code>Alliance of Bioversity International and CIAT</code> to affiliations on CGSpace</li>
-<li>Abenet told me that the general search text on AReS doesn’t get reset when you use the “Reset Filters” button
-<ul>
-<li>I filed a bug on OpenRXV: <a href="https://github.com/ilri/OpenRXV/issues/39">https://github.com/ilri/OpenRXV/issues/39</a></li>
-</ul>
-</li>
-<li>I filed an issue on OpenRXV to make some minor edits to the admin UI: <a href="https://github.com/ilri/OpenRXV/issues/40">https://github.com/ilri/OpenRXV/issues/40</a></li>
-</ul>
+ <h2 id="2020-09-02">2020-09-02</h2>
<ul>
<li>Replace Marissa van Epp with Rhys Bucknall in the CCAFS groups on CGSpace because Marissa no longer works at CCAFS</li>
<li>The AReS Explorer hasn’t updated its index since 2020-08-22 when I last forced it
<ul>
<li>I restarted it again now and told Moayad that the automatic indexing isn’t working</li>
</ul>
</li>
<li>Add <code>Alliance of Bioversity International and CIAT</code> to affiliations on CGSpace</li>
<li>Abenet told me that the general search text on AReS doesn’t get reset when you use the “Reset Filters” button
<ul>
<li>I filed a bug on OpenRXV: <a href="https://github.com/ilri/OpenRXV/issues/39">https://github.com/ilri/OpenRXV/issues/39</a></li>
</ul>
</li>
<li>I filed an issue on OpenRXV to make some minor edits to the admin UI: <a href="https://github.com/ilri/OpenRXV/issues/40">https://github.com/ilri/OpenRXV/issues/40</a></li>
</ul>August, 2020
https://alanorth.github.io/cgspace-notes/2020-08/
Sun, 02 Aug 2020 15:35:54 +0300https://alanorth.github.io/cgspace-notes/2020-08/
- <h2 id="2020-08-02">2020-08-02</h2>
-<ul>
-<li>I spent a few days working on a Java-based curation task to tag items with ISO 3166-1 Alpha2 country codes based on their <code>cg.coverage.country</code> text values
-<ul>
-<li>It looks up the names in ISO 3166-1 first, and then in our CGSpace countries mapping (which has five or so of Peter’s preferred “display” country names)</li>
-<li>It implements a “force” mode too that will clear existing country codes and re-tag everything</li>
-<li>It is class based so I can easily add support for other vocabularies, and the technique could even be used for organizations with mappings to ROR and Clarisa…</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2020-08-02">2020-08-02</h2>
<ul>
<li>I spent a few days working on a Java-based curation task to tag items with ISO 3166-1 Alpha2 country codes based on their <code>cg.coverage.country</code> text values
<ul>
<li>It looks up the names in ISO 3166-1 first, and then in our CGSpace countries mapping (which has five or so of Peter’s preferred “display” country names)</li>
<li>It implements a “force” mode too that will clear existing country codes and re-tag everything</li>
<li>It is class based so I can easily add support for other vocabularies, and the technique could even be used for organizations with mappings to ROR and Clarisa…</li>
</ul>
</li>
</ul>July, 2020
https://alanorth.github.io/cgspace-notes/2020-07/
Wed, 01 Jul 2020 10:53:54 +0300https://alanorth.github.io/cgspace-notes/2020-07/
- <h2 id="2020-07-01">2020-07-01</h2>
-<ul>
-<li>A few users noticed that CGSpace wasn’t loading items today, item pages seem blank
-<ul>
-<li>I looked at the PostgreSQL locks but they don’t seem unusual</li>
-<li>I guess this is the same “blank item page” issue that we had a few times in 2019 that we never solved</li>
-<li>I restarted Tomcat and PostgreSQL and the issue was gone</li>
-</ul>
-</li>
-<li>Since I was restarting Tomcat anyway I decided to redeploy the latest changes from the <code>5_x-prod</code> branch and I added a note about COVID-19 items to the CGSpace frontpage at Peter’s request</li>
-</ul>
+ <h2 id="2020-07-01">2020-07-01</h2>
<ul>
<li>A few users noticed that CGSpace wasn’t loading items today, item pages seem blank
<ul>
<li>I looked at the PostgreSQL locks but they don’t seem unusual</li>
<li>I guess this is the same “blank item page” issue that we had a few times in 2019 that we never solved</li>
<li>I restarted Tomcat and PostgreSQL and the issue was gone</li>
</ul>
</li>
<li>Since I was restarting Tomcat anyway I decided to redeploy the latest changes from the <code>5_x-prod</code> branch and I added a note about COVID-19 items to the CGSpace frontpage at Peter’s request</li>
</ul>June, 2020
https://alanorth.github.io/cgspace-notes/2020-06/
Mon, 01 Jun 2020 13:55:39 +0300https://alanorth.github.io/cgspace-notes/2020-06/
- <h2 id="2020-06-01">2020-06-01</h2>
-<ul>
-<li>I tried to run the <code>AtomicStatisticsUpdateCLI</code> CUA migration script on DSpace Test (linode26) again and it is still going very slowly and has tons of errors, as I noticed yesterday
-<ul>
-<li>I sent Atmire the dspace.log from today and told them to log into the server to debug the process</li>
-</ul>
-</li>
-<li>In other news, I checked the statistics API on DSpace 6 and it’s working</li>
-<li>I tried to build the OAI registry on the freshly migrated DSpace 6 on DSpace Test and I get an error:</li>
-</ul>
+ <h2 id="2020-06-01">2020-06-01</h2>
<ul>
<li>I tried to run the <code>AtomicStatisticsUpdateCLI</code> CUA migration script on DSpace Test (linode26) again and it is still going very slowly and has tons of errors, as I noticed yesterday
<ul>
<li>I sent Atmire the dspace.log from today and told them to log into the server to debug the process</li>
</ul>
</li>
<li>In other news, I checked the statistics API on DSpace 6 and it’s working</li>
<li>I tried to build the OAI registry on the freshly migrated DSpace 6 on DSpace Test and I get an error:</li>
</ul>May, 2020
https://alanorth.github.io/cgspace-notes/2020-05/
Sat, 02 May 2020 09:52:04 +0300https://alanorth.github.io/cgspace-notes/2020-05/
- <h2 id="2020-05-02">2020-05-02</h2>
-<ul>
-<li>Peter said that CTA is having problems submitting an item to CGSpace
-<ul>
-<li>Looking at the PostgreSQL stats it seems to be the same issue that Tezira was having last week, as I see the number of connections in ‘idle in transaction’ and ‘waiting for lock’ state are increasing again</li>
-<li>I see that CGSpace (linode18) is still using PostgreSQL JDBC driver version 42.2.11, and there were some bugs related to transactions fixed in 42.2.12 (which I had updated in the Ansible playbooks, but not deployed yet)</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2020-05-02">2020-05-02</h2>
<ul>
<li>Peter said that CTA is having problems submitting an item to CGSpace
<ul>
<li>Looking at the PostgreSQL stats it seems to be the same issue that Tezira was having last week, as I see the number of connections in ‘idle in transaction’ and ‘waiting for lock’ state are increasing again</li>
<li>I see that CGSpace (linode18) is still using PostgreSQL JDBC driver version 42.2.11, and there were some bugs related to transactions fixed in 42.2.12 (which I had updated in the Ansible playbooks, but not deployed yet)</li>
</ul>
</li>
</ul>April, 2020
https://alanorth.github.io/cgspace-notes/2020-04/
Thu, 02 Apr 2020 10:53:24 +0300https://alanorth.github.io/cgspace-notes/2020-04/
- <h2 id="2020-04-02">2020-04-02</h2>
-<ul>
-<li>Maria asked me to update Charles Staver’s ORCID iD in the submission template and on CGSpace, as his name was lower case before, and now he has corrected it
-<ul>
-<li>I updated the fifty-eight existing items on CGSpace</li>
-</ul>
-</li>
-<li>Looking into the items Udana had asked about last week that were missing Altmetric donuts:
-<ul>
-<li><a href="https://hdl.handle.net/10568/103225">The first</a> is still missing its DOI, so I added it and <a href="https://twitter.com/mralanorth/status/1245632619661766657">tweeted its handle</a> (after a few hours there was a donut with score 222)</li>
-<li><a href="https://hdl.handle.net/10568/106899">The second item</a> now has a donut with score 2 since I <a href="https://twitter.com/mralanorth/status/1243158045540134913">tweeted its handle</a> last week</li>
-<li><a href="https://hdl.handle.net/10568/107258">The third item</a> now has a donut with score 1 since I <a href="https://twitter.com/mralanorth/status/1243158786392625153">tweeted it</a> last week</li>
-</ul>
-</li>
-<li>On the same note, the <a href="https://hdl.handle.net/10568/106573">one item</a> Abenet pointed out last week now has a donut with score of 104 after I <a href="https://twitter.com/mralanorth/status/1243163710241345536">tweeted it</a> last week</li>
-</ul>
+ <h2 id="2020-04-02">2020-04-02</h2>
<ul>
<li>Maria asked me to update Charles Staver’s ORCID iD in the submission template and on CGSpace, as his name was lower case before, and now he has corrected it
<ul>
<li>I updated the fifty-eight existing items on CGSpace</li>
</ul>
</li>
<li>Looking into the items Udana had asked about last week that were missing Altmetric donuts:
<ul>
<li><a href="https://hdl.handle.net/10568/103225">The first</a> is still missing its DOI, so I added it and <a href="https://twitter.com/mralanorth/status/1245632619661766657">tweeted its handle</a> (after a few hours there was a donut with score 222)</li>
<li><a href="https://hdl.handle.net/10568/106899">The second item</a> now has a donut with score 2 since I <a href="https://twitter.com/mralanorth/status/1243158045540134913">tweeted its handle</a> last week</li>
<li><a href="https://hdl.handle.net/10568/107258">The third item</a> now has a donut with score 1 since I <a href="https://twitter.com/mralanorth/status/1243158786392625153">tweeted it</a> last week</li>
</ul>
</li>
<li>On the same note, the <a href="https://hdl.handle.net/10568/106573">one item</a> Abenet pointed out last week now has a donut with score of 104 after I <a href="https://twitter.com/mralanorth/status/1243163710241345536">tweeted it</a> last week</li>
</ul>March, 2020
https://alanorth.github.io/cgspace-notes/2020-03/
Mon, 02 Mar 2020 12:31:30 +0200https://alanorth.github.io/cgspace-notes/2020-03/
- <h2 id="2020-03-02">2020-03-02</h2>
-<ul>
-<li>Update <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> for DSpace 6+ UUIDs
-<ul>
-<li>Tag version 1.2.0 on GitHub</li>
-</ul>
-</li>
-<li>Test migrating legacy Solr statistics to UUIDs with the as-of-yet unreleased <a href="https://github.com/DSpace/DSpace/commit/184f2b2153479045fba6239342c63e7f8564b8b6#diff-0350ce2e13b28d5d61252b7a8f50a059">SolrUpgradePre6xStatistics.java</a>
-<ul>
-<li>You need to download this into the DSpace 6.x source and compile it</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2020-03-02">2020-03-02</h2>
<ul>
<li>Update <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> for DSpace 6+ UUIDs
<ul>
<li>Tag version 1.2.0 on GitHub</li>
</ul>
</li>
<li>Test migrating legacy Solr statistics to UUIDs with the as-of-yet unreleased <a href="https://github.com/DSpace/DSpace/commit/184f2b2153479045fba6239342c63e7f8564b8b6#diff-0350ce2e13b28d5d61252b7a8f50a059">SolrUpgradePre6xStatistics.java</a>
<ul>
<li>You need to download this into the DSpace 6.x source and compile it</li>
</ul>
</li>
</ul>February, 2020
https://alanorth.github.io/cgspace-notes/2020-02/
Sun, 02 Feb 2020 11:56:30 +0200https://alanorth.github.io/cgspace-notes/2020-02/
- <h2 id="2020-02-02">2020-02-02</h2>
-<ul>
-<li>Continue working on porting CGSpace’s DSpace 5 code to DSpace 6.3 that I started yesterday
-<ul>
-<li>Sign up for an account with MaxMind so I can get the GeoLite2-City.mmdb database</li>
-<li>I still need to wire up the API credentials and cron job into the Ansible infrastructure playbooks</li>
-<li>Fix some minor issues in the config and XMLUI themes, like removing Atmire stuff</li>
-<li>The code finally builds and runs with a fresh install</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2020-02-02">2020-02-02</h2>
<ul>
<li>Continue working on porting CGSpace’s DSpace 5 code to DSpace 6.3 that I started yesterday
<ul>
<li>Sign up for an account with MaxMind so I can get the GeoLite2-City.mmdb database</li>
<li>I still need to wire up the API credentials and cron job into the Ansible infrastructure playbooks</li>
<li>Fix some minor issues in the config and XMLUI themes, like removing Atmire stuff</li>
<li>The code finally builds and runs with a fresh install</li>
</ul>
</li>
</ul>January, 2020
https://alanorth.github.io/cgspace-notes/2020-01/
Mon, 06 Jan 2020 10:48:30 +0200https://alanorth.github.io/cgspace-notes/2020-01/
- <h2 id="2020-01-06">2020-01-06</h2>
-<ul>
-<li>Open <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=706">a ticket</a> with Atmire to request a quote for the upgrade to DSpace 6</li>
-<li>Last week Altmetric responded about the <a href="https://hdl.handle.net/10568/97087">item</a> that had a lower score than its DOI
-<ul>
-<li>The score is now linked to the DOI</li>
-<li>Another <a href="https://hdl.handle.net/10568/91278">item</a> that had the same problem in 2019 has now also linked to the score for its DOI</li>
-<li>Another <a href="https://hdl.handle.net/10568/81236">item</a> that had the same problem in 2019 has also been fixed</li>
-</ul>
-</li>
-</ul>
-<h2 id="2020-01-07">2020-01-07</h2>
-<ul>
-<li>Peter Ballantyne highlighted one more WLE <a href="https://hdl.handle.net/10568/101286">item</a> that is missing the Altmetric score that its DOI has
-<ul>
-<li>The DOI has a score of 259, but the Handle has no score at all</li>
-<li>I <a href="https://twitter.com/mralanorth/status/1214471427157626881">tweeted</a> the CGSpace repository link</li>
-</ul>
-</li>
-</ul>
+ <h2 id="2020-01-06">2020-01-06</h2>
<ul>
<li>Open <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=706">a ticket</a> with Atmire to request a quote for the upgrade to DSpace 6</li>
<li>Last week Altmetric responded about the <a href="https://hdl.handle.net/10568/97087">item</a> that had a lower score than its DOI
<ul>
<li>The score is now linked to the DOI</li>
<li>Another <a href="https://hdl.handle.net/10568/91278">item</a> that had the same problem in 2019 has now also linked to the score for its DOI</li>
<li>Another <a href="https://hdl.handle.net/10568/81236">item</a> that had the same problem in 2019 has also been fixed</li>
</ul>
</li>
</ul>
<h2 id="2020-01-07">2020-01-07</h2>
<ul>
<li>Peter Ballantyne highlighted one more WLE <a href="https://hdl.handle.net/10568/101286">item</a> that is missing the Altmetric score that its DOI has
<ul>
<li>The DOI has a score of 259, but the Handle has no score at all</li>
<li>I <a href="https://twitter.com/mralanorth/status/1214471427157626881">tweeted</a> the CGSpace repository link</li>
</ul>
</li>
</ul>December, 2019
https://alanorth.github.io/cgspace-notes/2019-12/
Sun, 01 Dec 2019 11:22:30 +0200https://alanorth.github.io/cgspace-notes/2019-12/
- <h2 id="2019-12-01">2019-12-01</h2>
-<ul>
-<li>Upgrade CGSpace (linode18) to Ubuntu 18.04:
-<ul>
-<li>Check any packages that have residual configs and purge them:</li>
-<li><!-- raw HTML omitted --># dpkg -l | grep -E '^rc' | awk '{print $2}' | xargs dpkg -P<!-- raw HTML omitted --></li>
-<li>Make sure all packages are up to date and the package manager is up to date, then reboot:</li>
-</ul>
-</li>
-</ul>
-<pre tabindex="0"><code># apt update && apt full-upgrade
-# apt-get autoremove && apt-get autoclean
-# dpkg -C
-# reboot
-</code></pre>
+ <h2 id="2019-12-01">2019-12-01</h2>
<ul>
<li>Upgrade CGSpace (linode18) to Ubuntu 18.04:
<ul>
<li>Check any packages that have residual configs and purge them:</li>
<li><!-- raw HTML omitted --># dpkg -l | grep -E '^rc' | awk '{print $2}' | xargs dpkg -P<!-- raw HTML omitted --></li>
<li>Make sure all packages are up to date and the package manager is up to date, then reboot:</li>
</ul>
</li>
</ul>
<pre tabindex="0"><code># apt update && apt full-upgrade
# apt-get autoremove && apt-get autoclean
# dpkg -C
# reboot
</code></pre>November, 2019
https://alanorth.github.io/cgspace-notes/2019-11/
Mon, 04 Nov 2019 12:20:30 +0200https://alanorth.github.io/cgspace-notes/2019-11/
- <h2 id="2019-11-04">2019-11-04</h2>
-<ul>
-<li>Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics
-<ul>
-<li>I looked in the nginx logs and see 4.6 million in the access logs, and 1.2 million in the API logs:</li>
-</ul>
-</li>
-</ul>
-<pre tabindex="0"><code># zcat --force /var/log/nginx/*access.log.*.gz | grep -cE "[0-9]{1,2}/Oct/2019"
-4671942
-# zcat --force /var/log/nginx/{rest,oai,statistics}.log.*.gz | grep -cE "[0-9]{1,2}/Oct/2019"
-1277694
-</code></pre><ul>
-<li>So 4.6 million from XMLUI and another 1.2 million from API requests</li>
-<li>Let’s see how many of the REST API requests were for bitstreams (because they are counted in Solr stats):</li>
-</ul>
-<pre tabindex="0"><code># zcat --force /var/log/nginx/rest.log.*.gz | grep -c -E "[0-9]{1,2}/Oct/2019"
-1183456
-# zcat --force /var/log/nginx/rest.log.*.gz | grep -E "[0-9]{1,2}/Oct/2019" | grep -c -E "/rest/bitstreams"
-106781
-</code></pre>
+ <h2 id="2019-11-04">2019-11-04</h2>
<ul>
<li>Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics
<ul>
<li>I looked in the nginx logs and see 4.6 million in the access logs, and 1.2 million in the API logs:</li>
</ul>
</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*access.log.*.gz | grep -cE "[0-9]{1,2}/Oct/2019"
4671942
# zcat --force /var/log/nginx/{rest,oai,statistics}.log.*.gz | grep -cE "[0-9]{1,2}/Oct/2019"
1277694
</code></pre><ul>
<li>So 4.6 million from XMLUI and another 1.2 million from API requests</li>
<li>Let’s see how many of the REST API requests were for bitstreams (because they are counted in Solr stats):</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/rest.log.*.gz | grep -c -E "[0-9]{1,2}/Oct/2019"
1183456
# zcat --force /var/log/nginx/rest.log.*.gz | grep -E "[0-9]{1,2}/Oct/2019" | grep -c -E "/rest/bitstreams"
106781
</code></pre>October, 2019
@@ -896,511 +384,168 @@
https://alanorth.github.io/cgspace-notes/2019-09/
Sun, 01 Sep 2019 10:17:51 +0300https://alanorth.github.io/cgspace-notes/2019-09/
- <h2 id="2019-09-01">2019-09-01</h2>
-<ul>
-<li>Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning</li>
-<li>Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:</li>
-</ul>
-<pre tabindex="0"><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
- 440 17.58.101.255
- 441 157.55.39.101
- 485 207.46.13.43
- 728 169.60.128.125
- 730 207.46.13.108
- 758 157.55.39.9
- 808 66.160.140.179
- 814 207.46.13.212
- 2472 163.172.71.23
- 6092 3.94.211.189
-# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
- 33 2a01:7e00::f03c:91ff:fe16:fcb
- 57 3.83.192.124
- 57 3.87.77.25
- 57 54.82.1.8
- 822 2a01:9cc0:47:1:1a:4:0:2
- 1223 45.5.184.72
- 1633 172.104.229.92
- 5112 205.186.128.185
- 7249 2a01:7e00::f03c:91ff:fe18:7396
- 9124 45.5.186.2
-</code></pre>
+ <h2 id="2019-09-01">2019-09-01</h2>
<ul>
<li>Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning</li>
<li>Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
440 17.58.101.255
441 157.55.39.101
485 207.46.13.43
728 169.60.128.125
730 207.46.13.108
758 157.55.39.9
808 66.160.140.179
814 207.46.13.212
2472 163.172.71.23
6092 3.94.211.189
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
33 2a01:7e00::f03c:91ff:fe16:fcb
57 3.83.192.124
57 3.87.77.25
57 54.82.1.8
822 2a01:9cc0:47:1:1a:4:0:2
1223 45.5.184.72
1633 172.104.229.92
5112 205.186.128.185
7249 2a01:7e00::f03c:91ff:fe18:7396
9124 45.5.186.2
</code></pre>August, 2019
https://alanorth.github.io/cgspace-notes/2019-08/
Sat, 03 Aug 2019 12:39:51 +0300https://alanorth.github.io/cgspace-notes/2019-08/
- <h2 id="2019-08-03">2019-08-03</h2>
-<ul>
-<li>Look at Bioversity’s latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name…</li>
-</ul>
-<h2 id="2019-08-04">2019-08-04</h2>
-<ul>
-<li>Deploy ORCID identifier updates requested by Bioversity to CGSpace</li>
-<li>Run system updates on CGSpace (linode18) and reboot it
-<ul>
-<li>Before updating it I checked Solr and verified that all statistics cores were loaded properly…</li>
-<li>After rebooting, all statistics cores were loaded… wow, that’s lucky.</li>
-</ul>
-</li>
-<li>Run system updates on DSpace Test (linode19) and reboot it</li>
-</ul>
+ <h2 id="2019-08-03">2019-08-03</h2>
<ul>
<li>Look at Bioversity’s latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name…</li>
</ul>
<h2 id="2019-08-04">2019-08-04</h2>
<ul>
<li>Deploy ORCID identifier updates requested by Bioversity to CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it
<ul>
<li>Before updating it I checked Solr and verified that all statistics cores were loaded properly…</li>
<li>After rebooting, all statistics cores were loaded… wow, that’s lucky.</li>
</ul>
</li>
<li>Run system updates on DSpace Test (linode19) and reboot it</li>
</ul>July, 2019
https://alanorth.github.io/cgspace-notes/2019-07/
Mon, 01 Jul 2019 12:13:51 +0300https://alanorth.github.io/cgspace-notes/2019-07/
- <h2 id="2019-07-01">2019-07-01</h2>
-<ul>
-<li>Create an “AfricaRice books and book chapters” collection on CGSpace for AfricaRice</li>
-<li>Last month Sisay asked why the following “most popular” statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace:
-<ul>
-<li><a href="https://dspacetest.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&time_filter_end_date=01%2F12%2F2018">DSpace Test</a></li>
-<li><a href="https://cgspace.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&time_filter_end_date=01%2F12%2F2018">CGSpace</a></li>
-</ul>
-</li>
-<li>Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community</li>
-</ul>
+ <h2 id="2019-07-01">2019-07-01</h2>
<ul>
<li>Create an “AfricaRice books and book chapters” collection on CGSpace for AfricaRice</li>
<li>Last month Sisay asked why the following “most popular” statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace:
<ul>
<li><a href="https://dspacetest.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&time_filter_end_date=01%2F12%2F2018">DSpace Test</a></li>
<li><a href="https://cgspace.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&time_filter_end_date=01%2F12%2F2018">CGSpace</a></li>
</ul>
</li>
<li>Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community</li>
</ul>June, 2019
https://alanorth.github.io/cgspace-notes/2019-06/
Sun, 02 Jun 2019 10:57:51 +0300https://alanorth.github.io/cgspace-notes/2019-06/
- <h2 id="2019-06-02">2019-06-02</h2>
-<ul>
-<li>Merge the <a href="https://github.com/ilri/DSpace/pull/425">Solr filterCache</a> and <a href="https://github.com/ilri/DSpace/pull/426">XMLUI ISI journal</a> changes to the <code>5_x-prod</code> branch and deploy on CGSpace</li>
-<li>Run system updates on CGSpace (linode18) and reboot it</li>
-</ul>
-<h2 id="2019-06-03">2019-06-03</h2>
-<ul>
-<li>Skype with Marie-Angélique and Abenet about <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2</a></li>
-</ul>
+ <h2 id="2019-06-02">2019-06-02</h2>
<ul>
<li>Merge the <a href="https://github.com/ilri/DSpace/pull/425">Solr filterCache</a> and <a href="https://github.com/ilri/DSpace/pull/426">XMLUI ISI journal</a> changes to the <code>5_x-prod</code> branch and deploy on CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it</li>
</ul>
<h2 id="2019-06-03">2019-06-03</h2>
<ul>
<li>Skype with Marie-Angélique and Abenet about <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2</a></li>
</ul>May, 2019
https://alanorth.github.io/cgspace-notes/2019-05/
Wed, 01 May 2019 07:37:43 +0300https://alanorth.github.io/cgspace-notes/2019-05/
<h2 id="2019-05-01">2019-05-01</h2>
<ul>
<li>Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace</li>
<li>A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items
<ul>
<li>Apparently if the item is in the <code>workflowitem</code> table it is submitted to a workflow</li>
<li>And if it is in the <code>workspaceitem</code> table it is in the pre-submitted state</li>
</ul>
</li>
<li>The item seems to be in a pre-submitted state, so I tried to delete it from there:</li>
</ul>
<pre tabindex="0"><code>dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
DELETE 1
</code></pre><ul>
<li>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present…</li>
</ul>
April, 2019
https://alanorth.github.io/cgspace-notes/2019-04/
Mon, 01 Apr 2019 09:00:43 +0300
https://alanorth.github.io/cgspace-notes/2019-04/
<h2 id="2019-04-01">2019-04-01</h2>
<ul>
<li>Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc
<ul>
<li>They asked if we had plans to enable RDF support in CGSpace</li>
</ul>
</li>
<li>There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today
<ul>
<li>I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!</li>
</ul>
</li>
</ul>
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18.196.196.108|18.195.78.144|18.195.218.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5
4432 200
</code></pre><ul>
<li>In the last two weeks there have been 47,000 downloads of this <em>same exact PDF</em> by these three IP addresses</li>
<li>Apply country and region corrections and deletions on DSpace Test and CGSpace:</li>
</ul>
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
</code></pre>
March, 2019
https://alanorth.github.io/cgspace-notes/2019-03/
Fri, 01 Mar 2019 12:16:30 +0100
https://alanorth.github.io/cgspace-notes/2019-03/
<h2 id="2019-03-01">2019-03-01</h2>
<ul>
<li>I checked IITA’s 259 Feb 14 records from last month for duplicates using Atmire’s Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good</li>
<li>I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc…</li>
<li>Looking at the other half of Udana’s WLE records from 2018-11
<ul>
<li>I finished the ones for Restoring Degraded Landscapes (RDL), but these are for Variability, Risks and Competing Uses (VRC)</li>
<li>I did the usual cleanups for whitespace, added regions where they made sense for certain countries, cleaned up the DOI link formats, added rights information based on the publications page for a few items</li>
<li>Most worryingly, there are encoding errors in the abstracts for eleven items, for example:</li>
<li>68.15% � 9.45 instead of 68.15% ± 9.45</li>
<li>2003�2013 instead of 2003–2013</li>
</ul>
</li>
<li>I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs</li>
</ul>
February, 2019
https://alanorth.github.io/cgspace-notes/2019-02/
Fri, 01 Feb 2019 21:37:30 +0200
https://alanorth.github.io/cgspace-notes/2019-02/
<h2 id="2019-02-01">2019-02-01</h2>
<ul>
<li>Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!</li>
<li>The top IPs before, during, and after this latest alert tonight were:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "01/Feb/2019:(17|18|19|20|21)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
245 207.46.13.5
332 54.70.40.11
385 5.143.231.38
405 207.46.13.173
405 207.46.13.75
1117 66.249.66.219
1121 35.237.175.180
1546 5.9.6.51
2474 45.5.186.2
5490 85.25.237.71
</code></pre><ul>
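<li>The counting pipeline itself is generic; a sketch with made-up sample log lines (the IPs here are illustrative only), with a final awk added to normalize the whitespace that <code>uniq -c</code> emits:
<pre tabindex="0"><code>$ printf '1.2.3.4 x\n5.6.7.8 x\n1.2.3.4 x\n' | awk '{print $1}' | sort | uniq -c | sort -n | awk '{print $1, $2}'
1 5.6.7.8
2 1.2.3.4
</code></pre>
</li>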
<li><code>85.25.237.71</code> is the “Linguee Bot” that I first saw last month</li>
<li>The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase</li>
<li>There were just over 3 million accesses in the nginx logs last month:</li>
</ul>
<pre tabindex="0"><code># time zcat --force /var/log/nginx/* | grep -cE "[0-9]{1,2}/Jan/2019"
3018243
real 0m19.873s
user 0m22.203s
sys 0m1.979s
</code></pre>
January, 2019
https://alanorth.github.io/cgspace-notes/2019-01/
Wed, 02 Jan 2019 09:48:30 +0200
https://alanorth.github.io/cgspace-notes/2019-01/
<h2 id="2019-01-02">2019-01-02</h2>
<ul>
<li>Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning</li>
<li>I don’t see anything interesting in the web server logs around that time though:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
92 40.77.167.4
99 210.7.29.100
120 38.126.157.45
177 35.237.175.180
177 40.77.167.32
216 66.249.75.219
225 18.203.76.93
261 46.101.86.248
357 207.46.13.1
903 54.70.40.11
</code></pre>
December, 2018
https://alanorth.github.io/cgspace-notes/2018-12/
Sun, 02 Dec 2018 02:09:30 +0200
https://alanorth.github.io/cgspace-notes/2018-12/
<h2 id="2018-12-01">2018-12-01</h2>
<ul>
<li>Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK</li>
<li>I manually installed OpenJDK, then removed Oracle JDK, then re-ran the <a href="http://github.com/ilri/rmg-ansible-public">Ansible playbook</a> to update all configuration files, etc</li>
<li>Then I ran all system updates and restarted the server</li>
</ul>
<h2 id="2018-12-02">2018-12-02</h2>
<ul>
<li>I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another <a href="https://usn.ubuntu.com/3831-1/">Ghostscript vulnerability last week</a></li>
</ul>
November, 2018
https://alanorth.github.io/cgspace-notes/2018-11/
Thu, 01 Nov 2018 16:41:30 +0200
https://alanorth.github.io/cgspace-notes/2018-11/
<h2 id="2018-11-01">2018-11-01</h2>
<ul>
<li>Finalize AReS Phase I and Phase II ToRs</li>
<li>Send a note about my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to the dspace-tech mailing list</li>
</ul>
<h2 id="2018-11-03">2018-11-03</h2>
<ul>
<li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li>
<li>Today these are the top 10 IPs:</li>
</ul>
October, 2018
https://alanorth.github.io/cgspace-notes/2018-10/
Mon, 01 Oct 2018 22:31:54 +0300
https://alanorth.github.io/cgspace-notes/2018-10/
<h2 id="2018-10-01">2018-10-01</h2>
<ul>
<li>Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items</li>
<li>I created a GitHub issue to track this <a href="https://github.com/ilri/DSpace/issues/389">#389</a>, because I’m super busy in Nairobi right now</li>
</ul>
September, 2018
https://alanorth.github.io/cgspace-notes/2018-09/
Sun, 02 Sep 2018 09:55:54 +0300
https://alanorth.github.io/cgspace-notes/2018-09/
<h2 id="2018-09-02">2018-09-02</h2>
<ul>
<li>New <a href="https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.5">PostgreSQL JDBC driver version 42.2.5</a></li>
<li>I’ll update the DSpace role in our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a> and run the updated playbooks on CGSpace and DSpace Test</li>
<li>Also, I’ll re-run the <code>postgresql</code> tasks because the custom PostgreSQL variables are dynamic according to the system’s RAM, and we never re-ran them after migrating to larger Linodes last month</li>
<li>I’m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I’m getting those autowire errors in Tomcat 8.5.30 again:</li>
</ul>
August, 2018
https://alanorth.github.io/cgspace-notes/2018-08/
Wed, 01 Aug 2018 11:52:54 +0300
https://alanorth.github.io/cgspace-notes/2018-08/
<h2 id="2018-08-01">2018-08-01</h2>
<ul>
<li>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</li>
</ul>
<pre tabindex="0"><code>[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child
[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB
[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
</code></pre><ul>
<li>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</li>
<li>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat’s</li>
<li>I’m not sure why Tomcat didn’t crash with an OutOfMemoryError…</li>
<li>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</li>
<li>The server only has 8GB of RAM so we’ll eventually need to upgrade to a larger one because we’ll start starving the OS, PostgreSQL, and command line batch processes</li>
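<li>For reference, a sketch of how such a heap bump would look in Tomcat's <code>JAVA_OPTS</code> (the exact variable name and file depend on how Tomcat is installed):
<pre tabindex="0"><code>JAVA_OPTS="-Xms6144m -Xmx6144m"
</code></pre>
</li>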
<li>I ran all system updates on DSpace Test and rebooted it</li>
</ul>
July, 2018
https://alanorth.github.io/cgspace-notes/2018-07/
Sun, 01 Jul 2018 12:56:54 +0300
https://alanorth.github.io/cgspace-notes/2018-07/
<h2 id="2018-07-01">2018-07-01</h2>
<ul>
<li>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</li>
</ul>
<pre tabindex="0"><code>$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
</code></pre><ul>
<li>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</li>
</ul>
<pre tabindex="0"><code>There is insufficient memory for the Java Runtime Environment to continue.
</code></pre>
June, 2018
https://alanorth.github.io/cgspace-notes/2018-06/
Mon, 04 Jun 2018 19:49:54 -0700
https://alanorth.github.io/cgspace-notes/2018-06/
<h2 id="2018-06-04">2018-06-04</h2>
<ul>
<li>Test the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">DSpace 5.8 module upgrades from Atmire</a> (<a href="https://github.com/ilri/DSpace/pull/378">#378</a>)
<ul>
<li>There seems to be a problem with the CUA and L&R versions in <code>pom.xml</code> because they are using SNAPSHOT and it doesn’t build</li>
</ul>
</li>
<li>I added the new CCAFS Phase II Project Tag <code>PII-FP1_PACCA2</code> and merged it into the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/379">#379</a>)</li>
<li>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</li>
</ul>
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3 -n
</code></pre><ul>
<li>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="https://alanorth.github.io/cgspace-notes/cgspace-notes/2018-03/">March, 2018</a></li>
<li>Time to index ~70,000 items on CGSpace:</li>
</ul>
<pre tabindex="0"><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
real 74m42.646s
user 8m5.056s
sys 2m7.289s
</code></pre>
May, 2018
https://alanorth.github.io/cgspace-notes/2018-05/
Tue, 01 May 2018 16:43:54 +0300
https://alanorth.github.io/cgspace-notes/2018-05/
<h2 id="2018-05-01">2018-05-01</h2>
<ul>
<li>I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface:
<ul>
<li>http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E</li>
<li>http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E</li>
</ul>
</li>
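<li>Decoded, those <code>stream.body</code> parameters are just Solr's XML update messages; a quick way to verify the encoding (a sketch using python3's <code>urllib.parse.unquote</code>):
<pre tabindex="0"><code>$ python3 -c "from urllib.parse import unquote; print(unquote('%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E'))"
<delete><query>*:*</query></delete>
</code></pre>
</li>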
<li>Then I reduced the JVM heap size from 6144 back to 5120m</li>
<li>Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> to support hosts choosing which distribution they want to use</li>
</ul>
April, 2018
https://alanorth.github.io/cgspace-notes/2018-04/
Sun, 01 Apr 2018 16:13:54 +0200
https://alanorth.github.io/cgspace-notes/2018-04/
<h2 id="2018-04-01">2018-04-01</h2>
<ul>
<li>I tried to test something on DSpace Test but noticed that it’s down since god knows when</li>
<li>Catalina logs at least show some memory errors yesterday:</li>
</ul>
March, 2018
https://alanorth.github.io/cgspace-notes/2018-03/
Fri, 02 Mar 2018 16:07:54 +0200
https://alanorth.github.io/cgspace-notes/2018-03/
<h2 id="2018-03-02">2018-03-02</h2>
<ul>
<li>Export a CSV of the IITA community metadata for Martin Mueller</li>
</ul>
February, 2018
https://alanorth.github.io/cgspace-notes/2018-02/
Thu, 01 Feb 2018 16:28:54 +0200
https://alanorth.github.io/cgspace-notes/2018-02/
<h2 id="2018-02-01">2018-02-01</h2>
<ul>
<li>Peter gave feedback on the <code>dc.rights</code> proof of concept that I had sent him last week</li>
<li>We don’t need to distinguish between internal and external works, so that makes it just a simple list</li>
<li>Yesterday I figured out how to monitor DSpace sessions using JMX</li>
<li>I copied the logic in the <code>jmx_tomcat_dbpools</code> provided by Ubuntu’s <code>munin-plugins-java</code> package and used the stuff I discovered about JMX <a href="https://alanorth.github.io/cgspace-notes/cgspace-notes/2018-01/">in 2018-01</a></li>
</ul>
January, 2018
https://alanorth.github.io/cgspace-notes/2018-01/
Tue, 02 Jan 2018 08:35:54 -0800
https://alanorth.github.io/cgspace-notes/2018-01/
<h2 id="2018-01-02">2018-01-02</h2>
<ul>
<li>Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time</li>
<li>I didn’t get any load alerts from Linode and the REST and XMLUI logs don’t show anything out of the ordinary</li>
<li>The nginx logs show HTTP 200s until <code>02/Jan/2018:11:27:17 +0000</code> when Uptime Robot got an HTTP 500</li>
<li>In dspace.log around that time I see many errors like “Client closed the connection before file download was complete”</li>
<li>And just before that I see this:</li>
</ul>
<pre tabindex="0"><code>Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000].
</code></pre><ul>
<li>Ah hah! So the pool was actually empty!</li>
<li>I need to increase that, let’s try to bump it up from 50 to 75</li>
<li>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don’t know what the hell Uptime Robot saw</li>
<li>I notice this error quite a few times in dspace.log:</li>
</ul>
<pre tabindex="0"><code>2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered " "]" "] "" at line 1, column 32.
</code></pre><ul>
<li>And there are many of these errors every day for the past month:</li>
</ul>
<pre tabindex="0"><code>$ grep -c "Error while searching for sidebar facets" dspace.log.*
dspace.log.2017-11-21:4
dspace.log.2017-11-22:1
dspace.log.2017-11-23:4
dspace.log.2017-11-24:11
dspace.log.2017-11-25:0
dspace.log.2017-11-26:1
dspace.log.2017-11-27:7
dspace.log.2017-11-28:21
dspace.log.2017-11-29:31
dspace.log.2017-11-30:15
dspace.log.2017-12-01:15
dspace.log.2017-12-02:20
dspace.log.2017-12-03:38
dspace.log.2017-12-04:65
dspace.log.2017-12-05:43
dspace.log.2017-12-06:72
dspace.log.2017-12-07:27
dspace.log.2017-12-08:15
dspace.log.2017-12-09:29
dspace.log.2017-12-10:35
dspace.log.2017-12-11:20
dspace.log.2017-12-12:44
dspace.log.2017-12-13:36
dspace.log.2017-12-14:59
dspace.log.2017-12-15:104
dspace.log.2017-12-16:53
dspace.log.2017-12-17:66
dspace.log.2017-12-18:83
dspace.log.2017-12-19:101
dspace.log.2017-12-20:74
dspace.log.2017-12-21:55
dspace.log.2017-12-22:66
dspace.log.2017-12-23:50
dspace.log.2017-12-24:85
dspace.log.2017-12-25:62
dspace.log.2017-12-26:49
dspace.log.2017-12-27:30
dspace.log.2017-12-28:54
dspace.log.2017-12-29:68
dspace.log.2017-12-30:89
dspace.log.2017-12-31:53
dspace.log.2018-01-01:45
dspace.log.2018-01-02:34
</code></pre><ul>
<li>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let’s Encrypt if it’s just a handful of domains</li>
</ul>December, 2017
https://alanorth.github.io/cgspace-notes/2017-12/
Fri, 01 Dec 2017 13:53:54 +0300https://alanorth.github.io/cgspace-notes/2017-12/
+ <h2 id="2017-12-01">2017-12-01</h2>
<ul>
<li>Uptime Robot noticed that CGSpace went down</li>
<li>The logs say “Timeout waiting for idle object”</li>
<li>PostgreSQL activity says there are 115 connections currently</li>
<li>The list of connections to XMLUI and REST API for today:</li>
</ul>November, 2017
https://alanorth.github.io/cgspace-notes/2017-11/
Thu, 02 Nov 2017 09:37:54 +0200https://alanorth.github.io/cgspace-notes/2017-11/
+ <h2 id="2017-11-01">2017-11-01</h2>
<ul>
<li>The CORE developers responded to say they are looking into their bot not respecting our robots.txt</li>
</ul>
<h2 id="2017-11-02">2017-11-02</h2>
<ul>
<li>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</li>
</ul>
<pre tabindex="0"><code># grep -c "CORE" /var/log/nginx/access.log
0
</code></pre><ul>
<li>Generate list of authors on CGSpace for Peter to go through and correct:</li>
</ul>
<pre tabindex="0"><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv;
COPY 54701
</code></pre>October, 2017
https://alanorth.github.io/cgspace-notes/2017-10/
Sun, 01 Oct 2017 08:07:54 +0300https://alanorth.github.io/cgspace-notes/2017-10/
+ <h2 id="2017-10-01">2017-10-01</h2>
<ul>
<li>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</li>
</ul>
<pre tabindex="0"><code>http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
</code></pre><ul>
<li>There appears to be a pattern but I’ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</li>
<li>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</li>
</ul>CGIAR Library Migration
https://alanorth.github.io/cgspace-notes/2017-09/
Thu, 07 Sep 2017 16:54:52 +0700https://alanorth.github.io/cgspace-notes/2017-09/
+ <h2 id="2017-09-06">2017-09-06</h2>
<ul>
<li>Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours</li>
</ul>
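<ul>
<li>(In cases like this, a quick way to see which processes are responsible &mdash; assuming shell access on the host &mdash; is to sort them by CPU usage, for example:)</li>
</ul>
<pre tabindex="0"><code>$ ps -eo pid,user,%cpu,args --sort=-%cpu | head -n 5
</code></pre>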
<h2 id="2017-09-07">2017-09-07</h2>
<ul>
<li>Ask Sisay to clean up the WLE approvers a bit, as Marianne’s user account is both in the approvers step as well as the group</li>
</ul>August, 2017
https://alanorth.github.io/cgspace-notes/2017-08/
Tue, 01 Aug 2017 11:51:52 +0300https://alanorth.github.io/cgspace-notes/2017-08/
+ <h2 id="2017-08-01">2017-08-01</h2>
<ul>
<li>Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours</li>
<li>I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)</li>
<li>The good thing is that, according to <code>dspace.log.2017-08-01</code>, they are all using the same Tomcat session</li>
<li>This means our Tomcat Crawler Session Valve is working</li>
<li>But many of the bots are browsing dynamic URLs like:
<ul>
<li>/handle/10568/3353/discover</li>
<li>/handle/10568/16510/browse</li>
</ul>
</li>
<li>The <code>robots.txt</code> only blocks the top-level <code>/discover</code> and <code>/browse</code> URLs… we will need to find a way to forbid them from accessing these!</li>
<li>Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
<li>It turns out that we’re already adding the <code>X-Robots-Tag "none"</code> HTTP header, but this only forbids the search engine from <em>indexing</em> the page, not crawling it!</li>
<li>Also, the bot has to successfully browse the page first so it can receive the HTTP header…</li>
<li>We might actually have to <em>block</em> these requests with HTTP 403 depending on the user agent</li>
<li>Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415</li>
<li>This was due to newline characters in the <code>dc.description.abstract</code> column, which caused OpenRefine to choke when exporting the CSV</li>
<li>I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using <code>g/^$/d</code></li>
<li>Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet</li>
</ul>July, 2017
https://alanorth.github.io/cgspace-notes/2017-07/
Sat, 01 Jul 2017 18:03:52 +0300https://alanorth.github.io/cgspace-notes/2017-07/
+ <h2 id="2017-07-01">2017-07-01</h2>
<ul>
<li>Run system updates and reboot DSpace Test</li>
</ul>
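<ul>
<li>(For reference, the update-and-reboot routine on a host like DSpace Test is typically something like the following &mdash; assuming an Ubuntu host with apt and sudo access:)</li>
</ul>
<pre tabindex="0"><code>$ sudo apt update && sudo apt full-upgrade
$ sudo reboot
</code></pre>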
<h2 id="2017-07-04">2017-07-04</h2>
<ul>
<li>Merge changes for WLE Phase II theme rename (<a href="https://github.com/ilri/DSpace/pull/329">#329</a>)</li>
<li>Looking at extracting the metadata registries from ICARDA’s MEL DSpace database so we can compare fields with CGSpace</li>
<li>We can use PostgreSQL’s extended output format (<code>-x</code>) plus <code>sed</code> to format the output into quasi XML:</li>
</ul>June, 2017
https://alanorth.github.io/cgspace-notes/2017-04/
Sun, 02 Apr 2017 17:08:52 +0200https://alanorth.github.io/cgspace-notes/2017-04/
+ <h2 id="2017-04-02">2017-04-02</h2>
<ul>
<li>Merge one change to CCAFS flagships that I had forgotten to remove last month (“MANAGING CLIMATE RISK”): <a href="https://github.com/ilri/DSpace/pull/317">https://github.com/ilri/DSpace/pull/317</a></li>
<li>Quick proof-of-concept hack to add <code>dc.rights</code> to the input form, including some inline instructions/hints:</li>
</ul>
<p><img src="https://alanorth.github.io/cgspace-notes/cgspace-notes/2017/04/dc-rights.png" alt="dc.rights in the submission form"></p>
<ul>
<li>Remove redundant/duplicate text in the DSpace submission license</li>
<li>Testing the CMYK patch on a collection with 650 items:</li>
</ul>
<pre tabindex="0"><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Thumbnail" -v >& /tmp/filter-media-cmyk.txt
</code></pre>March, 2017
https://alanorth.github.io/cgspace-notes/2017-03/
Wed, 01 Mar 2017 17:08:52 +0200https://alanorth.github.io/cgspace-notes/2017-03/
+ <h2 id="2017-03-01">2017-03-01</h2>
<ul>
<li>Run the 279 CIAT author corrections on CGSpace</li>
</ul>
<h2 id="2017-03-02">2017-03-02</h2>
<ul>
<li>Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace</li>
<li>CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles</li>
<li>They might come in at the top level in one “CGIAR System” community, or with several communities</li>
<li>I need to spend a bit of time looking at the multiple handle support in DSpace and see if new content can be minted in both handles, or just one?</li>
<li>Need to send Peter and Michael some notes about this in a few days</li>
<li>Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI</li>
<li>Filed an issue on DSpace issue tracker for the <code>filter-media</code> bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: <a href="https://jira.duraspace.org/browse/DS-3516">DS-3516</a></li>
<li>Discovered that the ImageMagick <code>filter-media</code> plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK</li>
<li>Interestingly, it seems DSpace 4.x’s thumbnails were sRGB, but forcing regeneration using DSpace 5.x’s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999">10568/51999</a>):</li>
</ul>
<pre tabindex="0"><code>$ identify ~/Desktop/alc_contrastes_desafios.jpg
/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
</code></pre>February, 2017
https://alanorth.github.io/cgspace-notes/2017-02/
Tue, 07 Feb 2017 07:04:52 -0800https://alanorth.github.io/cgspace-notes/2017-02/
+ <h2 id="2017-02-07">2017-02-07</h2>
<ul>
<li>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</li>
</ul>
<pre tabindex="0"><code>dspace=# select * from collection2item where item_id = '80278';
id | collection_id | item_id
-------+---------------+---------
92551 | 313 | 80278
92550 | 313 | 80278
90774 | 1051 | 80278
(3 rows)
dspace=# delete from collection2item where id = 92551 and item_id = 80278;
DELETE 1
</code></pre><ul>
<li>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</li>
<li>Looks like we’ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</li>
</ul>January, 2017
https://alanorth.github.io/cgspace-notes/2017-01/
Mon, 02 Jan 2017 10:43:00 +0300https://alanorth.github.io/cgspace-notes/2017-01/
+ <h2 id="2017-01-02">2017-01-02</h2>
<ul>
<li>I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error</li>
<li>I tested on DSpace Test as well and it doesn’t work there either</li>
<li>I asked on the dspace-tech mailing list because it seems to be broken, and actually now I’m not sure if we’ve ever had the sharding task run successfully over all these years</li>
</ul>December, 2016
https://alanorth.github.io/cgspace-notes/2016-12/
Fri, 02 Dec 2016 10:43:00 +0300https://alanorth.github.io/cgspace-notes/2016-12/
+ <h2 id="2016-12-02">2016-12-02</h2>
<ul>
<li>CGSpace was down for five hours in the morning while I was sleeping</li>
<li>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</li>
</ul>
<pre tabindex="0"><code>2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail="dc.title", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, ObjectType=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail="THUMBNAIL", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, ObjectType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail="-1", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
</code></pre><ul>
<li>I see thousands of them in the logs for the last few months, so it’s not related to the DSpace 5.5 upgrade</li>
<li>I’ve raised a ticket with Atmire to ask</li>
<li>Another worrying error from dspace.log is:</li>
</ul>November, 2016
https://alanorth.github.io/cgspace-notes/2016-11/
Tue, 01 Nov 2016 09:21:00 +0300https://alanorth.github.io/cgspace-notes/2016-11/
+ <h2 id="2016-11-01">2016-11-01</h2>
<ul>
<li>Add <code>dc.type</code> to the output options for Atmire’s Listings and Reports module (<a href="https://github.com/ilri/DSpace/pull/286">#286</a>)</li>
</ul>
<p><img src="https://alanorth.github.io/cgspace-notes/cgspace-notes/2016/11/listings-and-reports.png" alt="Listings and Reports with output type"></p>October, 2016
https://alanorth.github.io/cgspace-notes/2016-10/
Mon, 03 Oct 2016 15:53:00 +0300https://alanorth.github.io/cgspace-notes/2016-10/
+ <h2 id="2016-10-03">2016-10-03</h2>
<ul>
<li>Testing adding <a href="https://wiki.lyrasis.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing">ORCIDs to a CSV</a> file for a single item to see if the author orders get messed up</li>
<li>Need to test the following scenarios to see how author order is affected:
<ul>
<li>ORCIDs only</li>
<li>ORCIDs plus normal authors</li>
</ul>
</li>
<li>I exported a random item’s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new column called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</li>
</ul>
<pre tabindex="0"><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
</code></pre>September, 2016
https://alanorth.github.io/cgspace-notes/2016-09/
Thu, 01 Sep 2016 15:53:00 +0300https://alanorth.github.io/cgspace-notes/2016-09/
+ <h2 id="2016-09-01">2016-09-01</h2>
<ul>
<li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li>
<li>Discuss how the migration of CGIAR’s Active Directory to a flat structure will break our LDAP groups in DSpace</li>
<li>We had been using <code>DC=ILRI</code> to determine whether a user was ILRI or not</li>
<li>It looks like we might be able to use OUs now, instead of DCs:</li>
</ul>
<pre tabindex="0"><code>$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=org" -D "admigration1@cgiarad.org" -W "(sAMAccountName=admigration1)"
</code></pre>August, 2016
https://alanorth.github.io/cgspace-notes/2016-08/
Mon, 01 Aug 2016 15:53:00 +0300https://alanorth.github.io/cgspace-notes/2016-08/
+ <h2 id="2016-08-01">2016-08-01</h2>
<ul>
<li>Add updated distribution license from Sisay (<a href="https://github.com/ilri/DSpace/issues/259">#259</a>)</li>
<li>Play with upgrading Mirage 2 dependencies in <code>bower.json</code> because most are several versions out of date</li>
<li>Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more</li>
<li>bower stuff is a dead end, waste of time, too many issues</li>
<li>Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of <code>fonts</code>)</li>
<li>Start working on DSpace 5.1 → 5.5 port:</li>
</ul>
<pre tabindex="0"><code>$ git checkout -b 55new 5_x-prod
$ git reset --hard ilri/5_x-prod
$ git rebase -i dspace-5.5
</code></pre>July, 2016
https://alanorth.github.io/cgspace-notes/2016-07/
Fri, 01 Jul 2016 10:53:00 +0300https://alanorth.github.io/cgspace-notes/2016-07/
+ <h2 id="2016-07-01">2016-07-01</h2>
<ul>
<li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li>
<li>I think this query should find and replace all authors that have “,” at the end of their names:</li>
</ul>
<pre tabindex="0"><code>dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
UPDATE 95
dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
text_value
------------
(0 rows)
</code></pre><ul>
<li>In this case the select query was showing 95 results before the update</li>
</ul>June, 2016
https://alanorth.github.io/cgspace-notes/2016-06/
Wed, 01 Jun 2016 10:53:00 +0300https://alanorth.github.io/cgspace-notes/2016-06/
+ <h2 id="2016-06-01">2016-06-01</h2>
<ul>
<li>Experimenting with IFPRI OAI (we want to harvest their publications)</li>
<li>After reading the <a href="https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html">ContentDM documentation</a> I found IFPRI’s OAI endpoint: <a href="http://ebrary.ifpri.org/oai/oai.php">http://ebrary.ifpri.org/oai/oai.php</a></li>
<li>After reading the <a href="https://www.openarchives.org/OAI/openarchivesprotocol.html">OAI documentation</a> and testing with an <a href="http://validator.oaipmh.com/">OAI validator</a> I found out how to get their publications</li>
<li>This is their publications set: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&from=2016-01-01&set=p15738coll2&metadataPrefix=oai_dc">http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&from=2016-01-01&set=p15738coll2&metadataPrefix=oai_dc</a></li>
<li>You can see the others by using the OAI <code>ListSets</code> verb: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListSets">http://ebrary.ifpri.org/oai/oai.php?verb=ListSets</a></li>
<li>Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in <code>dc.identifier.fund</code> to <code>cg.identifier.cpwfproject</code> and then the rest to <code>dc.description.sponsorship</code></li>
</ul>
May, 2016
https://alanorth.github.io/cgspace-notes/2016-05/
Sun, 01 May 2016 23:06:00 +0300
https://alanorth.github.io/cgspace-notes/2016-05/
- <h2 id="2016-05-01">2016-05-01</h2>
-<ul>
-<li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li>
-<li>I have blocked access to the API now</li>
-<li>There are 3,000 IPs accessing the REST API in a 24-hour period!</li>
-</ul>
-<pre tabindex="0"><code># awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l
-3168
-</code></pre>
+ <h2 id="2016-05-01">2016-05-01</h2>
<ul>
<li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li>
<li>I have blocked access to the API now</li>
<li>There are 3,000 IPs accessing the REST API in a 24-hour period!</li>
</ul>
<pre tabindex="0"><code># awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l
3168
</code></pre>
April, 2016
https://alanorth.github.io/cgspace-notes/2016-04/
Mon, 04 Apr 2016 11:06:00 +0300
https://alanorth.github.io/cgspace-notes/2016-04/
- <h2 id="2016-04-04">2016-04-04</h2>
-<ul>
-<li>Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit</li>
-<li>We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc</li>
-<li>After running DSpace for over five years I’ve never needed to look in any other log file than dspace.log, let alone one from last year!</li>
-<li>This will save us a few gigs of backup space we’re paying for on S3</li>
-<li>Also, I noticed the <code>checker</code> log has some errors we should pay attention to:</li>
-</ul>
+ <h2 id="2016-04-04">2016-04-04</h2>
<ul>
<li>Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit</li>
<li>We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc</li>
<li>After running DSpace for over five years I’ve never needed to look in any other log file than dspace.log, let alone one from last year!</li>
<li>This will save us a few gigs of backup space we’re paying for on S3</li>
<li>Also, I noticed the <code>checker</code> log has some errors we should pay attention to:</li>
</ul>
March, 2016
https://alanorth.github.io/cgspace-notes/2016-03/
Wed, 02 Mar 2016 16:50:00 +0300
https://alanorth.github.io/cgspace-notes/2016-03/
- <h2 id="2016-03-02">2016-03-02</h2>
-<ul>
-<li>Looking at issues with author authorities on CGSpace</li>
-<li>For some reason we still have the <code>index-lucene-update</code> cron job active on CGSpace, but I’m pretty sure we don’t need it as of the latest few versions of Atmire’s Listings and Reports module</li>
-<li>Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server</li>
-</ul>
+ <h2 id="2016-03-02">2016-03-02</h2>
<ul>
<li>Looking at issues with author authorities on CGSpace</li>
<li>For some reason we still have the <code>index-lucene-update</code> cron job active on CGSpace, but I’m pretty sure we don’t need it as of the latest few versions of Atmire’s Listings and Reports module</li>
<li>Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server</li>
</ul>
February, 2016
https://alanorth.github.io/cgspace-notes/2016-02/
Fri, 05 Feb 2016 13:18:00 +0300
https://alanorth.github.io/cgspace-notes/2016-02/
- <h2 id="2016-02-05">2016-02-05</h2>
-<ul>
-<li>Looking at some DAGRIS data for Abenet Yabowork</li>
-<li>Lots of issues with spaces, newlines, etc causing the import to fail</li>
-<li>I noticed we have a very <em>interesting</em> list of countries on CGSpace:</li>
-</ul>
-<p><img src="https://alanorth.github.io/cgspace-notes/cgspace-notes/2016/02/cgspace-countries.png" alt="CGSpace country list"></p>
-<ul>
-<li>Not only are there 49,000 countries, we have some blanks (25)…</li>
-<li>Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE”</li>
-</ul>
+ <h2 id="2016-02-05">2016-02-05</h2>
<ul>
<li>Looking at some DAGRIS data for Abenet Yabowork</li>
<li>Lots of issues with spaces, newlines, etc causing the import to fail</li>
<li>I noticed we have a very <em>interesting</em> list of countries on CGSpace:</li>
</ul>
<p><img src="https://alanorth.github.io/cgspace-notes/cgspace-notes/2016/02/cgspace-countries.png" alt="CGSpace country list"></p>
<ul>
<li>Not only are there 49,000 countries, we have some blanks (25)…</li>
<li>Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE”</li>
</ul>
January, 2016
https://alanorth.github.io/cgspace-notes/2016-01/
Wed, 13 Jan 2016 13:18:00 +0300
https://alanorth.github.io/cgspace-notes/2016-01/
- <h2 id="2016-01-13">2016-01-13</h2>
-<ul>
-<li>Move ILRI collection <code>10568/12503</code> from <code>10568/27869</code> to <code>10568/27629</code> using the <a href="https://gist.github.com/alanorth/392c4660e8b022d99dfa">move_collections.sh</a> script I wrote last year.</li>
-<li>I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.</li>
-<li>Update GitHub wiki for documentation of <a href="https://github.com/ilri/DSpace/wiki/Maintenance-Tasks">maintenance tasks</a>.</li>
-</ul>
+ <h2 id="2016-01-13">2016-01-13</h2>
<ul>
<li>Move ILRI collection <code>10568/12503</code> from <code>10568/27869</code> to <code>10568/27629</code> using the <a href="https://gist.github.com/alanorth/392c4660e8b022d99dfa">move_collections.sh</a> script I wrote last year.</li>
<li>I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.</li>
<li>Update GitHub wiki for documentation of <a href="https://github.com/ilri/DSpace/wiki/Maintenance-Tasks">maintenance tasks</a>.</li>
</ul>
December, 2015
https://alanorth.github.io/cgspace-notes/2015-12/
Wed, 02 Dec 2015 13:18:00 +0300
https://alanorth.github.io/cgspace-notes/2015-12/
- <h2 id="2015-12-02">2015-12-02</h2>
-<ul>
-<li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li>
-</ul>
-<pre tabindex="0"><code># cd /home/dspacetest.cgiar.org/log
-# ls -lh dspace.log.2015-11-18*
--rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
--rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
--rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
-</code></pre>
+ <h2 id="2015-12-02">2015-12-02</h2>
<ul>
<li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li>
</ul>
<pre tabindex="0"><code># cd /home/dspacetest.cgiar.org/log
# ls -lh dspace.log.2015-11-18*
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
</code></pre>
November, 2015
https://alanorth.github.io/cgspace-notes/2015-11/
Mon, 23 Nov 2015 17:00:57 +0300
https://alanorth.github.io/cgspace-notes/2015-11/
- <h2 id="2015-11-22">2015-11-22</h2>
-<ul>
-<li>CGSpace went down</li>
-<li>Looks like DSpace exhausted its PostgreSQL connection pool</li>
-<li>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</li>
-</ul>
-<pre tabindex="0"><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
-78
-</code></pre>
+ <h2 id="2015-11-22">2015-11-22</h2>
<ul>
<li>CGSpace went down</li>
<li>Looks like DSpace exhausted its PostgreSQL connection pool</li>
<li>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</li>
</ul>
<pre tabindex="0"><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
78
</code></pre>
diff --git a/docs/posts/page/10/index.html b/docs/posts/page/10/index.html
index 5b92edd33..8724afe74 100644
--- a/docs/posts/page/10/index.html
+++ b/docs/posts/page/10/index.html
@@ -10,14 +10,14 @@
-
+
-
+
diff --git a/docs/posts/page/11/index.html b/docs/posts/page/11/index.html
index 70b1c4187..c464fdb73 100644
--- a/docs/posts/page/11/index.html
+++ b/docs/posts/page/11/index.html
@@ -10,14 +10,14 @@
-
+
-
+
diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html
index f9a37b591..55f41e6c6 100644
--- a/docs/posts/page/2/index.html
+++ b/docs/posts/page/2/index.html
@@ -10,14 +10,14 @@
-
+
-
+
diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html
index be90c4bf8..dc2265052 100644
--- a/docs/posts/page/3/index.html
+++ b/docs/posts/page/3/index.html
@@ -10,14 +10,14 @@
-
+
-
+
diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html
index e58c748b2..a0f1617b5 100644
--- a/docs/posts/page/4/index.html
+++ b/docs/posts/page/4/index.html
@@ -10,14 +10,14 @@
-
+
-
+
diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html
index c9d8ab9fd..4c3b9a1cd 100644
--- a/docs/posts/page/5/index.html
+++ b/docs/posts/page/5/index.html
@@ -10,14 +10,14 @@
-
+
-
+
diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html
index cafa5a4f8..067584b8c 100644
--- a/docs/posts/page/6/index.html
+++ b/docs/posts/page/6/index.html
@@ -10,14 +10,14 @@
-
+
-
+
diff --git a/docs/posts/page/7/index.html b/docs/posts/page/7/index.html
index 385093e63..644f79f28 100644
--- a/docs/posts/page/7/index.html
+++ b/docs/posts/page/7/index.html
@@ -10,14 +10,14 @@
-
+
-
+
diff --git a/docs/posts/page/8/index.html b/docs/posts/page/8/index.html
index 119bc555e..ef206692c 100644
--- a/docs/posts/page/8/index.html
+++ b/docs/posts/page/8/index.html
@@ -10,14 +10,14 @@
-
+
-
+
diff --git a/docs/posts/page/9/index.html b/docs/posts/page/9/index.html
index 297aec622..f974f5d2b 100644
--- a/docs/posts/page/9/index.html
+++ b/docs/posts/page/9/index.html
@@ -10,14 +10,14 @@
-
+
-
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index d9a411194..ab3df1d6e 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -3,22 +3,22 @@
xmlns:xhtml="http://www.w3.org/1999/xhtml">
https://alanorth.github.io/cgspace-notes/categories/
- 2023-12-01T08:48:36+03:00
+ 2023-12-02T10:38:09+03:00
https://alanorth.github.io/cgspace-notes/
- 2023-12-01T08:48:36+03:00
+ 2023-12-02T10:38:09+03:00
https://alanorth.github.io/cgspace-notes/2023-12/
- 2023-12-01T08:48:36+03:00
+ 2023-12-02T10:38:09+03:00
https://alanorth.github.io/cgspace-notes/categories/notes/
- 2023-12-01T08:48:36+03:00
+ 2023-12-02T10:38:09+03:00
https://alanorth.github.io/cgspace-notes/posts/
- 2023-12-01T08:48:36+03:00
+ 2023-12-02T10:38:09+03:00
https://alanorth.github.io/cgspace-notes/2023-11/
- 2023-11-23T16:15:13+03:00
+ 2023-12-02T10:38:09+03:00
https://alanorth.github.io/cgspace-notes/2023-10/
2023-11-02T20:58:43+03:00
diff --git a/docs/tags/index.html b/docs/tags/index.html
index a82f0ce3d..db4555e08 100644
--- a/docs/tags/index.html
+++ b/docs/tags/index.html
@@ -17,7 +17,7 @@
-
+
diff --git a/docs/tags/migration/index.html b/docs/tags/migration/index.html
index 66b1458ea..18a04d3b9 100644
--- a/docs/tags/migration/index.html
+++ b/docs/tags/migration/index.html
@@ -17,7 +17,7 @@
-
+
diff --git a/docs/tags/migration/index.xml b/docs/tags/migration/index.xml
index c636d8ad0..d791a92ea 100644
--- a/docs/tags/migration/index.xml
+++ b/docs/tags/migration/index.xml
@@ -13,8 +13,7 @@
https://alanorth.github.io/cgspace-notes/cgspace-cgcorev2-migration/
Sun, 21 Feb 2021 13:27:35 +0200
https://alanorth.github.io/cgspace-notes/cgspace-cgcorev2-migration/
- <p>Changes to CGSpace metadata fields to align more with DC, QDC, and DCTERMS as well as CG Core v2. Implemented on 2021-02-21.</p>
-<p>With reference to <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2 draft standard</a> by Marie-Angélique as well as <a href="http://www.dublincore.org/specifications/dublin-core/dcmi-terms/">DCMI DCTERMS</a>.</p>
+ <p>Changes to CGSpace metadata fields to align more with DC, QDC, and DCTERMS as well as CG Core v2. Implemented on 2021-02-21.</p>
<p>With reference to <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2 draft standard</a> by Marie-Angélique as well as <a href="http://www.dublincore.org/specifications/dublin-core/dcmi-terms/">DCMI DCTERMS</a>.</p>
CGSpace DSpace 6 Upgrade
diff --git a/docs/tags/notes/index.html b/docs/tags/notes/index.html
index 84995f24e..7e83b5887 100644
--- a/docs/tags/notes/index.html
+++ b/docs/tags/notes/index.html
@@ -17,7 +17,7 @@
-
+
diff --git a/docs/tags/notes/index.xml b/docs/tags/notes/index.xml
index a1450d500..05a3c68c3 100644
--- a/docs/tags/notes/index.xml
+++ b/docs/tags/notes/index.xml
@@ -13,58 +13,21 @@
https://alanorth.github.io/cgspace-notes/2017-09/
Thu, 07 Sep 2017 16:54:52 +0700
https://alanorth.github.io/cgspace-notes/2017-09/
- <h2 id="2017-09-06">2017-09-06</h2>
-<ul>
-<li>Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours</li>
-</ul>
-<h2 id="2017-09-07">2017-09-07</h2>
-<ul>
-<li>Ask Sisay to clean up the WLE approvers a bit, as Marianne’s user account is both in the approvers step as well as the group</li>
-</ul>
+ <h2 id="2017-09-06">2017-09-06</h2>
<ul>
<li>Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours</li>
</ul>
<h2 id="2017-09-07">2017-09-07</h2>
<ul>
<li>Ask Sisay to clean up the WLE approvers a bit, as Marianne’s user account is both in the approvers step as well as the group</li>
</ul>
August, 2017
https://alanorth.github.io/cgspace-notes/2017-08/
Tue, 01 Aug 2017 11:51:52 +0300
https://alanorth.github.io/cgspace-notes/2017-08/
- <h2 id="2017-08-01">2017-08-01</h2>
-<ul>
-<li>Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours</li>
-<li>I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)</li>
-<li>The good thing is that, according to <code>dspace.log.2017-08-01</code>, they are all using the same Tomcat session</li>
-<li>This means our Tomcat Crawler Session Valve is working</li>
-<li>But many of the bots are browsing dynamic URLs like:
-<ul>
-<li>/handle/10568/3353/discover</li>
-<li>/handle/10568/16510/browse</li>
-</ul>
-</li>
-<li>The <code>robots.txt</code> only blocks the top-level <code>/discover</code> and <code>/browse</code> URLs… we will need to find a way to forbid them from accessing these!</li>
-<li>Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
-<li>It turns out that we’re already adding the <code>X-Robots-Tag "none"</code> HTTP header, but this only forbids the search engine from <em>indexing</em> the page, not crawling it!</li>
-<li>Also, the bot has to successfully browse the page first so it can receive the HTTP header…</li>
-<li>We might actually have to <em>block</em> these requests with HTTP 403 depending on the user agent</li>
-<li>Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415</li>
-<li>This was due to newline characters in the <code>dc.description.abstract</code> column, which caused OpenRefine to choke when exporting the CSV</li>
-<li>I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using <code>g/^$/d</code></li>
-<li>Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet</li>
-</ul>
+ <h2 id="2017-08-01">2017-08-01</h2>
<ul>
<li>Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours</li>
<li>I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)</li>
<li>The good thing is that, according to <code>dspace.log.2017-08-01</code>, they are all using the same Tomcat session</li>
<li>This means our Tomcat Crawler Session Valve is working</li>
<li>But many of the bots are browsing dynamic URLs like:
<ul>
<li>/handle/10568/3353/discover</li>
<li>/handle/10568/16510/browse</li>
</ul>
</li>
<li>The <code>robots.txt</code> only blocks the top-level <code>/discover</code> and <code>/browse</code> URLs… we will need to find a way to forbid them from accessing these!</li>
<li>Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
<li>It turns out that we’re already adding the <code>X-Robots-Tag "none"</code> HTTP header, but this only forbids the search engine from <em>indexing</em> the page, not crawling it!</li>
<li>Also, the bot has to successfully browse the page first so it can receive the HTTP header…</li>
<li>We might actually have to <em>block</em> these requests with HTTP 403 depending on the user agent</li>
<li>Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415</li>
<li>This was due to newline characters in the <code>dc.description.abstract</code> column, which caused OpenRefine to choke when exporting the CSV</li>
<li>I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using <code>g/^$/d</code></li>
<li>Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet</li>
</ul>
July, 2017
https://alanorth.github.io/cgspace-notes/2017-07/
Sat, 01 Jul 2017 18:03:52 +0300
https://alanorth.github.io/cgspace-notes/2017-07/
- <h2 id="2017-07-01">2017-07-01</h2>
-<ul>
-<li>Run system updates and reboot DSpace Test</li>
-</ul>
-<h2 id="2017-07-04">2017-07-04</h2>
-<ul>
-<li>Merge changes for WLE Phase II theme rename (<a href="https://github.com/ilri/DSpace/pull/329">#329</a>)</li>
-<li>Looking at extracting the metadata registries from ICARDA’s MEL DSpace database so we can compare fields with CGSpace</li>
-<li>We can use PostgreSQL’s extended output format (<code>-x</code>) plus <code>sed</code> to format the output into quasi XML:</li>
-</ul>
+ <h2 id="2017-07-01">2017-07-01</h2>
<ul>
<li>Run system updates and reboot DSpace Test</li>
</ul>
<h2 id="2017-07-04">2017-07-04</h2>
<ul>
<li>Merge changes for WLE Phase II theme rename (<a href="https://github.com/ilri/DSpace/pull/329">#329</a>)</li>
<li>Looking at extracting the metadata registries from ICARDA’s MEL DSpace database so we can compare fields with CGSpace</li>
<li>We can use PostgreSQL’s extended output format (<code>-x</code>) plus <code>sed</code> to format the output into quasi XML:</li>
</ul>
June, 2017
@@ -85,299 +48,126 @@
https://alanorth.github.io/cgspace-notes/2017-04/
Sun, 02 Apr 2017 17:08:52 +0200
https://alanorth.github.io/cgspace-notes/2017-04/
- <h2 id="2017-04-02">2017-04-02</h2>
-<ul>
-<li>Merge one change to CCAFS flagships that I had forgotten to remove last month (“MANAGING CLIMATE RISK”): <a href="https://github.com/ilri/DSpace/pull/317">https://github.com/ilri/DSpace/pull/317</a></li>
-<li>Quick proof-of-concept hack to add <code>dc.rights</code> to the input form, including some inline instructions/hints:</li>
-</ul>
-<p><img src="https://alanorth.github.io/cgspace-notes/cgspace-notes/2017/04/dc-rights.png" alt="dc.rights in the submission form"></p>
-<ul>
-<li>Remove redundant/duplicate text in the DSpace submission license</li>
-<li>Testing the CMYK patch on a collection with 650 items:</li>
-</ul>
-<pre tabindex="0"><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Thumbnail" -v >& /tmp/filter-media-cmyk.txt
-</code></pre>
+ <h2 id="2017-04-02">2017-04-02</h2>
<ul>
<li>Merge one change to CCAFS flagships that I had forgotten to remove last month (“MANAGING CLIMATE RISK”): <a href="https://github.com/ilri/DSpace/pull/317">https://github.com/ilri/DSpace/pull/317</a></li>
<li>Quick proof-of-concept hack to add <code>dc.rights</code> to the input form, including some inline instructions/hints:</li>
</ul>
<p><img src="https://alanorth.github.io/cgspace-notes/cgspace-notes/2017/04/dc-rights.png" alt="dc.rights in the submission form"></p>
<ul>
<li>Remove redundant/duplicate text in the DSpace submission license</li>
<li>Testing the CMYK patch on a collection with 650 items:</li>
</ul>
<pre tabindex="0"><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Thumbnail" -v >& /tmp/filter-media-cmyk.txt
</code></pre>
March, 2017
https://alanorth.github.io/cgspace-notes/2017-03/
Wed, 01 Mar 2017 17:08:52 +0200
https://alanorth.github.io/cgspace-notes/2017-03/
- <h2 id="2017-03-01">2017-03-01</h2>
-<ul>
-<li>Run the 279 CIAT author corrections on CGSpace</li>
-</ul>
-<h2 id="2017-03-02">2017-03-02</h2>
-<ul>
-<li>Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace</li>
-<li>CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles</li>
-<li>They might come in at the top level in one “CGIAR System” community, or with several communities</li>
-<li>I need to spend a bit of time looking at the multiple handle support in DSpace and see if new content can be minted in both handles, or just one?</li>
-<li>Need to send Peter and Michael some notes about this in a few days</li>
-<li>Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI</li>
-<li>Filed an issue on DSpace issue tracker for the <code>filter-media</code> bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: <a href="https://jira.duraspace.org/browse/DS-3516">DS-3516</a></li>
-<li>Discovered that the ImageMagick <code>filter-media</code> plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK</li>
-<li>Interestingly, it seems DSpace 4.x’s thumbnails were sRGB, but forcing regeneration using DSpace 5.x’s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999">10568/51999</a>):</li>
-</ul>
-<pre tabindex="0"><code>$ identify ~/Desktop/alc_contrastes_desafios.jpg
-/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
-</code></pre>
+ <h2 id="2017-03-01">2017-03-01</h2>
<ul>
<li>Run the 279 CIAT author corrections on CGSpace</li>
</ul>
<h2 id="2017-03-02">2017-03-02</h2>
<ul>
<li>Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace</li>
<li>CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles</li>
<li>They might come in at the top level in one “CGIAR System” community, or with several communities</li>
<li>I need to spend a bit of time looking at the multiple handle support in DSpace and see if new content can be minted in both handles, or just one?</li>
<li>Need to send Peter and Michael some notes about this in a few days</li>
<li>Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI</li>
<li>Filed an issue on DSpace issue tracker for the <code>filter-media</code> bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: <a href="https://jira.duraspace.org/browse/DS-3516">DS-3516</a></li>
<li>Discovered that the ImageMagick <code>filter-media</code> plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK</li>
<li>Interestingly, it seems DSpace 4.x’s thumbnails were sRGB, but forcing regeneration using DSpace 5.x’s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999">10568/51999</a>):</li>
</ul>
<pre tabindex="0"><code>$ identify ~/Desktop/alc_contrastes_desafios.jpg
/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
</code></pre>
February, 2017
https://alanorth.github.io/cgspace-notes/2017-02/
Tue, 07 Feb 2017 07:04:52 -0800
https://alanorth.github.io/cgspace-notes/2017-02/
- <h2 id="2017-02-07">2017-02-07</h2>
-<ul>
-<li>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</li>
-</ul>
-<pre tabindex="0"><code>dspace=# select * from collection2item where item_id = '80278';
- id | collection_id | item_id
--------+---------------+---------
- 92551 | 313 | 80278
- 92550 | 313 | 80278
- 90774 | 1051 | 80278
-(3 rows)
-dspace=# delete from collection2item where id = 92551 and item_id = 80278;
-DELETE 1
-</code></pre><ul>
-<li>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</li>
-<li>Looks like we’ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</li>
-</ul>
+ <h2 id="2017-02-07">2017-02-07</h2>
<ul>
<li>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</li>
</ul>
<pre tabindex="0"><code>dspace=# select * from collection2item where item_id = '80278';
id | collection_id | item_id
-------+---------------+---------
92551 | 313 | 80278
92550 | 313 | 80278
90774 | 1051 | 80278
(3 rows)
dspace=# delete from collection2item where id = 92551 and item_id = 80278;
DELETE 1
</code></pre><ul>
<li>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</li>
<li>Looks like we’ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</li>
</ul>
January, 2017
https://alanorth.github.io/cgspace-notes/2017-01/
Mon, 02 Jan 2017 10:43:00 +0300
https://alanorth.github.io/cgspace-notes/2017-01/
- <h2 id="2017-01-02">2017-01-02</h2>
-<ul>
-<li>I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error</li>
-<li>I tested on DSpace Test as well and it doesn’t work there either</li>
-<li>I asked on the dspace-tech mailing list because it seems to be broken, and actually now I’m not sure if we’ve ever had the sharding task run successfully over all these years</li>
-</ul>
+ <h2 id="2017-01-02">2017-01-02</h2>
<ul>
<li>I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error</li>
<li>I tested on DSpace Test as well and it doesn’t work there either</li>
<li>I asked on the dspace-tech mailing list because it seems to be broken, and actually now I’m not sure if we’ve ever had the sharding task run successfully over all these years</li>
</ul>
December, 2016
https://alanorth.github.io/cgspace-notes/2016-12/
Fri, 02 Dec 2016 10:43:00 +0300
https://alanorth.github.io/cgspace-notes/2016-12/
- <h2 id="2016-12-02">2016-12-02</h2>
-<ul>
-<li>CGSpace was down for five hours in the morning while I was sleeping</li>
-<li>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</li>
-</ul>
-<pre tabindex="0"><code>2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
-2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail="dc.title", transactionID="TX157907838689377964651674089851855413607")
-2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, Object Type=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail="THUMBNAIL", transactionID="TX157907838689377964651674089851855413607")
-2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail="-1", transactionID="TX157907838689377964651674089851855413607")
-2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
-</code></pre><ul>
-<li>I see thousands of them in the logs for the last few months, so it’s not related to the DSpace 5.5 upgrade</li>
-<li>I’ve raised a ticket with Atmire to ask</li>
-<li>Another worrying error from dspace.log is:</li>
-</ul>
+ <h2 id="2016-12-02">2016-12-02</h2>
<ul>
<li>CGSpace was down for five hours in the morning while I was sleeping</li>
<li>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</li>
</ul>
<pre tabindex="0"><code>2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail="dc.title", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, ObjectType=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail="THUMBNAIL", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, ObjectType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail="-1", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
</code></pre><ul>
<li>I see thousands of them in the logs for the last few months, so it’s not related to the DSpace 5.5 upgrade</li>
<li>I’ve raised a ticket with Atmire to ask</li>
<li>Another worrying error from dspace.log is:</li>
</ul>
November, 2016
https://alanorth.github.io/cgspace-notes/2016-11/
Tue, 01 Nov 2016 09:21:00 +0300
https://alanorth.github.io/cgspace-notes/2016-11/
    <h2 id="2016-11-01">2016-11-01</h2>
<ul>
<li>Add <code>dc.type</code> to the output options for Atmire’s Listings and Reports module (<a href="https://github.com/ilri/DSpace/pull/286">#286</a>)</li>
</ul>
<p><img src="https://alanorth.github.io/cgspace-notes/cgspace-notes/2016/11/listings-and-reports.png" alt="Listings and Reports with output type"></p>
October, 2016
https://alanorth.github.io/cgspace-notes/2016-10/
Mon, 03 Oct 2016 15:53:00 +0300
https://alanorth.github.io/cgspace-notes/2016-10/
    <h2 id="2016-10-03">2016-10-03</h2>
<ul>
<li>Testing adding <a href="https://wiki.lyrasis.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing">ORCIDs to a CSV</a> file for a single item to see if the author orders get messed up</li>
<li>Need to test the following scenarios to see how author order is affected:
<ul>
<li>ORCIDs only</li>
<li>ORCIDs plus normal authors</li>
</ul>
</li>
<li>I exported a random item’s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new column called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</li>
</ul>
<pre tabindex="0"><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
</code></pre>September, 2016
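The multi-value column above can be generated rather than typed by hand. A minimal sketch using Python's csv module; the item id and collection handle are hypothetical placeholders, and the special `ORCID:dc.contributor.author` column name and the `||` value separator are taken from the note above:

```python
import csv
import io

# The three random ORCID identifiers quoted in the note above
orcids = ["0000-0002-6115-0956", "0000-0002-3812-8793", "0000-0001-7462-405X"]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "collection", "ORCID:dc.contributor.author"])
writer.writeheader()
writer.writerow({
    "id": "12345",             # hypothetical item id
    "collection": "10568/1",   # hypothetical owning collection handle
    # DSpace Batch CSV Editing joins multiple values with "||"
    "ORCID:dc.contributor.author": "||".join(orcids),
})
print(buf.getvalue())
```

Writing to a real file instead of a StringIO buffer gives a CSV ready for the batch metadata import.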
https://alanorth.github.io/cgspace-notes/2016-09/
Thu, 01 Sep 2016 15:53:00 +0300
https://alanorth.github.io/cgspace-notes/2016-09/
    <h2 id="2016-09-01">2016-09-01</h2>
<ul>
<li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li>
<li>Discuss how the migration of CGIAR’s Active Directory to a flat structure will break our LDAP groups in DSpace</li>
<li>We had been using <code>DC=ILRI</code> to determine whether a user was ILRI or not</li>
<li>It looks like we might be able to use OUs now, instead of DCs:</li>
</ul>
<pre tabindex="0"><code>$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=org" -D "admigration1@cgiarad.org" -W "(sAMAccountName=admigration1)"
</code></pre>August, 2016
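Switching the membership check from DCs to OUs means inspecting the components of each user's DN. A rough sketch of that extraction, with a hypothetical DN for the flattened directory (a naive comma split; real DNs can contain escaped commas, so production code should use a proper DN parser):

```python
def dn_components(dn: str, kind: str) -> list[str]:
    """Extract components of a given type (e.g. OU or DC) from an LDAP DN."""
    parts = [p.strip() for p in dn.split(",")]
    prefix = kind.upper() + "="
    return [p[len(prefix):] for p in parts if p.upper().startswith(prefix)]

# Hypothetical DN after the Active Directory flattening
dn = "CN=Jane Doe,OU=ILRIHUB,DC=CGIARAD,DC=ORG"
print(dn_components(dn, "OU"))  # OU values could replace the old DC=ILRI check
print(dn_components(dn, "DC"))
```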
https://alanorth.github.io/cgspace-notes/2016-08/
Mon, 01 Aug 2016 15:53:00 +0300
https://alanorth.github.io/cgspace-notes/2016-08/
    <h2 id="2016-08-01">2016-08-01</h2>
<ul>
<li>Add updated distribution license from Sisay (<a href="https://github.com/ilri/DSpace/issues/259">#259</a>)</li>
<li>Play with upgrading Mirage 2 dependencies in <code>bower.json</code> because most are several versions out of date</li>
<li>Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more</li>
<li>The bower upgrades are a dead end: too many issues and a waste of time</li>
<li>Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of <code>fonts</code>)</li>
<li>Start working on DSpace 5.1 → 5.5 port:</li>
</ul>
<pre tabindex="0"><code>$ git checkout -b 55new 5_x-prod
$ git reset --hard ilri/5_x-prod
$ git rebase -i dspace-5.5
</code></pre>
July, 2016
https://alanorth.github.io/cgspace-notes/2016-07/
Fri, 01 Jul 2016 10:53:00 +0300
https://alanorth.github.io/cgspace-notes/2016-07/
    <h2 id="2016-07-01">2016-07-01</h2>
<ul>
<li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li>
<li>I think this query should find and replace all authors that have “,” at the end of their names:</li>
</ul>
<pre tabindex="0"><code>dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
UPDATE 95
dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
text_value
------------
(0 rows)
</code></pre><ul>
<li>In this case the select query was showing 95 results before the update</li>
</ul>June, 2016
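The same pattern from the SQL above can be dry-run in Python against exported author values before touching the database; a small sketch with hypothetical names:

```python
import re

def strip_trailing_comma(name: str) -> str:
    # Same pattern as the PostgreSQL regexp_replace above: capture
    # everything up to a trailing comma and drop the comma.
    return re.sub(r"(^.+?),$", r"\1", name)

print(strip_trailing_comma("Orth, Alan,"))  # trailing comma removed
print(strip_trailing_comma("Orth, Alan"))   # already clean, untouched
```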
https://alanorth.github.io/cgspace-notes/2016-06/
Wed, 01 Jun 2016 10:53:00 +0300
https://alanorth.github.io/cgspace-notes/2016-06/
    <h2 id="2016-06-01">2016-06-01</h2>
<ul>
<li>Experimenting with IFPRI OAI (we want to harvest their publications)</li>
<li>After reading the <a href="https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html">ContentDM documentation</a> I found IFPRI’s OAI endpoint: <a href="http://ebrary.ifpri.org/oai/oai.php">http://ebrary.ifpri.org/oai/oai.php</a></li>
<li>After reading the <a href="https://www.openarchives.org/OAI/openarchivesprotocol.html">OAI documentation</a> and testing with an <a href="http://validator.oaipmh.com/">OAI validator</a> I found out how to get their publications</li>
<li>This is their publications set: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&from=2016-01-01&set=p15738coll2&metadataPrefix=oai_dc">http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&from=2016-01-01&set=p15738coll2&metadataPrefix=oai_dc</a></li>
<li>You can see the others by using the OAI <code>ListSets</code> verb: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListSets">http://ebrary.ifpri.org/oai/oai.php?verb=ListSets</a></li>
<li>Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in <code>dc.identifier.fund</code> to <code>cg.identifier.cpwfproject</code> and then the rest to <code>dc.description.sponsorship</code></li>
</ul>May, 2016
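The ListRecords request above can be assembled programmatically when scripting a harvest; a sketch using only the standard library, with the verb, date, set, and metadata prefix taken from the URL in the note:

```python
from urllib.parse import urlencode

base = "http://ebrary.ifpri.org/oai/oai.php"
params = {
    "verb": "ListRecords",
    "from": "2016-01-01",    # selective harvest start date
    "set": "p15738coll2",    # IFPRI publications set, found via ListSets
    "metadataPrefix": "oai_dc",
}
url = f"{base}?{urlencode(params)}"
print(url)
```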
https://alanorth.github.io/cgspace-notes/2016-05/
Sun, 01 May 2016 23:06:00 +0300
https://alanorth.github.io/cgspace-notes/2016-05/
    <h2 id="2016-05-01">2016-05-01</h2>
<ul>
<li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li>
<li>I have blocked access to the API now</li>
<li>There are 3,000 IPs accessing the REST API in a 24-hour period!</li>
</ul>
<pre tabindex="0"><code># awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l
3168
</code></pre>April, 2016
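One caveat with the pipeline above: `uniq` only collapses <em>adjacent</em> duplicates, so without a `sort` first the count can overstate the number of distinct IPs. A small sketch of a true unique count over access-log lines (the log lines here are hypothetical samples):

```python
def unique_ips(log_lines):
    # The first whitespace-delimited field of an nginx access log line
    # is the client IP; a set gives a true unique count, unlike
    # `uniq` without `sort`, which only collapses adjacent duplicates.
    return {line.split()[0] for line in log_lines if line.strip()}

log = [
    '192.0.2.1 - - [01/May/2016] "GET /rest/items HTTP/1.1" 200',
    '198.51.100.7 - - [01/May/2016] "GET /rest/items HTTP/1.1" 200',
    '192.0.2.1 - - [01/May/2016] "GET /rest/collections HTTP/1.1" 200',
]
print(len(unique_ips(log)))
```

In the shell, `awk '{print $1}' rest.log | sort -u | wc -l` would give the same guarantee.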
https://alanorth.github.io/cgspace-notes/2016-04/
Mon, 04 Apr 2016 11:06:00 +0300
https://alanorth.github.io/cgspace-notes/2016-04/
    <h2 id="2016-04-04">2016-04-04</h2>
<ul>
<li>Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit</li>
<li>We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc</li>
<li>After running DSpace for over five years I’ve never needed to look in any other log file than dspace.log, let alone one from last year!</li>
<li>This will save us a few gigs of backup space we’re paying for on S3</li>
<li>Also, I noticed the <code>checker</code> log has some errors we should pay attention to:</li>
</ul>
March, 2016
https://alanorth.github.io/cgspace-notes/2016-03/
Wed, 02 Mar 2016 16:50:00 +0300
https://alanorth.github.io/cgspace-notes/2016-03/
    <h2 id="2016-03-02">2016-03-02</h2>
<ul>
<li>Looking at issues with author authorities on CGSpace</li>
<li>For some reason we still have the <code>index-lucene-update</code> cron job active on CGSpace, but I’m pretty sure we don’t need it as of the latest few versions of Atmire’s Listings and Reports module</li>
<li>Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server</li>
</ul>
February, 2016
https://alanorth.github.io/cgspace-notes/2016-02/
Fri, 05 Feb 2016 13:18:00 +0300
https://alanorth.github.io/cgspace-notes/2016-02/
    <h2 id="2016-02-05">2016-02-05</h2>
<ul>
<li>Looking at some DAGRIS data for Abenet Yabowork</li>
<li>Lots of issues with spaces, newlines, etc causing the import to fail</li>
<li>I noticed we have a very <em>interesting</em> list of countries on CGSpace:</li>
</ul>
<p><img src="https://alanorth.github.io/cgspace-notes/cgspace-notes/2016/02/cgspace-countries.png" alt="CGSpace country list"></p>
<ul>
<li>Not only are there 49,000 country values, we have some blanks (25)…</li>
<li>Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE”</li>
</ul>
January, 2016
https://alanorth.github.io/cgspace-notes/2016-01/
Wed, 13 Jan 2016 13:18:00 +0300
https://alanorth.github.io/cgspace-notes/2016-01/
    <h2 id="2016-01-13">2016-01-13</h2>
<ul>
<li>Move ILRI collection <code>10568/12503</code> from <code>10568/27869</code> to <code>10568/27629</code> using the <a href="https://gist.github.com/alanorth/392c4660e8b022d99dfa">move_collections.sh</a> script I wrote last year.</li>
<li>I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.</li>
<li>Update GitHub wiki for documentation of <a href="https://github.com/ilri/DSpace/wiki/Maintenance-Tasks">maintenance tasks</a>.</li>
</ul>
December, 2015
https://alanorth.github.io/cgspace-notes/2015-12/
Wed, 02 Dec 2015 13:18:00 +0300
https://alanorth.github.io/cgspace-notes/2015-12/
    <h2 id="2015-12-02">2015-12-02</h2>
<ul>
<li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li>
</ul>
<pre tabindex="0"><code># cd /home/dspacetest.cgiar.org/log
# ls -lh dspace.log.2015-11-18*
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
</code></pre>November, 2015
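The size difference above can be reproduced with the standard library, since Python's lzma module implements the same algorithm xz uses (lzop's LZO codec isn't in the stdlib, so zlib stands in as the weaker, faster baseline). The sample log text is a hypothetical stand-in for dspace.log:

```python
import lzma
import zlib

# Repetitive, log-like sample text (hypothetical content)
sample = ("2015-11-18 00:00:01 INFO  org.dspace.core.Context @ "
          "anonymous::view_item:handle=10568/12345\n" * 5000).encode()

xz_size = len(lzma.compress(sample, preset=6))
gz_size = len(zlib.compress(sample, 6))
print(len(sample), gz_size, xz_size)
```

On highly repetitive log data both compressors do well, with xz typically producing the smallest output, matching the `ls -lh` comparison above.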
https://alanorth.github.io/cgspace-notes/2015-11/
Mon, 23 Nov 2015 17:00:57 +0300
https://alanorth.github.io/cgspace-notes/2015-11/
    <h2 id="2015-11-22">2015-11-22</h2>
<ul>
<li>CGSpace went down</li>
<li>Looks like DSpace exhausted its PostgreSQL connection pool</li>
<li>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</li>
</ul>
<pre tabindex="0"><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
78
</code></pre>
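The grep pipeline above can be mirrored over pg_stat_activity rows fetched with any PostgreSQL client; a minimal sketch with hypothetical rows (note that plain `grep idle` would also match "idle in transaction" connections):

```python
def count_idle(rows, datname="cgspace"):
    # Mirrors: psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
    return sum(1 for r in rows if r["state"] == "idle" and r["datname"] == datname)

# Hypothetical pg_stat_activity rows
rows = [
    {"datname": "cgspace", "state": "idle"},
    {"datname": "cgspace", "state": "active"},
    {"datname": "dspacetest", "state": "idle"},
    {"datname": "cgspace", "state": "idle"},
]
print(count_idle(rows))
```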