diff --git a/public/.gitignore b/public/.gitignore deleted file mode 100644 index e69de29bb..000000000 diff --git a/public/2015-11/index.html b/public/2015-11/index.html deleted file mode 100644 index 1f318c3b9..000000000 --- a/public/2015-11/index.html +++ /dev/null @@ -1,353 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - November, 2015 | CGSpace Notes - - - - - - - - - - - - - - - - - -
-
- -
-
- - - -
-
-

CGSpace Notes

-

Documenting day-to-day work on the CGSpace repository.

-
-
- - - -
-
-
- - - - -
-
-

November, 2015

- -
-

2015-11-22

- -
    -
  • CGSpace went down
  • -
  • Looks like DSpace exhausted its PostgreSQL connection pool
  • -
  • Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:
  • -
- -
$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
-78
-
- -

- -
    -
  • For now I have increased the limit from 60 to 90, run updates, and rebooted the server
  • -
- -

2015-11-24

- -
    -
  • CGSpace went down again
  • -
  • Getting emails from uptimeRobot and uptimeButler that it’s down, and Google Webmaster Tools is sending emails that there is an increase in crawl errors
  • -
  • Looks like there are still a bunch of idle PostgreSQL connections:
  • -
- -
$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
-96
-
- -
    -
  • For some reason the number of idle connections is very high since we upgraded to DSpace 5
  • -
- -

2015-11-25

- -
    -
  • Troubleshoot the DSpace 5 OAI breakage caused by nginx routing config
  • -
  • The OAI application requests stylesheets and javascript files with the path /oai/static/css, which gets matched here:
  • -
- -
# static assets we can load from the file system directly with nginx
-location ~ /(themes|static|aspects/ReportingSuite) {
-    try_files $uri @tomcat;
-...
-
- -
    -
  • The document root is relative to the xmlui app, so this gets a 404—I’m not sure why it doesn’t pass to @tomcat
  • -
  • Anyways, I can’t find any URIs with path /static, and the more important point is to handle all the static theme assets, so we can just remove static from the regex for now (who cares if we can’t use nginx to send Etags for OAI CSS!)
  • -
  • Also, I noticed we aren’t setting CSP headers on the static assets, because in nginx headers are inherited in child blocks, but if you use add_header in a child block it doesn’t inherit the others
  • -
  • We simply need to add include extra-security.conf; to the above location block (but research and test first)
  • -
  • We should add WOFF assets to the list of things to set expires for:
  • -
- -
location ~* \.(?:ico|css|js|gif|jpe?g|png|woff)$ {
-
- -
    -
  • We should also add aspects/Statistics to the location block for static assets (minus static from above):
  • -
- -
location ~ /(themes|aspects/ReportingSuite|aspects/Statistics) {
-
- -
    -
  • Need to check /about on CGSpace, as it’s blank on my local test server and we might need to add something there
  • -
  • CGSpace has been up and down all day due to PostgreSQL idle connections (current DSpace pool is 90):
  • -
- -
$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
-93
-
- -
    -
  • I looked closer at the idle connections and saw that many have been idle for hours (current time on server is 2015-11-25T20:20:42+0000):
  • -
- -
$ psql -c 'SELECT * from pg_stat_activity;' | less -S
-datid | datname  |  pid  | usesysid | usename  | application_name | client_addr | client_hostname | client_port |         backend_start         |          xact_start           |
--------+----------+-------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+---
-20951 | cgspace  | 10966 |    18205 | cgspace  |                  | 127.0.0.1   |                 |       37731 | 2015-11-25 13:13:02.837624+00 |                               | 20
-20951 | cgspace  | 10967 |    18205 | cgspace  |                  | 127.0.0.1   |                 |       37737 | 2015-11-25 13:13:03.069421+00 |                               | 20
-...
-
- -
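A quick way to see how long each connection has actually been idle (a sketch; pg_stat_activity’s state_change column records when the connection last changed state):

$ psql -c "SELECT pid, usename, state, now() - state_change AS idle_for FROM pg_stat_activity WHERE state = 'idle' ORDER BY idle_for DESC LIMIT 5;"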
    -
  • There is a relevant Jira issue about this: https://jira.duraspace.org/browse/DS-1458
  • -
  • It seems there is some sense in changing DSpace’s default db.maxidle from unlimited (-1) to something like 8 (Tomcat default) or 10 (Confluence default)
  • -
  • Change db.maxidle from -1 to 10, reduce db.maxconnections from 90 to 50, and restart postgres and tomcat7 (the resulting settings are noted below)
  • -
  • Also redeploy DSpace Test with a clean sync of CGSpace and mirror these database settings there as well
  • -
  • Also deploy the nginx fixes for the try_files location block as well as the expires block
  • -
- -
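For reference, the relevant dspace.cfg pool settings after this change:

db.maxconnections = 50
db.maxidle = 10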

2015-11-26

- -
    -
  • CGSpace behaving much better since changing db.maxidle yesterday, but still two up/down notices from monitoring this morning (better than 50!)
  • -
  • CCAFS colleagues mentioned that the REST API is very slow, 24 seconds for one item
  • -
  • Not as bad for me, but still unsustainable if you have to get many:
  • -
- -
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
-8.415
-
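If we need to time many items, the same curl invocation can be looped over a list of handles (a sketch; handles.txt is hypothetical):

$ while read -r handle; do curl -o /dev/null -s -w "%{time_total} ${handle}\n" "https://cgspace.cgiar.org/rest/handle/${handle}?expand=all"; done < handles.txt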
- -
    -
  • Monitoring e-mailed in the evening to say CGSpace was down
  • -
  • Idle connections in PostgreSQL again:
  • -
- -
$ psql -c 'SELECT * from pg_stat_activity;' | grep cgspace | grep -c idle
-66
-
- -
    -
  • At the time, the current DSpace pool size was 50…
  • -
  • I reduced the pool back to the default of 30, and reduced the db.maxidle settings from 10 to 8
  • -
- -

2015-11-29

- -
    -
  • Still more alerts that CGSpace has been up and down all day
  • -
  • Current database settings for DSpace:
  • -
- -
db.maxconnections = 30
-db.maxwait = 5000
-db.maxidle = 8
-db.statementpool = true
-
- -
    -
  • And idle connections:
  • -
- -
$ psql -c 'SELECT * from pg_stat_activity;' | grep cgspace | grep -c idle
-49
-
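Rather than grepping, grouping by state in SQL gives a cleaner breakdown (a sketch):

$ psql -c "SELECT datname, state, count(*) FROM pg_stat_activity GROUP BY datname, state ORDER BY count(*) DESC;"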
- -
    -
  • Perhaps I need to start drastically increasing the connection limits—like to 300—to see if DSpace’s thirst can ever be quenched
  • -
  • On another note, SUNScholar’s notes suggest adjusting some other postgres variables: http://wiki.lib.sun.ac.za/index.php/SUNScholar/Optimisations/Database
  • -
  • This might help with REST API speed (which I mentioned above and still need to test properly)
  • -
- - - - - -
- - - -
- - - - -
-
- - - - - - - - - diff --git a/public/2015-12/index.html b/public/2015-12/index.html deleted file mode 100644 index 425f811f9..000000000 --- a/public/2015-12/index.html +++ /dev/null @@ -1,370 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - December, 2015 | CGSpace Notes - - - - - - - - - - - - - - - - - -
-
- -
-
- - - -
-
-

CGSpace Notes

-

Documenting day-to-day work on the CGSpace repository.

-
-
- - - -
-
-
- - - - -
-
-

December, 2015

- -
-

2015-12-02

- -
    -
  • Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less space:
  • -
- -
# cd /home/dspacetest.cgiar.org/log
-# ls -lh dspace.log.2015-11-18*
--rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
--rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
--rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
-
- -
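The cron job itself isn’t shown here, but the xz variant presumably looks something like this (a sketch; the age filter and exclusions are assumptions):

# find /home/dspacetest.cgiar.org/log -name 'dspace.log.2*' ! -name '*.xz' ! -name '*.lzo' -mtime +1 -exec xz {} \;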

- -
    -
  • I had used lrzip once, but it needs more memory and is harder to use as it requires the lrztar wrapper
  • -
  • Need to remember to go check if everything is ok in a few days and then change CGSpace
  • -
  • CGSpace went down again (due to PostgreSQL idle connections of course)
  • -
  • Current database settings for DSpace are db.maxconnections = 30 and db.maxidle = 8, yet idle connections are exceeding this:
  • -
- -
$ psql -c 'SELECT * from pg_stat_activity;' | grep cgspace | grep -c idle
-39
-
- -
    -
  • I restarted PostgreSQL and Tomcat and it’s back
  • -
  • On a related note of why CGSpace is so slow, I decided to finally try the pgtune script to tune the postgres settings:
  • -
- -
# apt-get install pgtune
-# pgtune -i /etc/postgresql/9.3/main/postgresql.conf -o postgresql.conf-pgtune
-# mv /etc/postgresql/9.3/main/postgresql.conf /etc/postgresql/9.3/main/postgresql.conf.orig 
-# mv postgresql.conf-pgtune /etc/postgresql/9.3/main/postgresql.conf
-
- -
    -
  • It introduced the following new settings:
  • -
- -
default_statistics_target = 50
-maintenance_work_mem = 480MB
-constraint_exclusion = on
-checkpoint_completion_target = 0.9
-effective_cache_size = 5632MB
-work_mem = 48MB
-wal_buffers = 8MB
-checkpoint_segments = 16
-shared_buffers = 1920MB
-max_connections = 80
-
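To confirm which of these actually took effect after restarting PostgreSQL (a sketch; pg_settings shows the live values):

$ psql -c "SELECT name, setting, unit FROM pg_settings WHERE name IN ('shared_buffers', 'effective_cache_size', 'work_mem', 'checkpoint_segments');"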
- -
    -
  • Now I need to go read PostgreSQL docs about these options, and watch memory settings in munin etc
  • -
  • For what it’s worth, now the REST API should be faster (because of these PostgreSQL tweaks):
  • -
- -
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
-1.474
-$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
-2.141
-$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
-1.685
-$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
-1.995
-$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
-1.786
-
- - - -

CCAFS item

- -
    -
  • The authorizations for the item are all public READ, and I don’t see any errors in dspace.log when browsing that item
  • -
  • I filed a ticket on Atmire’s issue tracker
  • -
  • I also filed a ticket on Atmire’s issue tracker for the PostgreSQL stuff
  • -
- -

2015-12-03

- -
    -
  • CGSpace is very slow, and monitoring is emailing me to say it’s down, even though I can load the page (very slowly)
  • -
  • Idle postgres connections look like this (with no change in DSpace db settings lately):
  • -
- -
$ psql -c 'SELECT * from pg_stat_activity;' | grep cgspace | grep -c idle
-29
-
- -
    -
  • I restarted Tomcat and postgres…
  • -
  • Atmire commented that we should raise the JVM heap size by ~500M, so it is now -Xms3584m -Xmx3584m
  • -
  • We weren’t out of heap yet, but it’s fair to assume the DSpace 5 upgrade (and new Atmire modules) needs more memory, so it’s OK
  • -
  • A possible side effect is that I see that the REST API is twice as fast for the request above now:
  • -
- -
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
-1.368
-$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
-0.968
-$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
-1.006
-$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
-0.849
-$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
-0.806
-$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
-0.854
-
- -

2015-12-05

- -
    -
  • CGSpace has been up and down all day and REST API is completely unresponsive
  • -
  • PostgreSQL idle connections are currently:
  • -
- -
postgres@linode01:~$ psql -c 'SELECT * from pg_stat_activity;' | grep cgspace | grep -c idle
-28
-
- -
    -
  • I have reverted all the pgtune tweaks from the other day, as they didn’t fix the stability issues, so I’d rather not have them introducing more variables into the equation (restoring the original config is sketched below)
  • -
  • The PostgreSQL stats from Munin all point to something database-related with the DSpace 5 upgrade around mid–late November
  • -
- -
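Reverting was just a matter of restoring the config that pgtune had replaced and restarting (a sketch, assuming the .orig backup from 2015-12-02 is still in place):

# mv /etc/postgresql/9.3/main/postgresql.conf /etc/postgresql/9.3/main/postgresql.conf.pgtune
# mv /etc/postgresql/9.3/main/postgresql.conf.orig /etc/postgresql/9.3/main/postgresql.conf
# service postgresql restart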

PostgreSQL bgwriter (year) -PostgreSQL cache (year) -PostgreSQL locks (year) -PostgreSQL scans (year)

- -

2015-12-07

- -
    -
  • Atmire sent some fixes to DSpace’s REST API code that was leaving contexts open (causing the slow performance and database issues)
  • -
  • After deploying the fix to CGSpace the REST API is consistently faster:
  • -
- -
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
-0.675
-$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
-0.599
-$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
-0.588
-$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
-0.566
-$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
-0.497
-
- -

2015-12-08

- -
    -
  • Switch CGSpace log compression cron jobs from using lzop to xz—the compression is slower and uses more CPU, but the resulting files are much smaller (as seen on DSpace Test earlier this month)
  • -
  • Since we figured out (and fixed) the cause of the performance issue, I reverted Google Bot’s crawl rate to the “Let Google optimize” setting
  • -
- - - - - -
- - - -
- - - - -
-
- - - - - - - - - diff --git a/public/2015/12/ccafs-item-no-metadata.png b/public/2015/12/ccafs-item-no-metadata.png deleted file mode 100644 index 552b50d12..000000000 Binary files a/public/2015/12/ccafs-item-no-metadata.png and /dev/null differ diff --git a/public/2015/12/postgres_bgwriter-year.png b/public/2015/12/postgres_bgwriter-year.png deleted file mode 100644 index 918447914..000000000 Binary files a/public/2015/12/postgres_bgwriter-year.png and /dev/null differ diff --git a/public/2015/12/postgres_cache_cgspace-year.png b/public/2015/12/postgres_cache_cgspace-year.png deleted file mode 100644 index 890b5132a..000000000 Binary files a/public/2015/12/postgres_cache_cgspace-year.png and /dev/null differ diff --git a/public/2015/12/postgres_connections_cgspace-year.png b/public/2015/12/postgres_connections_cgspace-year.png deleted file mode 100644 index faecf8692..000000000 Binary files a/public/2015/12/postgres_connections_cgspace-year.png and /dev/null differ diff --git a/public/2015/12/postgres_locks_cgspace-year.png b/public/2015/12/postgres_locks_cgspace-year.png deleted file mode 100644 index 63624da00..000000000 Binary files a/public/2015/12/postgres_locks_cgspace-year.png and /dev/null differ diff --git a/public/2015/12/postgres_scans_cgspace-year.png b/public/2015/12/postgres_scans_cgspace-year.png deleted file mode 100644 index 84e5f93be..000000000 Binary files a/public/2015/12/postgres_scans_cgspace-year.png and /dev/null differ diff --git a/public/2016-01/index.html b/public/2016-01/index.html deleted file mode 100644 index bc42c42a6..000000000 --- a/public/2016-01/index.html +++ /dev/null @@ -1,285 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - January, 2016 | CGSpace Notes - - - - - - - - - - - - - - - - - -
-
- -
-
- - - -
-
-

CGSpace Notes

-

Documenting day-to-day work on the CGSpace repository.

-
-
- - - -
-
-
- - - - -
-
-

January, 2016

- -
-

2016-01-13

- -
    -
  • Move ILRI collection 10568/12503 from 10568/27869 to 10568/27629 using the move_collections.sh script I wrote last year.
  • -
  • I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.
  • -
  • Update GitHub wiki for documentation of maintenance tasks.
  • -
- -

- -

2016-01-14

- -
    -
  • Update CCAFS project identifiers in input-forms.xml
  • -
  • Run system updates and restart the server
  • -
- -

2016-01-18

- -
    -
  • Change “Extension material” to “Extension Material” in input-forms.xml (a mistake that fell through the cracks when we fixed the others in DSpace 4 era)
  • -
- -

2016-01-19

- -
    -
  • Work on tweaks and updates for the social sharing icons on item pages: add Delicious and Mendeley (from Academicons), make links open in new windows, and set the icon color to the theme’s primary color (#157)
  • -
  • Tweak date-based facets to show more values in drill-down ranges (#162)
  • -
  • Need to remember to clear the Cocoon cache after deployment or else you don’t see the new ranges immediately
  • -
  • Set up recipe on IFTTT to tweet new items from the CGSpace Atom feed to my twitter account
  • -
  • Altmetric’s support for Handles is kinda weak, so they can’t associate our items with DOIs until the items are tweeted or blogged about first
  • -
- -

2016-01-21

- -
    -
  • Still waiting for my IFTTT recipe to fire, two days later
  • -
  • It looks like the Atom feed on CGSpace hasn’t changed in two days, but there have definitely been new items
  • -
  • The RSS feed is nearly as old, but has different old items there
  • -
  • On a hunch I cleared the Cocoon cache and now the feeds are fresh
  • -
  • Looks like there is a configuration option related to this, webui.feed.cache.age, which defaults to 48 hours, though I’m not sure what relation it has to the Cocoon cache
  • -
  • In any case, we should change this cache to be something more like 6 hours, as we publish new items several times per day.
  • -
  • Work around a CSS issue with long URLs in the item view (#172)
  • -
- -

2016-01-25

- -
    -
  • Re-deploy CGSpace and DSpace Test with latest 5_x-prod branch
  • -
  • This included the social icon fixes/updates, date-based facet tweaks, reducing the feed cache age, and fixing a layout issue in XMLUI item view when an item had long URLs
  • -
- -

2016-01-26

- -
    -
  • Run nginx updates on CGSpace and DSpace Test (1.8.1 and 1.9.10, respectively)
  • -
  • Run updates on DSpace Test and reboot for new Linode kernel Linux 4.4.0-x86_64-linode63 (first update in months)
  • -
- -

2016-01-28

- -
    -
  • Start looking at importing some Bioversity data that had been prepared earlier this week
  • - -
  • While checking the data I noticed something strange: there are 79 items but only 8 unique PDFs:

    - -

$ ls SimpleArchiveForBio/ | wc -l
79
$ find SimpleArchiveForBio/ -iname "*.pdf" -exec basename {} \; | sort -u | wc -l
8

  • -
- -
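To see which PDFs are shared between items, counting duplicate basenames helps (a sketch extending the pipeline above):

$ find SimpleArchiveForBio/ -iname "*.pdf" -exec basename {} \; | sort | uniq -c | sort -rn | head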

2016-01-29

- -
    -
  • Add five missing center-specific subjects to XMLUI item view (#174)
  • -
  • This CCAFS item, before:
  • -
- -

XMLUI subjects before

- -
    -
  • After:
  • -
- -

XMLUI subjects after

- - - - - -
- - - -
- - - - -
-
- - - - - - - - - diff --git a/public/2016-02/index.html b/public/2016-02/index.html deleted file mode 100644 index fcd78c8f0..000000000 --- a/public/2016-02/index.html +++ /dev/null @@ -1,538 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - February, 2016 | CGSpace Notes - - - - - - - - - - - - - - - - - -
-
- -
-
- - - -
-
-

CGSpace Notes

-

Documenting day-to-day work on the CGSpace repository.

-
-
- - - -
-
-
- - - - -
-
-

February, 2016

- -
-

2016-02-05

- -
    -
  • Looking at some DAGRIS data for Abenet Yabowork
  • -
  • Lots of issues with spaces, newlines, etc causing the import to fail
  • -
  • I noticed we have a very interesting list of countries on CGSpace:
  • -
- -

CGSpace country list

- -
    -
  • Not only are there 49,000 countries, we have some blanks (25)…
  • -
  • Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE”
  • -
- -

- -

2016-02-06

- -
    -
  • Found a way to get items with null/empty metadata values from SQL
  • -
  • First, find the metadata_field_id for the field you want from the metadatafieldregistry table:
  • -
- -
dspacetest=# select * from metadatafieldregistry;
-
- -
    -
  • In this case our country field is 78
  • -
  • Now find all resources with type 2 (item) that have null/empty values for that field:
  • -
- -
dspacetest=# select resource_id from metadatavalue where resource_type_id=2 and metadata_field_id=78 and (text_value='' OR text_value IS NULL);
-
- -
    -
  • Then you can find the handle that owns it from its resource_id:
  • -
- -
dspacetest=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id = '22678';
-
- -
    -
  • It’s 25 items so editing in the web UI is annoying, let’s try SQL!
  • -
- -
dspacetest=# delete from metadatavalue where metadata_field_id=78 and text_value='';
-DELETE 25
-
- -
    -
  • After that perhaps a regular dspace index-discovery (no -b) should suffice…
  • -
  • Hmm, I indexed, cleared the Cocoon cache, and restarted Tomcat but the 25 “|||” countries are still there
  • -
  • Maybe I need to do a full re-index…
  • -
  • Yep! The full re-index seems to work (the command is noted below).
  • -
  • Process the empty countries on CGSpace
  • -
- -
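For reference, the full Discovery re-index is the same command used elsewhere in these notes:

$ ~/dspace/bin/dspace index-discovery -b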

2016-02-07

- -
    -
  • Working on cleaning up Abenet’s DAGRIS data with OpenRefine
  • -
  • I discovered two really nice functions in OpenRefine: value.trim() and value.escape("javascript") which shows whitespace characters like \r\n!
  • -
  • For some reason when you import an Excel file into OpenRefine it converts dates like 1949 to 1949.0 when exporting the CSV
  • -
  • I re-import the resulting CSV and run a GREL on the date issued column: value.replace("\.0", "")
  • -
  • I need to start running DSpace in Mac OS X instead of a Linux VM
  • -
  • Install PostgreSQL from homebrew, then configure and import CGSpace database dump:
  • -
- -
$ postgres -D /opt/brew/var/postgres
-$ createuser --superuser postgres
-$ createuser --pwprompt dspacetest
-$ createdb -O dspacetest --encoding=UNICODE dspacetest
-$ psql postgres
-postgres=# alter user dspacetest createuser;
-postgres=# \q
-$ pg_restore -O -U dspacetest -d dspacetest ~/Downloads/cgspace_2016-02-07.backup 
-$ psql postgres
-postgres=# alter user dspacetest nocreateuser;
-postgres=# \q
-$ vacuumdb dspacetest
-$ psql -U dspacetest -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest -h localhost
-
- -
    -
  • After building and running a fresh_install I symlinked the webapps into Tomcat’s webapps folder:
  • -
- -
$ mv /opt/brew/Cellar/tomcat/8.0.30/libexec/webapps/ROOT /opt/brew/Cellar/tomcat/8.0.30/libexec/webapps/ROOT.orig
-$ ln -sfv ~/dspace/webapps/xmlui /opt/brew/Cellar/tomcat/8.0.30/libexec/webapps/ROOT
-$ ln -sfv ~/dspace/webapps/rest /opt/brew/Cellar/tomcat/8.0.30/libexec/webapps/rest
-$ ln -sfv ~/dspace/webapps/jspui /opt/brew/Cellar/tomcat/8.0.30/libexec/webapps/jspui
-$ ln -sfv ~/dspace/webapps/oai /opt/brew/Cellar/tomcat/8.0.30/libexec/webapps/oai
-$ ln -sfv ~/dspace/webapps/solr /opt/brew/Cellar/tomcat/8.0.30/libexec/webapps/solr
-$ /opt/brew/Cellar/tomcat/8.0.30/bin/catalina start
-
- -
    -
  • Add CATALINA_OPTS in /opt/brew/Cellar/tomcat/8.0.30/libexec/bin/setenv.sh, as this script is sourced by the catalina startup script
  • -
  • For example:
  • -
- -
CATALINA_OPTS="-Djava.awt.headless=true -Xms2048m -Xmx2048m -XX:MaxPermSize=256m -XX:+UseConcMarkSweepGC -Dfile.encoding=UTF-8"
-
- -
    -
  • After verifying that the site is working, start a full index:
  • -
- -
$ ~/dspace/bin/dspace index-discovery -b
-
- -

2016-02-08

- -
    -
  • Finish cleaning up and importing ~400 DAGRIS items into CGSpace
  • -
  • Whip up some quick CSS to make the button in the submission workflow use the XMLUI theme’s brand colors (#154)
  • -
- -

ILRI submission buttons -Drylands submission buttons

- -

2016-02-09

- -
    -
  • Re-sync DSpace Test with CGSpace
  • -
  • Help Sisay with OpenRefine
  • -
  • Enable HTTPS on DSpace Test using Let’s Encrypt:
  • -
- -
$ cd ~/src/git
-$ git clone https://github.com/letsencrypt/letsencrypt
-$ cd letsencrypt
-$ sudo service nginx stop
-# add port 443 to firewall rules
-$ ./letsencrypt-auto certonly --standalone -d dspacetest.cgiar.org
-$ sudo service nginx start
-$ ansible-playbook dspace.yml -l linode02 -t nginx,firewall -u aorth --ask-become-pass
-
- -
    -
  • We should install it in /opt/letsencrypt and then script the renewal, but first we have to wire up some variables and template stuff based on the script here: https://letsencrypt.org/howitworks/
  • -
  • I had to export some CIAT items that were being cleaned up on the test server and I noticed their dc.contributor.author fields have DSpace 5 authority index UUIDs…
  • -
  • To clean those up in OpenRefine I used this GREL expression: value.replace(/::\w{8}-\w{4}-\w{4}-\w{4}-\w{12}::600/,"")
  • -
  • Getting more and more hangs on DSpace Test, seemingly random but also during CSV import
  • -
  • Logs don’t always show anything right when it fails, but eventually one of these appears:
  • -
- -
org.dspace.discovery.SearchServiceException: Error while processing facet fields: java.lang.OutOfMemoryError: Java heap space
-
- -
    -
  • or
  • -
- -
Caused by: java.util.NoSuchElementException: Timeout waiting for idle object
-
- -
    -
  • Right now DSpace Test’s Tomcat heap is set to 1536m and we have quite a bit of free RAM:
  • -
- -
# free -m
-             total       used       free     shared    buffers     cached
-Mem:          3950       3902         48          9         37       1311
--/+ buffers/cache:       2552       1397
-Swap:          255         57        198
-
- -
    -
  • So I’ll bump up the Tomcat heap to 2048 (CGSpace production server is using 3GB)
  • -
- -

2016-02-11

- -
    -
  • Massaging some CIAT data in OpenRefine
  • -
  • There are 1200 records that have PDFs, and will need to be imported into CGSpace
  • -
  • I created a filename column based on the dc.identifier.url column using the following transform:
  • -
- -
value.split('/')[-1]
-
- -
    -
  • Then I wrote a tool called generate-thumbnails.py to download the PDFs and generate thumbnails for them, for example:
  • -
- -
$ ./generate-thumbnails.py ciat-reports.csv
-Processing 64661.pdf
-> Downloading 64661.pdf
-> Creating thumbnail for 64661.pdf
-Processing 64195.pdf
-> Downloading 64195.pdf
-> Creating thumbnail for 64195.pdf
-
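The script itself isn’t shown here, but for a single record it is roughly equivalent to the following (a sketch; the URL is hypothetical and the GraphicsMagick flags mirror the thumbnail settings used elsewhere in these notes):

$ wget -q -O 64661.pdf "http://example.org/pdfs/64661.pdf"
$ gm convert -quality 82 -thumbnail x300 -flatten "64661.pdf[0]" 64661.jpg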
- -

2016-02-12

- -
    -
  • Looking at CIAT’s records again, there are some problems with a dozen or so files (out of 1200)
  • -
  • A few items are using the same exact PDF
  • -
  • A few items are using HTM or DOC files
  • -
  • A few items link to PDFs on IFPRI’s e-Library or Research Gate
  • -
  • A few items have no files at all
  • -
  • Also, I’m not sure: if we import these items, will we remove the dc.identifier.url field from the records?
  • -
- -

2016-02-12

- -
    -
  • Looking at CIAT’s records again, there are some files linking to PDFs on Slide Share, Embrapa, UEA UK, and Condesan, so I’m not sure if we can use those
  • -
  • 265 items have dirty, URL-encoded filenames:
  • -
- -
$ ls | grep -c -E "%"
-265
-
- -
    -
  • I suggest that we import ~850 or so of the clean ones first, then do the rest after I can find a clean/reliable way to decode the filenames
  • -
  • This python2 snippet seems to work in the CLI, but not so well in OpenRefine:
  • -
- -
$ python -c "import urllib, sys; print urllib.unquote(sys.argv[1])" CIAT_COLOMBIA_000169_T%C3%A9cnicas_para_el_aislamiento_y_cultivo_de_protoplastos_de_yuca.pdf
-CIAT_COLOMBIA_000169_Técnicas_para_el_aislamiento_y_cultivo_de_protoplastos_de_yuca.pdf
-
- -
    -
  • Merge pull requests for submission form theming (#178) and missing center subjects in XMLUI item views (#176)
  • -
  • They will be deployed on CGSpace the next time I re-deploy
  • -
- -

2016-02-16

- -
    -
  • Turns out OpenRefine has an unescape function!
  • -
- -
value.unescape("url")
-
- -
    -
  • This turns the URLs into human-readable versions that we can use as proper filenames
  • -
  • Run web server and system updates on DSpace Test and reboot
  • -
  • To merge dc.identifier.url and dc.identifier.url[], rename the second column so it doesn’t have the brackets, like dc.identifier.url2
  • -
  • Then you create a facet for blank values on each column, show the rows that have values for one and not the other, then transform each independently to have the contents of the other, with “||” in between
  • -
  • Work on Python script for parsing and downloading PDF records from dc.identifier.url
  • -
  • To get filenames from dc.identifier.url, create a new column based on this transform: forEach(value.split('||'), v, v.split('/')[-1]).join('||')
  • -
  • This also works for records that have multiple URLs (separated by “||”)
  • -
- -

2016-02-17

- -
    -
  • Re-deploy CGSpace, run all system updates, and reboot
  • -
  • More work on CIAT data, cleaning and doing a last metadata-only import into DSpace Test
  • -
  • SAFBuilder has a bug preventing it from processing filenames containing more than one underscore
  • -
  • Need to re-process the filename column to replace multiple underscores with one: value.replace(/_{2,}/, "_")
  • -
- -

2016-02-20

- -
    -
  • Turns out the “bug” in SAFBuilder isn’t a bug, it’s a feature that allows you to encode extra information like the destination bundle in the filename
  • -
  • Also, it seems DSpace’s SAF import tool doesn’t like importing filenames that have accents in them:
  • -
- -
java.io.FileNotFoundException: /usr/share/tomcat7/SimpleArchiveFormat/item_1021/CIAT_COLOMBIA_000075_Medición_de_palatabilidad_en_forrajes.pdf (No such file or directory)
-
- -
    -
  • Need to rename files to have no accents or umlauts, etc…
  • -
  • Useful custom text facet for URLs ending with “.pdf”: value.endsWith(".pdf")
  • -
- -

2016-02-22

- -
    -
  • To change Spanish accents to ASCII in OpenRefine:
  • -
- -
value.replace('ó','o').replace('í','i').replace('á','a').replace('é','e').replace('ñ','n')
-
- -
    -
  • But actually, the accents might not be an issue, as I can successfully import files containing Spanish accents on my Mac
  • -
  • On closer inspection, I can import files with the following names on Linux (DSpace Test):
  • -
- -
Bitstream: tést.pdf
-Bitstream: tést señora.pdf
-Bitstream: tést señora alimentación.pdf
-
- -
    -
  • Seems it could be something with the HFS+ filesystem actually, as it’s not UTF-8 (it’s something like UCS-2)
  • -
  • HFS+ stores filenames as a string, and filenames with accents get stored as character+accent whereas Linux’s ext4 stores them as an array of bytes
  • -
  • Running the SAFBuilder on Mac OS X works if you’re going to import the resulting bundle on Mac OS X, but if your DSpace is running on Linux you need to run the SAFBuilder there where the filesystem’s encoding matches
  • -
- -

2016-02-29

- -
    -
  • Got notified by some CIFOR colleagues that the Google Scholar team had contacted them about CGSpace’s incorrect ordering of authors in Google Scholar metadata
  • -
  • Turns out there is a patch, and it was merged in DSpace 5.4: https://jira.duraspace.org/browse/DS-2679
  • -
  • I’ve merged it into our 5_x-prod branch that is currently based on DSpace 5.1
  • -
  • We found a bug when a user searches from the homepage, sorts the results, and then tries to click “View More” in a sidebar facet
  • -
  • I am not sure what causes it yet, but I opened an issue for it: https://github.com/ilri/DSpace/issues/179
  • -
  • Have more problems with SAFBuilder on Mac OS X
  • -
  • Now it doesn’t recognize description hints in the filename column, like: test.pdf__description:Blah
  • -
  • But on Linux it works fine
  • -
  • Trying to test Atmire’s series of stats and CUA fixes from January and February, but their branch history is really messy and it’s hard to see what’s going on
  • -
  • Rebasing their branch on top of our production branch results in a broken Tomcat, so I’m going to tell them to fix their history and make a proper pull request
  • -
  • Looking at the filenames for the CIAT Reports, some have some really ugly characters, like: ' or , or = or [ or ] or ( or ) or _.pdf or ._ etc
  • -
  • It’s tricky to parse those things in some programming languages so I’d rather just get rid of the weird stuff now in OpenRefine:
  • -
- -
value.replace("'",'').replace('_=_','_').replace(',','').replace('[','').replace(']','').replace('(','').replace(')','').replace('_.pdf','.pdf').replace('._','_')
-
- -
    -
  • Finally import the 1127 CIAT items into CGSpace: https://cgspace.cgiar.org/handle/10568/35710
  • -
  • Re-deploy CGSpace with the Google Scholar fix, but I’m waiting on the Atmire fixes for now, as the branch history is ugly
  • -
- - - - - -
- - - -
- - - - -
-
- - - - - - - - - diff --git a/public/2016-03/index.html b/public/2016-03/index.html deleted file mode 100644 index ed041191c..000000000 --- a/public/2016-03/index.html +++ /dev/null @@ -1,438 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - March, 2016 | CGSpace Notes - - - - - - - - - - - - - - - - - -
-
- -
-
- - - -
-
-

CGSpace Notes

-

Documenting day-to-day work on the CGSpace repository.

-
-
- - - -
-
-
- - - - -
-
-

March, 2016

- -
-

2016-03-02

- -
    -
  • Looking at issues with author authorities on CGSpace
  • -
  • For some reason we still have the index-lucene-update cron job active on CGSpace, but I’m pretty sure we don’t need it as of the latest few versions of Atmire’s Listings and Reports module
  • -
  • Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server
  • -
- -

- -

2016-03-07

- -
    -
  • Troubleshooting the issues with the slew of commits for Atmire modules in #182
  • -
  • Their changes on 5_x-dev branch work, but it is messy as hell with merge commits and old branch base
  • -
  • When I rebase their branch on the latest 5_x-prod I get blank white pages
  • -
  • I identified one commit that causes the issue and let them know
  • -
  • Restart DSpace Test, as it seems to have crashed after Sisay tried to import some CSV or zip or something:
  • -
- -
Exception in thread "Lucene Merge Thread #19" org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: No space left on device
-
- -

2016-03-08

- -
    -
  • Add a few new filters to Atmire’s Listings and Reports module (#180)
  • -
  • We had also wanted to add a few to the Content and Usage module but I have to ask the editors which ones they were
  • -
- -

2016-03-10

- -
    -
  • Disable the lucene cron job on CGSpace as it shouldn’t be needed anymore
  • -
  • Discuss ORCiD and duplicate authors on Yammer
  • -
  • Request new documentation for Atmire CUA and L&R modules, as ours are from 2013
  • -
  • Walk Sisay through some data cleaning workflows in OpenRefine
  • -
  • Start cleaning up the configuration for Atmire’s CUA module (#184)
  • -
  • It is very messed up because some labels are incorrect, fields are missing, etc
  • -
- -

Mixed up label in Atmire CUA

- -
    -
  • Update documentation for Atmire modules
  • -
- -

2016-03-11

- -
    -
  • As I was looking at the CUA config I realized our Discovery config is all messed up and confusing
  • -
  • I’ve opened an issue to track some of that work (#186)
  • -
  • I did some major cleanup work on Discovery and XMLUI stuff related to the dc.type indexes (#187)
  • -
  • We had been confusing dc.type (a Dublin Core value) with dc.type.output (a value we invented) for a few years and it had permeated all aspects of our data, indexes, item displays, etc.
  • -
  • There is still some more work to be done to remove references to old outputtype and output
  • -
- -

2016-03-14

- -
    -
  • Fix some items that had invalid dates (I noticed them in the log during a re-indexing)
  • -
  • Reset search.index.* to the default, as it is only used by Lucene (deprecated by Discovery in DSpace 5.x): #188
  • -
  • Make titles in Discovery and Browse by more consistent (singular, sentence case, etc) (#186)
  • -
  • Also four or so center-specific subject strings were missing for Discovery
  • -
- -

Missing XMLUI string

- -

2016-03-15

- -
    -
  • Create simple theme for new AVCD community just for a unique Google Tracking ID (#191)
  • -
- -

2016-03-16

- -
    -
  • Still having problems deploying Atmire’s CUA updates and fixes from January!
  • -
  • More discussion on the GitHub issue here: https://github.com/ilri/DSpace/pull/182
  • -
  • Clean up Atmire CUA config (#193)
  • -
  • Help Sisay with some PostgreSQL queries to clean up the incorrect dc.contributor.corporateauthor field
  • -
  • I noticed that we have some weird values in dc.language:
  • -
- -
# select * from metadatavalue where metadata_field_id=37;
- metadata_value_id | resource_id | metadata_field_id | text_value | text_lang | place | authority | confidence | resource_type_id
--------------------+-------------+-------------------+------------+-----------+-------+-----------+------------+------------------
-           1942571 |       35342 |                37 | hi         |           |     1 |           |         -1 |                2
-           1942468 |       35345 |                37 | hi         |           |     1 |           |         -1 |                2
-           1942479 |       35337 |                37 | hi         |           |     1 |           |         -1 |                2
-           1942505 |       35336 |                37 | hi         |           |     1 |           |         -1 |                2
-           1942519 |       35338 |                37 | hi         |           |     1 |           |         -1 |                2
-           1942535 |       35340 |                37 | hi         |           |     1 |           |         -1 |                2
-           1942555 |       35341 |                37 | hi         |           |     1 |           |         -1 |                2
-           1942588 |       35343 |                37 | hi         |           |     1 |           |         -1 |                2
-           1942610 |       35346 |                37 | hi         |           |     1 |           |         -1 |                2
-           1942624 |       35347 |                37 | hi         |           |     1 |           |         -1 |                2
-           1942639 |       35339 |                37 | hi         |           |     1 |           |         -1 |                2
-
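Counting the distinct values gives a quick overview (a sketch using the same group-by pattern as other queries in these notes):

# select text_value, count(*) from metadatavalue where metadata_field_id=37 group by text_value order by count(*) desc;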
- -
    -
  • It seems this dc.language field isn’t really used, but we should delete these values
  • -
  • Also, dc.language.iso has some weird values, like “En” and “English”
  • -
- -

2016-03-17

- -
    -
  • It turns out hi is the ISO 639 language code for Hindi, but these should be in dc.language.iso instead of dc.language
  • -
  • I fixed the eleven items with hi as well as some using the incorrect vn for Vietnamese
  • -
  • Start discussing CG core with Abenet and Sisay
  • -
  • Re-sync CGSpace database to DSpace Test for Atmire to do some tests about the problematic CUA patches
  • -
  • The patches work fine with a clean database, so the error was caused by some mismatch in CUA versions and the database during my testing
  • -
- -

2016-03-18

- -
    -
  • Merge Atmire fixes into 5_x-prod
  • -
  • Discuss thumbnails with Francesca from Bioversity
  • -
  • Some of their items end up with thumbnails that have a big white border around them:
  • -
- -

Excessive whitespace in thumbnail

- -
    -
  • Turns out we can add -trim to the GraphicsMagick options to trim the whitespace
  • -
- -

Trimmed thumbnail

- -
    -
  • Command used:
  • -
- -
$ gm convert -trim -quality 82 -thumbnail x300 -flatten Descriptor\ for\ Butia_EN-2015_2021.pdf\[0\] cover.jpg
-
- -
    -
  • Also, it looks like adding -sharpen 0x1.0 really improves the quality of the image for only a few KB
  • -
- -
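Combining -trim and -sharpen with the command above gives something like this (a sketch; worth re-checking the resulting size and quality):

$ gm convert -trim -sharpen 0x1.0 -quality 82 -thumbnail x300 -flatten Descriptor\ for\ Butia_EN-2015_2021.pdf\[0\] cover.jpg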

2016-03-21

- - - -

CGSpace pages in Google index

- -
    -
  • Turns out this is a problem with DSpace’s robots.txt, and there’s a Jira ticket since December, 2015: https://jira.duraspace.org/browse/DS-2962
  • -
  • I am not sure if I want to apply it yet
  • -
  • For now I’ve just set a bunch of these dynamic pages to not appear in search results by using the URL Parameters tool in Webmaster Tools
  • -
- -

URL parameters cause millions of dynamic pages -Setting pages with the filter_0 param not to show in search results

- -
    -
  • Move AVCD collection to new community and update move_collection.sh script: https://gist.github.com/alanorth/392c4660e8b022d99dfa
  • -
  • It seems Feedburner can do HTTPS now, so we might be able to update our feeds and simplify the nginx configs
  • -
  • Re-deploy CGSpace with latest 5_x-prod branch
  • -
  • Run updates on CGSpace and reboot server (new kernel, 4.5.0)
  • -
  • Deploy Let’s Encrypt certificate for cgspace.cgiar.org, but still need to work it into the ansible playbooks
  • -
- -

2016-03-22

- -
    -
  • Merge robots.txt patch and disallow indexing of browse pages as our sitemap is consumed correctly (#198)
  • -
- -

2016-03-23

- - - -
Can't find method org.dspace.app.xmlui.aspect.administrative.FlowGroupUtils.processSaveGroup(org.dspace.core.Context,number,string,[Ljava.lang.String;,[Ljava.lang.String;,org.apache.cocoon.environment.wrapper.RequestWrapper). (resource://aspects/Administrative/administrative.js#967)
-
- -
    -
  • I can reproduce the same error on DSpace Test and on my Mac
  • -
  • Looks to be an issue with the Atmire modules, I’ve submitted a ticket to their tracker.
  • -
- -

2016-03-24

- - - -

2016-03-25

- -
    -
  • Having problems with Listings and Reports, seems to be caused by a rogue reference to dc.type.output
  • -
  • This is the error we get when we proceed to the second page of Listings and Reports: https://gist.github.com/alanorth/b2d7fb5b82f94898caaf
  • -
  • Commenting out the line works, but I haven’t figured out the proper syntax for referring to dc.type.*
  • -
- -

2016-03-28

- -
    -
  • Look into enabling the embargo during item submission, see: https://wiki.duraspace.org/display/DSDOC5x/Embargo#Embargo-SubmissionProcess
  • -
  • Seems we only want AccessStep because UploadWithEmbargoStep disables the ability to edit embargos at the item level
  • -
  • This pull request enables the ability to set an item-level embargo during submission: https://github.com/ilri/DSpace/pull/203
  • -
  • I figured out that the problem with Listings and Reports was because I disabled the search.index.* last week, and they are still used by JSPUI apparently
  • -
  • This pull request re-enables them: https://github.com/ilri/DSpace/pull/202
  • -
  • Re-deploy DSpace Test, run all system updates, and restart the server
  • -
  • Looks like the Listings and Reports fix was NOT due to the search indexes (which are actually not used), but rather due to the filter configuration in the Listings and Reports config
  • -
  • This pull request simply updates the config for the dc.type.output → dc.type change that was made last week: https://github.com/ilri/DSpace/pull/204
  • -
  • Deploy robots.txt fix, embargo for item submissions, and listings and reports fix on CGSpace
  • -
- -

2016-03-29

- -
    -
  • Skype meeting with Peter and Addis team to discuss metadata changes for Dublin Core, CGcore, and CGSpace-specific fields
  • -
  • We decided to proceed with some deletes first, then identify CGSpace-specific fields to clean/move to cg.*, and then worry about broader changes to DC
  • -
  • Before we move or rename any fields we need to circulate a list of fields we intend to change to CCAFS, CPWF, etc. who might be harvesting the fields
  • -
  • After all of this we need to start implementing controlled vocabularies for fields, either with the Javascript lookup or like existing ILRI subjects
  • -
- - - - - -
- - - -
- - - - -
-
- - - - - - - - - diff --git a/public/2016-04/index.html b/public/2016-04/index.html deleted file mode 100644 index a052d741a..000000000 --- a/public/2016-04/index.html +++ /dev/null @@ -1,655 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - April, 2016 | CGSpace Notes - - - - - - - - - - - - - - - - - -
-
- -
-
- - - -
-
-

CGSpace Notes

-

Documenting day-to-day work on the CGSpace repository.

-
-
- - - -
-
-
- - - - -
-
-

April, 2016

- -
-

2016-04-04

- -
    -
  • Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit
  • -
  • We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc
  • -
  • After running DSpace for over five years I’ve never needed to look in any other log file than dspace.log, let alone one from last year!
  • -
  • This will save us a few gigs of backup space we’re paying for on S3
  • -
  • Also, I noticed the checker log has some errors we should pay attention to:
  • -
- -

- -
Run start time: 03/06/2016 04:00:22
-Error retrieving bitstream ID 71274 from asset store.
-java.io.FileNotFoundException: /home/cgspace.cgiar.org/assetstore/64/29/06/64290601546459645925328536011917633626 (Too many open files)
-        at java.io.FileInputStream.open(Native Method)
-        at java.io.FileInputStream.<init>(FileInputStream.java:146)
-        at edu.sdsc.grid.io.local.LocalFileInputStream.open(LocalFileInputStream.java:171)
-        at edu.sdsc.grid.io.GeneralFileInputStream.<init>(GeneralFileInputStream.java:145)
-        at edu.sdsc.grid.io.local.LocalFileInputStream.<init>(LocalFileInputStream.java:139)
-        at edu.sdsc.grid.io.FileFactory.newFileInputStream(FileFactory.java:630)
-        at org.dspace.storage.bitstore.BitstreamStorageManager.retrieve(BitstreamStorageManager.java:525)
-        at org.dspace.checker.BitstreamDAO.getBitstream(BitstreamDAO.java:60)
-        at org.dspace.checker.CheckerCommand.processBitstream(CheckerCommand.java:303)
-        at org.dspace.checker.CheckerCommand.checkBitstream(CheckerCommand.java:171)
-        at org.dspace.checker.CheckerCommand.process(CheckerCommand.java:120)
-        at org.dspace.app.checker.ChecksumChecker.main(ChecksumChecker.java:236)
-        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
-        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
-        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
-        at java.lang.reflect.Method.invoke(Method.java:606)
-        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:225)
-        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:77)
-******************************************************
-
- -
    -
  • So this would be the tomcat7 Unix user, who seems to have a default limit of 1024 files in its shell
  • -
  • For what it’s worth, we have been setting the actual Tomcat 7 process’ limit to 16384 for a few years (in /etc/default/tomcat7)
  • -
  • Looks like cron will read limits from /etc/security/limits.* so we can do something for the tomcat7 user there
  • -
  • Submit pull request for Tomcat 7 limits in Ansible dspace role (#30)
  • -
- -
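A minimal sketch of what a limits entry for the tomcat7 user might look like (the file name is an assumption; the value mirrors the 16384 limit we already set for the Tomcat process):

# cat /etc/security/limits.d/tomcat7.conf
tomcat7    soft    nofile    16384
tomcat7    hard    nofile    16384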

2016-04-05

- -
    -
  • Reduce Amazon S3 storage used for logs from 46 GB to 6GB by deleting a bunch of logs we don’t need!
  • -
- -
# s3cmd ls s3://cgspace.cgiar.org/log/ > /tmp/s3-logs.txt
-# grep checker.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
-# grep cocoon.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
-# grep handle-plugin.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
-# grep solr.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
-
- -
    -
  • Also, adjust the cron jobs for backups so they only backup dspace.log and some stats files (.dat)
  • -
  • Try to do some metadata field migrations using the Atmire batch UI (dc.Species → cg.species) but it took several hours and even missed a few records
  • -
- -

2016-04-06

- -
    -
  • A better way to move metadata on this scale is via SQL, for example dc.type.output → dc.type (their IDs in the metadatafieldregistry are 66 and 109, respectively):
  • -
- -
dspacetest=# update metadatavalue set metadata_field_id=109 where metadata_field_id=66;
-UPDATE 40852
-
- -
    -
  • After that an index-discovery -bf is required
  • -
  • Start working on metadata migrations, add 25 or so new metadata fields to CGSpace
  • -
- -

2016-04-07

- - - -
$ ./migrate-fields.sh
-UPDATE metadatavalue SET metadata_field_id=109 WHERE metadata_field_id=66
-UPDATE 40883
-UPDATE metadatavalue SET metadata_field_id=202 WHERE metadata_field_id=72
-UPDATE 21420
-UPDATE metadatavalue SET metadata_field_id=203 WHERE metadata_field_id=76
-UPDATE 51258
-
- -

2016-04-08

- -
    -
  • Discuss metadata renaming with Abenet, we decided it’s better to start with the center-specific subjects like ILRI, CIFOR, CCAFS, IWMI, and CPWF
  • -
  • I’ve e-mailed CCAFS and CPWF people to ask them how much time it will take for them to update their systems to cope with this change
  • -
- -

2016-04-10

- - - -
dspacetest=# select count(*) from metadatavalue where metadata_field_id=74 and text_value like 'http://dx.doi.org%';
- count
--------
-  5638
-(1 row)
-
-dspacetest=# select count(*) from metadatavalue where metadata_field_id=74 and text_value like 'http://doi.org%';
- count
--------
-     3
-
- -
    -
  • I will manually edit the dc.identifier.doi in 10568/72509 and tweet the link, then check back in a week to see if the donut gets updated
  • -
- -

2016-04-11

- -
    -
  • The donut is already updated and shows the correct number now
  • -
  • CCAFS people say it will only take them an hour to update their code for the metadata renames, so I proposed we’d do it tentatively on Monday the 18th.
  • -
- -

2016-04-12

- -
    -
  • Looking at quality of WLE data (cg.subject.iwmi) in SQL:
  • -
- -
dspacetest=# select text_value, count(*) from metadatavalue where metadata_field_id=217 group by text_value order by count(*) desc;
-
- -
    -
  • Listings and Reports is still not returning reliable data for dc.type
  • -
  • I think we need to ask Atmire, as their documentation isn’t too clear on the format of the filter configs
  • -
  • Alternatively, I want to see if I move all the data from dc.type.output to dc.type and then re-index, if it behaves better
  • -
  • Looking at our input-forms.xml I see we have two sets of ILRI subjects, but one has a few extra subjects
  • -
  • Remove one set of ILRI subjects and remove duplicate VALUE CHAINS from existing list (#216)
  • -
  • I decided to keep the set of subjects that had FMD and RANGELANDS added, as those appear to have been added by request, and it is probably the newer list
  • -
  • I found 226 blank metadatavalues:
  • -
- -
dspacetest# select * from metadatavalue where resource_type_id=2 and text_value='';
-
- -
    -
  • I think we should delete them and do a full re-index:
  • -
- -
dspacetest=# delete from metadatavalue where resource_type_id=2 and text_value='';
-DELETE 226
-
- -
    -
  • I deleted them on CGSpace but I’ll wait to do the re-index as we’re going to be doing one in a few days for the metadata changes anyways
  • -
  • In other news, moving the dc.type.output to dc.type and re-indexing seems to have fixed the Listings and Reports issue from above
  • -
  • Unfortunately this isn’t a very good solution, because Listings and Reports config should allow us to filter on dc.type.* but the documentation isn’t very clear and I couldn’t reach Atmire today
  • -
  • We want to do the dc.type.output move on CGSpace anyways, but we should wait as it might affect other external people!
  • -
- -

2016-04-14

- -
    -
  • Communicate with Macaroni Bros again about dc.type
  • -
  • Help Sisay with some rsync and Linux stuff
  • -
  • Notify CIAT people of metadata changes (I had forgotten them last week)
  • -
- -

2016-04-15

- -
    -
  • DSpace Test had crashed, so I ran all system updates, rebooted, and re-deployed DSpace code
  • -
- -

2016-04-18

- -
    -
  • Talk to CIAT people about their portal again
  • -
  • Start looking more at the fields we want to delete
  • -
  • The following metadata fields have 0 items using them, so we can just remove them from the registry and any references in XMLUI, input forms, etc: - -
      -
    • dc.description.abstractother
    • -
    • dc.whatwasknown
    • -
    • dc.whatisnew
    • -
    • dc.description.nationalpartners
    • -
    • dc.peerreviewprocess
    • -
    • cg.species.animal
    • -
  • -
  • Deleted!
  • -
  • The following fields have some items using them and I have to decide what to do with them (delete or move): - -
      -
    • dc.icsubject.icrafsubject: 6 items, mostly in CPWF collections
    • -
    • dc.type.journal: 11 items, mostly in ILRI collections
    • -
    • dc.publicationcategory: 1 item, in CPWF
    • -
    • dc.GRP: 2 items, CPWF
    • -
    • dc.Species.animal: 6 items, in ILRI and AnGR
    • -
    • cg.livestock.agegroup: 9 items, in ILRI collections
    • -
    • cg.livestock.function: 20 items, mostly in EADD
    • -
  • -
  • Test metadata migration on local instance again:
  • -
- -
$ ./migrate-fields.sh
-UPDATE metadatavalue SET metadata_field_id=109 WHERE metadata_field_id=66
-UPDATE 40885
-UPDATE metadatavalue SET metadata_field_id=203 WHERE metadata_field_id=76
-UPDATE 51330
-UPDATE metadatavalue SET metadata_field_id=208 WHERE metadata_field_id=82
-UPDATE 5986
-UPDATE metadatavalue SET metadata_field_id=210 WHERE metadata_field_id=88
-UPDATE 2456
-UPDATE metadatavalue SET metadata_field_id=215 WHERE metadata_field_id=106
-UPDATE 3872
-UPDATE metadatavalue SET metadata_field_id=217 WHERE metadata_field_id=108
-UPDATE 46075
-$ JAVA_OPTS="-Xms512m -Xmx512m -Dfile.encoding=UTF-8" ~/dspace/bin/dspace index-discovery -bf
-
- -
    -
  • CGSpace was down but I’m not sure why, this was in catalina.out:
  • -
- -
Apr 18, 2016 7:32:26 PM com.sun.jersey.spi.container.ContainerResponse logException
-SEVERE: Mapped exception to response: 500 (Internal Server Error)
-javax.ws.rs.WebApplicationException
-        at org.dspace.rest.Resource.processFinally(Resource.java:163)
-        at org.dspace.rest.HandleResource.getObject(HandleResource.java:81)
-        at sun.reflect.GeneratedMethodAccessor198.invoke(Unknown Source)
-        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
-        at java.lang.reflect.Method.invoke(Method.java:606)
-        at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
-        at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
-        at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
-        at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
-        at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
-        at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
-        at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
-        at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
-        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1511)
-        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1442)
-        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1391)
-        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1381)
-        at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
-...
-
- -
    -
  • Everything else in the system looked normal (50GB disk space available, nothing weird in dmesg, etc)
  • -
  • After restarting Tomcat a few more of these errors were logged but the application was up
  • -
- -

2016-04-19

- -
    -
  • Get handles for items that are using a given metadata field, ie dc.Species.animal (105):
  • -
- -
# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id in (select resource_id from metadatavalue where resource_type_id=2 and metadata_field_id=105);
-   handle
--------------
- 10568/10298
- 10568/16413
- 10568/16774
- 10568/34487
-
- -
    -
  • Delete metadata values for dc.GRP and dc.icsubject.icrafsubject:
  • -
- -
# delete from metadatavalue where resource_type_id=2 and metadata_field_id=96;
-# delete from metadatavalue where resource_type_id=2 and metadata_field_id=83;
-
- -
    -
  • They are old ICRAF fields and we haven’t used them since 2011 or so
  • -
  • Also delete them from the metadata registry
  • -
  • CGSpace went down again, dspace.log had this:
  • -
- -
2016-04-19 15:02:17,025 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
-org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
-
- -
    -
  • I restarted Tomcat and PostgreSQL and now it’s back up
  • -
  • I bet this is the same crash as yesterday, but I only saw the errors in catalina.out
  • -
  • Looks to be related to this, from dspace.log:
  • -
- -
2016-04-19 15:16:34,670 ERROR org.dspace.rest.Resource @ Something get wrong. Aborting context in finally statement.
-
- -
    -
  • We have 18,000 of these errors right now…
  • -
  • Delete a few more old metadata values: dc.Species.animal, dc.type.journal, and dc.publicationcategory:
  • -
- -
# delete from metadatavalue where resource_type_id=2 and metadata_field_id=105;
-# delete from metadatavalue where resource_type_id=2 and metadata_field_id=85;
-# delete from metadatavalue where resource_type_id=2 and metadata_field_id=95;
-
- -
    -
  • And then remove them from the metadata registry
  • -
- -

2016-04-20

- -
    -
  • Re-deploy DSpace Test with the new subject and type fields, run all system updates, and reboot the server
  • -
  • Migrate fields and re-deploy CGSpace with the new subject and type fields, run all system updates, and reboot the server
  • -
  • Field migration went well:
  • -
- -
$ ./migrate-fields.sh
-UPDATE metadatavalue SET metadata_field_id=109 WHERE metadata_field_id=66
-UPDATE 40909
-UPDATE metadatavalue SET metadata_field_id=203 WHERE metadata_field_id=76
-UPDATE 51419
-UPDATE metadatavalue SET metadata_field_id=208 WHERE metadata_field_id=82
-UPDATE 5986
-UPDATE metadatavalue SET metadata_field_id=210 WHERE metadata_field_id=88
-UPDATE 2458
-UPDATE metadatavalue SET metadata_field_id=215 WHERE metadata_field_id=106
-UPDATE 3872
-UPDATE metadatavalue SET metadata_field_id=217 WHERE metadata_field_id=108
-UPDATE 46075
-
- -
    -
  • Also, I migrated CGSpace to using the PGDG PostgreSQL repo as the infrastructure playbooks had been using it for a while and it seemed to be working well
  • -
  • Basically, this gives us the ability to use the latest upstream stable 9.3.x release (currently 9.3.12)
  • -
  • Looking into the REST API errors again, it looks like these started appearing a few days ago in the tens of thousands:
  • -
- -
$ grep -c "Aborting context in finally statement" dspace.log.2016-04-20
-21252
-
- -
    -
  • I found a recent discussion on the DSpace mailing list and I’ve asked for advice there
  • -
  • Looks like this issue was noted and fixed in DSpace 5.5 (we’re on 5.1): https://jira.duraspace.org/browse/DS-2936
  • -
  • I’ve sent a message to Atmire asking about compatibility with DSpace 5.5
  • -
- -

2016-04-21

- -
    -
  • Fix a bunch of metadata consistency issues with IITA Journal Articles (Peer review, Formally published, messed up DOIs, etc)
  • -
  • Atmire responded with DSpace 5.5 compatible versions for their modules, so I’ll start testing those in a few weeks
  • -
- -

2016-04-22

- - - -

2016-04-26

- - - -

2016-04-27

- -
    -
  • I woke up to ten or fifteen “up” and “down” emails from the monitoring website
  • -
  • Looks like the last one was “down” from about four hours ago
  • -
  • I think there must be something with this REST stuff:
  • -
- -
# grep -c "Aborting context in finally statement" dspace.log.2016-04-*
-dspace.log.2016-04-01:0
-dspace.log.2016-04-02:0
-dspace.log.2016-04-03:0
-dspace.log.2016-04-04:0
-dspace.log.2016-04-05:0
-dspace.log.2016-04-06:0
-dspace.log.2016-04-07:0
-dspace.log.2016-04-08:0
-dspace.log.2016-04-09:0
-dspace.log.2016-04-10:0
-dspace.log.2016-04-11:0
-dspace.log.2016-04-12:235
-dspace.log.2016-04-13:44
-dspace.log.2016-04-14:0
-dspace.log.2016-04-15:35
-dspace.log.2016-04-16:0
-dspace.log.2016-04-17:0
-dspace.log.2016-04-18:11942
-dspace.log.2016-04-19:28496
-dspace.log.2016-04-20:28474
-dspace.log.2016-04-21:28654
-dspace.log.2016-04-22:28763
-dspace.log.2016-04-23:28773
-dspace.log.2016-04-24:28775
-dspace.log.2016-04-25:28626
-dspace.log.2016-04-26:28655
-dspace.log.2016-04-27:7271
-
- -
    -
  • I restarted Tomcat and it is back up
  • -
  • Add Spanish XMLUI strings so those users see “CGSpace” instead of “DSpace” in the user interface (#222)
  • -
  • Submit patch to upstream DSpace for the misleading help text in the embargo step of the item submission: https://jira.duraspace.org/browse/DS-3172
  • -
  • Update infrastructure playbooks for nginx 1.10.x (stable) release: https://github.com/ilri/rmg-ansible-public/issues/32
  • -
  • Currently running on DSpace Test, we’ll give it a few days before we adjust CGSpace
  • -
  • CGSpace down, restarted Tomcat and it’s back up
  • -
- -

2016-04-28

- -
    -
  • Problems with stability again. I’ve blocked access to /rest for now to see if the number of errors in the log files drop
  • -
  • Later we could maybe start logging access to /rest and perhaps whitelist some IPs…
  • -
- -
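  • A hypothetical sketch of such a temporary block in nginx (not necessarily the exact config used; denying everything in the /rest location returns HTTP 403 to all clients):

location /rest {
    # refuse all requests to the REST API for now
    deny all;
}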

2016-04-30

- -
    -
  • Logs for today and yesterday have zero references to this REST error, so I’m going to open back up the REST API but log all requests
  • -
- -
location /rest {
-	access_log /var/log/nginx/rest.log;
-	proxy_pass http://127.0.0.1:8443;
-}
-
- -
    -
  • I will check the logs again in a few days to look for patterns, see who is accessing it, etc
  • -
diff --git a/public/2016-05/index.html b/public/2016-05/index.html deleted file mode 100644 index 843f5b6a8..000000000 --- a/public/2016-05/index.html +++ /dev/null @@ -1,505 +0,0 @@
May, 2016 | CGSpace Notes

May, 2016

- -
-

2016-05-01

- -
    -
  • Since yesterday there have been 10,000 REST errors and the site has been unstable again
  • -
  • I have blocked access to the API now
  • -
  • There are 3,000 IPs accessing the REST API in a 24-hour period!
  • -
- -
# awk '{print $1}' /var/log/nginx/rest.log  | uniq | wc -l
-3168
-
- -

- -
    -
  • The two most frequent requesters are in Ethiopia and Colombia: 213.55.99.121 and 181.118.144.29
  • -
  • 100% of the requests coming from Ethiopia are like this and result in an HTTP 500:
  • -
- -
GET /rest/handle/10568/NaN?expand=parentCommunityList,metadata HTTP/1.1
-
- -
    -
  • For now I’ll block just the Ethiopian IP
  • -
  • The owner of that application has said that the NaN (not a number) is an error in his code and he’ll fix it
  • -
- -
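  • Blocking a single IP while leaving the API open for everyone else is just a deny rule in the existing /rest location; a sketch, assuming the Ethiopian address from above:

location /rest {
    access_log /var/log/nginx/rest.log;
    # hypothetical: refuse only the client sending the /handle/10568/NaN requests
    deny 213.55.99.121;
    proxy_pass http://127.0.0.1:8443;
}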

2016-05-03

- -
    -
  • Update nginx to 1.10.x branch on CGSpace
  • -
  • Fix a reference to dc.type.output in Discovery that I had missed when we migrated to dc.type last month (#223)
  • -
- -

Item type in Discovery results

- -

2016-05-06

- -
    -
  • DSpace Test is down, catalina.out has lots of messages about heap space from some time yesterday (!)
  • -
  • It looks like Sisay was doing some batch imports
  • -
  • Hmm, also disk space is full
  • -
  • I decided to blow away the solr indexes, since they are 50GB and we don’t really need all the Atmire stuff there right now
  • -
  • I will re-generate the Discovery indexes after re-deploying
  • -
  • Testing renew-letsencrypt.sh script for nginx
  • -
- -
#!/usr/bin/env bash
-
-readonly SERVICE_BIN=/usr/sbin/service
-readonly LETSENCRYPT_BIN=/opt/letsencrypt/letsencrypt-auto
-
-# stop nginx so LE can listen on port 443
-$SERVICE_BIN nginx stop
-
-$LETSENCRYPT_BIN renew -nvv --standalone --standalone-supported-challenges tls-sni-01 > /var/log/letsencrypt/renew.log 2>&1
-
-LE_RESULT=$?
-
-$SERVICE_BIN nginx start
-
-if [[ "$LE_RESULT" != 0 ]]; then
-    echo 'Automated renewal failed:'
-
-    cat /var/log/letsencrypt/renew.log
-
-    exit 1
-fi
-
- -
    -
  • Seems to work well
  • -
- -

2016-05-10

- -
    -
  • Start looking at more metadata migrations
  • -
  • There are lots of fields in dcterms namespace that look interesting, like: - -
      -
    • dcterms.type
    • -
    • dcterms.spatial
    • -
  • -
  • Not sure what dcterms is…
  • -
  • Looks like these were added in DSpace 4 to allow for future work to make DSpace more flexible
  • -
  • CGSpace’s dc registry has 96 items, and the default DSpace one has 73.
  • -
- -

2016-05-11

- -
    -
  • Identify and propose the next phase of CGSpace fields to migrate:

    - -
      -
    • dc.title.jtitle → cg.title.journal
    • -
    • dc.identifier.status → cg.identifier.status
    • -
    • dc.river.basin → cg.river.basin
    • -
    • dc.Species → cg.species
    • -
    • dc.targetaudience → cg.targetaudience
    • -
    • dc.fulltextstatus → cg.fulltextstatus
    • -
    • dc.editon → cg.edition
    • -
    • dc.isijournal → cg.isijournal
    • -
  • - -
  • Start a test rebase of the 5_x-prod branch on top of the dspace-5.5 tag

  • - -
  • There were a handful of conflicts that I didn’t understand

  • - -
  • After completing the rebase I tried to build with the module versions Atmire had indicated as being 5.5 ready but I got this error:

  • -
- -
[ERROR] Failed to execute goal on project additions: Could not resolve dependencies for project org.dspace.modules:additions:jar:5.5: Could not find artifact com.atmire:atmire-metadata-quality-api:jar:5.5-2.10.1-0 in sonatype-releases (https://oss.sonatype.org/content/repositories/releases/) -> [Help 1]
-
- -
    -
  • I’ve sent them a question about it
  • -
  • A user mentioned having problems with uploading a 33 MB PDF
  • -
  • I told her I would increase the limit temporarily tomorrow morning
  • -
  • Turns out she was able to decrease the size of the PDF so we didn’t have to do anything
  • -
- -

2016-05-12

- -
    -
  • Looks like the issue that Abenet was having a few days ago with “Connection Reset” in Firefox might be due to a Firefox 46 issue: https://bugzilla.mozilla.org/show_bug.cgi?id=1268775
  • -
  • I finally found a copy of the latest CG Core metadata guidelines and it looks like we can add a few more fields to our next migration: - -
      -
    • dc.rplace.region → cg.coverage.region
    • -
    • dc.cplace.country → cg.coverage.country
    • -
  • -
  • Questions for CG people: - -
      -
    • Our dc.place and dc.srplace.subregion could both map to cg.coverage.admin-unit?
    • -
    • Should we use dc.contributor.crp or cg.contributor.crp for the CRP (ours is dc.crsubject.crpsubject)?
    • -
    • Our dc.contributor.affiliation and dc.contributor.corporate could both map to dc.contributor and possibly dc.contributor.center depending on if it’s a CG center or not
    • -
    • dc.title.jtitle could either map to dc.publisher or dc.source depending on how you read things
    • -
  • -
  • Found ~200 messed up CIAT values in dc.publisher:
  • -
- -
# select text_value from metadatavalue where resource_type_id=2 and metadata_field_id=39 and text_value similar to "%  %";
-
- -

2016-05-13

- -
    -
  • More theorizing about CGcore
  • -
  • Add two new fields: - -
      -
    • dc.srplace.subregion → cg.coverage.admin-unit
    • -
    • dc.place → cg.place
    • -
  • -
  • dc.place is our own field, so it’s easy to move
  • -
  • I’ve removed dc.title.jtitle from the list for now because there’s no use moving it out of DC until we know where it will go (see discussion yesterday)
  • -
- -

2016-05-18

- -
    -
  • Work on 707 CCAFS records
  • -
  • They have thumbnails on Flickr and elsewhere
  • -
  • In OpenRefine I created a new filename column based on the thumbnail column with the following GREL:
  • -
- -
if(cells['thumbnails'].value.contains('hqdefault'), cells['thumbnails'].value.split('/')[-2] + '.jpg', cells['thumbnails'].value.split('/')[-1])
-
- -
    -
  • Because ~400 records had the same filename on Flickr (hqdefault.jpg) but different UUIDs in the URL
  • -
  • So for the hqdefault.jpg ones I just take the UUID (-2) and use it as the filename
  • -
  • Before importing with SAFBuilder I tested adding “__bundle:THUMBNAIL” to the filename column and it works fine
  • -
- -

2016-05-19

- -
    -
  • More quality control on filename field of CCAFS records to make processing in shell and SAFBuilder more reliable:
  • -
- -
value.replace('_','').replace('-','')
-
- -
    -
  • We need to hold off on moving dc.Species to cg.species because it is only used for plants, and might be better to move it to something like cg.species.plant
  • -
  • And dc.identifier.fund is MOSTLY used for CPWF project identifier but has some other sponsorship things - -
      -
    • We should move PN, SG, CBA, IA, and PHASE* values to cg.identifier.cpwfproject
    • -
    • The rest, like BMGF and USAID etc, might have to go to either dc.description.sponsorship or cg.identifier.fund (not sure yet)
    • -
    • There are also some mistakes in CPWF’s things, like “PN 47”
    • -
    • This ought to catch all the CPWF values (there don’t appear to be any SG* values):
    • -
  • -
- -
# select text_value from metadatavalue where resource_type_id=2 and metadata_field_id=75 and (text_value like 'PN%' or text_value like 'PHASE%' or text_value = 'CBA' or text_value = 'IA');
-
- -

2016-05-20

- -
    -
  • More work on CCAFS Video and Images records
  • -
  • For SAFBuilder we need to modify the filename column to have the thumbnail bundle:
  • -
- -
value + "__bundle:THUMBNAIL"
-
- -
    -
  • Also, I fixed some weird characters using OpenRefine’s transform with the following GREL:
  • -
- -
value.replace(/\u0081/,'')
-
- -
    -
  • Write shell script to resize thumbnails with height larger than 400: https://gist.github.com/alanorth/131401dcd39d00e0ce12e1be3ed13256 (a rough sketch of the idea follows after this list)
  • -
  • Upload 707 CCAFS records to DSpace Test
  • -
  • A few miscellaneous fixes for XMLUI display niggles (spaces in item lists and link target _blank): #224
  • -
  • Work on configuration changes for Phase 2 metadata migrations
  • -
- -
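  • A rough sketch of the thumbnail-resizing idea (the real script is in the gist above; this sketch assumes ImageMagick and JPEGs in the current directory):

#!/usr/bin/env bash
# downscale any JPEG whose height is larger than 400 pixels
for file in *.jpg; do
    height=$(identify -format '%h' "$file")
    if [ "$height" -gt 400 ]; then
        mogrify -resize x400 "$file"
    fi
done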

2016-05-23

- -
    -
  • Try to import the CCAFS Images and Videos to CGSpace but had some issues with LibreOffice and OpenRefine
  • -
  • LibreOffice excludes empty cells when it exports and all the fields shift over to the left and cause URLs to go to Subjects, etc.
  • -
  • Google Docs does this better, but somehow reorders the rows and when I paste the thumbnail/filename row in they don’t match!
  • -
  • I will have to try later
  • -
- -

2016-05-30

- -
    -
  • Export CCAFS video and image records from DSpace Test using the migrate option (-m):
  • -
- -
$ mkdir ~/ccafs-images
-$ /home/dspacetest.cgiar.org/bin/dspace export -t COLLECTION -i 10568/79355 -d ~/ccafs-images -n 0 -m
-
- -
    -
  • And then import to CGSpace:
  • -
- -
$ JAVA_OPTS="-Xmx512m -Dfile.encoding=UTF-8" /home/cgspace.cgiar.org/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/70974 --source /tmp/ccafs-images --mapfile=/tmp/ccafs-images-may30.map &> /tmp/ccafs-images-may30.log
-
- -
    -
  • But now we have double authors for “CGIAR Research Program on Climate Change, Agriculture and Food Security” in the authority
  • -
  • I’m trying to do a Discovery index before messing with the authority index
  • -
  • Looks like we are missing the index-authority cron job, so who knows what’s up with our authority index
  • -
  • Run system updates on DSpace Test, re-deploy code, and reboot the server
  • -
  • Clean up and import ~200 CTA records to CGSpace via CSV like:
  • -
- -
$ export JAVA_OPTS="-Xmx512m -Dfile.encoding=UTF-8"
-$ /home/cgspace.cgiar.org/bin/dspace metadata-import -e aorth@mjanja.ch -f ~/CTA-May30/CTA-42229.csv &> ~/CTA-May30/CTA-42229.log
-
- -
    -
  • Discovery indexing took a few hours for some reason, and after that I started the index-authority script
  • -
- -
$ JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8" /home/cgspace.cgiar.org/bin/dspace index-authority
-
- -

2016-05-31

- -
    -
  • The index-authority script ran over night and was finished in the morning
  • -
  • Hopefully this was because we haven’t been running it regularly and it will speed up next time
  • -
  • I am running it again with a timer to see:
  • -
- -
$ time /home/cgspace.cgiar.org/bin/dspace index-authority
-Retrieving all data
-Initialize org.dspace.authority.indexer.DSpaceAuthorityIndexer
-Cleaning the old index
-Writing new data
-All done !
-
-real    37m26.538s
-user    2m24.627s
-sys     0m20.540s
-
- -
    -
  • Update tomcat7 crontab on CGSpace and DSpace Test to have the index-authority script that we were missing (an example entry follows after this list)
  • -
  • Add new ILRI subject and CCAFS project tags to input-forms.xml (#226, #225)
  • -
  • Manually mapped the authors of a few old CCAFS records to the new CCAFS authority UUID and re-indexed authority indexes to see if it helps correct those items.
  • -
  • Re-sync DSpace Test data with CGSpace
  • -
  • Clean up and import ~65 more CTA items into CGSpace
  • -
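  • The crontab entry could be as simple as this (the schedule is an assumption; the dspace binary path is the same one used for the imports above):

# hypothetical tomcat7 crontab entry to keep the authority index fresh
0 1 * * * /home/cgspace.cgiar.org/bin/dspace index-authority > /dev/null 2>&1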
diff --git a/public/2016-06/index.html b/public/2016-06/index.html deleted file mode 100644 index 194ae9c6d..000000000 --- a/public/2016-06/index.html +++ /dev/null @@ -1,550 +0,0 @@
June, 2016 | CGSpace Notes

June, 2016

- -
-

2016-06-01

- - - -

- -
dspacetest=# update metadatavalue set metadata_field_id=130 where metadata_field_id=75 and (text_value like 'PN%' or text_value like 'PHASE%' or text_value = 'CBA' or text_value = 'IA');
-UPDATE 497
-dspacetest=# update metadatavalue set metadata_field_id=29 where metadata_field_id=75;
-UPDATE 14
-
- -
    -
  • Fix a few minor miscellaneous issues in dspace.cfg (#227)
  • -
- -

2016-06-02

- -
    -
  • Testing the configuration and theme changes for the upcoming metadata migration and I found some issues with cg.coverage.admin-unit
  • -
  • Seems that the Browse configuration in dspace.cfg can’t handle the ‘-’ in the field name:
  • -
- -
webui.browse.index.12 = subregion:metadata:cg.coverage.admin-unit:text
-
- -
    -
  • But actually, I think since DSpace 4 or 5 (we are 5.1) the Browse indexes come from Discovery (defined in discovery.xml) so this is really just a parsing error
  • -
  • I’ve sent a message to the DSpace mailing list to ask about the Browse index definition
  • -
  • A user was having problems with submission and from the stacktrace it looks like a Sherpa/Romeo issue
  • -
  • I found a thread on the mailing list talking about it and there is bug report and a patch: https://jira.duraspace.org/browse/DS-2740
  • -
  • The patch applies successfully on DSpace 5.1 so I will try it later
  • -
- -

2016-06-03

- -
    -
  • Investigating the CCAFS authority issue, I exported the metadata for the Videos collection
  • -
  • The top two authors are:
  • -
- -
CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::500
-CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::600
-
- -
    -
  • So the only difference is the “confidence”
  • -
  • Ok, well THAT is interesting:
  • -
- -
dspacetest=# select text_value, authority, confidence from metadatavalue where metadata_field_id=3 and text_value like '%Orth, %';
- text_value |              authority               | confidence
-------------+--------------------------------------+------------
- Orth, A.   | ab606e3a-2b04-4c7d-9423-14beccf54257 |         -1
- Orth, A.   | ab606e3a-2b04-4c7d-9423-14beccf54257 |         -1
- Orth, A.   | ab606e3a-2b04-4c7d-9423-14beccf54257 |         -1
- Orth, Alan |                                      |         -1
- Orth, Alan |                                      |         -1
- Orth, Alan |                                      |         -1
- Orth, Alan |                                      |         -1
- Orth, A.   | 05c2c622-d252-4efb-b9ed-95a07d3adf11 |         -1
- Orth, A.   | 05c2c622-d252-4efb-b9ed-95a07d3adf11 |         -1
- Orth, A.   | ab606e3a-2b04-4c7d-9423-14beccf54257 |         -1
- Orth, A.   | ab606e3a-2b04-4c7d-9423-14beccf54257 |         -1
- Orth, Alan | ad281dbf-ef81-4007-96c3-a7f5d2eaa6d9 |        600
- Orth, Alan | ad281dbf-ef81-4007-96c3-a7f5d2eaa6d9 |        600
-(13 rows)
-
- -
    -
  • And now an actually relevant example:
  • -
- -
dspacetest=# select count(*) from metadatavalue where metadata_field_id=3 and text_value like 'CGIAR Research Program on Climate Change, Agriculture and Food Security' and confidence = 500;
- count
--------
-   707
-(1 row)
-
-dspacetest=# select count(*) from metadatavalue where metadata_field_id=3 and text_value like 'CGIAR Research Program on Climate Change, Agriculture and Food Security' and confidence != 500;
- count
--------
-   253
-(1 row)
-
- -
    -
  • Trying something experimental:
  • -
- -
dspacetest=# update metadatavalue set confidence=500 where metadata_field_id=3 and text_value like 'CGIAR Research Program on Climate Change, Agriculture and Food Security';
-UPDATE 960
-
- -
    -
  • And then re-indexing authority and Discovery…?
  • -
  • After Discovery reindex the CCAFS authors are all together in the Authors sidebar facet
  • -
  • The docs for the ORCiD and Authority stuff for DSpace 5 mention changing the browse indexes to use the Authority as well:
  • -
- -
webui.browse.index.2 = author:metadataAuthority:dc.contributor.author:authority
-
- -
    -
  • That would only be for the “Browse by” function… so we’ll have to see what effect that has later
  • -
- -

2016-06-04

- -
    -
  • Re-sync DSpace Test with CGSpace and perform test of metadata migration again
  • -
  • Run phase two of metadata migrations on CGSpace (see the migration notes)
  • -
  • Run all system updates and reboot CGSpace server
  • -
- -

2016-06-07

- -
    -
  • Figured out how to export a list of the unique values from a metadata field ordered by count:
  • -
- -
dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=29 group by text_value order by count desc) to /tmp/sponsorship.csv with csv;
-
- -
    -
  • Identified the next round of fields to migrate:

    - -
      -
    • dc.title.jtitle → dc.source
    • -
    • dc.crsubject.crpsubject → cg.contributor.crp
    • -
    • dc.contributor.affiliation → cg.contributor.affiliation
    • -
    • dc.Species → cg.species
    • -
    • dc.contributor.corporate → dc.contributor
    • -
    • dc.identifier.url → cg.identifier.url
    • -
    • dc.identifier.doi → cg.identifier.doi
    • -
    • dc.identifier.googleurl → cg.identifier.googleurl
    • -
    • dc.identifier.dataurl → cg.identifier.dataurl
    • -
  • - -
  • Discuss pulling data from IFPRI’s ContentDM with Ryan Miller

  • - -
  • Looks like OAI is kinda obtuse for this, and if we use ContentDM’s API we’ll be able to access their internal field names (rather than trying to figure out how they stuffed them into various, repeated Dublin Core fields)

  • -
- -

2016-06-08

- - - -
$ xml sel -t -m '//value-pairs[@value-pairs-name="ilrisubject"]/pair/displayed-value/text()' -c '.' -n dspace/config/input-forms.xml
-
- -
    -
  • Write to Atmire about the use of atmire.orcid.id to see if we can change it
  • -
  • Seems to be a virtual field that is queried from the authority cache… hmm
  • -
  • In other news, I found out that the About page that we haven’t been using lives in dspace/config/about.xml, so now we can update the text
  • -
  • File bug about closed="true" attribute of controlled vocabularies not working: https://jira.duraspace.org/browse/DS-3238
  • -
- -

2016-06-09

- -
    -
  • Atmire explained that the atmire.orcid.id field doesn’t exist in the schema, as it actually comes from the authority cache during XMLUI run time
  • -
  • This means we don’t see it when harvesting via OAI or REST, for example
  • -
  • They opened a feature ticket on the DSpace tracker to ask for support of this: https://jira.duraspace.org/browse/DS-3239
  • -
- -

2016-06-10

- -
    -
  • Investigating authority confidences
  • -
  • It looks like the values are documented in Choices.java
  • -
  • Experiment with setting all 960 CCAFS author values to be 500:
  • -
- -
dspacetest=# SELECT authority, confidence FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=3 AND text_value = 'CGIAR Research Program on Climate Change, Agriculture and Food Security';
-
-dspacetest=# UPDATE metadatavalue set confidence = 500 where resource_type_id=2 AND metadata_field_id=3 AND text_value = 'CGIAR Research Program on Climate Change, Agriculture and Food Security';
-UPDATE 960
-
- -
    -
  • After the database edit, I did a full Discovery re-index
  • -
  • And now there are exactly 960 items in the authors facet for ‘CGIAR Research Program on Climate Change, Agriculture and Food Security’
  • -
  • Now I ran the same on CGSpace
  • -
  • Merge controlled vocabulary functionality for animal breeds to 5_x-prod (#236)
  • -
  • Write a Python script to update metadata values in batch via PostgreSQL: fix-metadata-values.py (a simplified sketch follows after this list)
  • -
  • We need to use this to correct some pretty ugly values in fields like dc.description.sponsorship
  • -
  • Merge item display tweaks from earlier this week (#231)
  • -
  • Merge controlled vocabulary functionality for subregions (#238)
  • -
- -
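  • A much-simplified sketch of what a script like fix-metadata-values.py does (assuming a CSV with “original” and “correct” columns and psycopg2; 29 is dc.description.sponsorship as used later; the real script takes the field, metadata_field_id, and database credentials as options):

#!/usr/bin/env python
# simplified illustration only, not the real fix-metadata-values.py
import csv
import psycopg2

conn = psycopg2.connect('dbname=dspace user=dspace password=fuuu host=localhost')
cursor = conn.cursor()

with open('corrections.csv') as f:
    for row in csv.DictReader(f):
        # replace every occurrence of the bad value in the given metadata field
        cursor.execute(
            'UPDATE metadatavalue SET text_value=%s WHERE resource_type_id=2 AND metadata_field_id=%s AND text_value=%s',
            (row['correct'], 29, row['original'])
        )

conn.commit()
cursor.close()
conn.close()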

2016-06-11

- -
    -
  • Merge controlled vocabulary for sponsorship field (#239)
  • -
  • Fix character encoding issues for animal breed lookup that I merged yesterday
  • -
- -

2016-06-17

- -
    -
  • Linode has free RAM upgrades for their 13th birthday so I migrated DSpace Test (4→8GB of RAM)
  • -
- -

2016-06-18

- -
    -
  • Clean up titles and hints in input-forms.xml to use title/sentence case and a few more consistency things (#241)
  • - -
  • The final list of fields to migrate in the third phase of metadata migrations is:

    - -
      -
    • dc.title.jtitle → dc.source
    • -
    • dc.crsubject.crpsubject → cg.contributor.crp
    • -
    • dc.contributor.affiliation → cg.contributor.affiliation
    • -
    • dc.srplace.subregion → cg.coverage.subregion
    • -
    • dc.Species → cg.species
    • -
    • dc.contributor.corporate → dc.contributor
    • -
    • dc.identifier.url → cg.identifier.url
    • -
    • dc.identifier.doi → cg.identifier.doi
    • -
    • dc.identifier.googleurl → cg.identifier.googleurl
    • -
    • dc.identifier.dataurl → cg.identifier.dataurl
    • -
  • - -
  • Interesting “Sunburst” visualization on a Digital Commons page: http://www.repository.law.indiana.edu/sunburst.html

  • - -
  • Final testing on metadata fix/delete for dc.description.sponsorship cleanup

  • - -
  • Need to run fix-metadata-values.py and then delete-metadata-values.py

  • -
- -

2016-06-20

- -
    -
  • CGSpace’s HTTPS certificate expired last night and I didn’t notice, had to renew:
  • -
- -
# /opt/letsencrypt/letsencrypt-auto renew --standalone --pre-hook "/usr/bin/service nginx stop" --post-hook "/usr/bin/service nginx start"
-
- -
    -
  • I really need to fix that cron job…
  • -
- -
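  • The cron job should just call the renew-letsencrypt.sh script from 2016-05-06; a sketch (the path and schedule here are placeholders):

# hypothetical root crontab entry for automatic certificate renewal
0 3 * * 0 /root/renew-letsencrypt.sh > /dev/null 2>&1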

2016-06-24

- -
    -
  • Run the replacements/deletes for dc.description.sponsorship (investors) on CGSpace:
  • -
- -
$ ./fix-metadata-values.py -i investors-not-blank-not-delete-85.csv -f dc.description.sponsorship -t 'correct investor' -m 29 -d cgspace -p 'fuuu' -u cgspace
-$ ./delete-metadata-values.py -i investors-delete-82.csv -f dc.description.sponsorship -m 29 -d cgspace -p 'fuuu' -u cgspace
-
- - - -

2016-06-28

- -
    -
  • Testing the cleanup of dc.contributor.corporate with 13 deletions and 121 replacements
  • -
  • There are still ~97 values for which no correction or deletion was indicated
  • -
  • After the above deletions and replacements I regenerated a CSV and sent it to Peter et al to have a look
  • -
- -
dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=126 group by text_value order by count desc) to /tmp/contributors-june28.csv with csv;
-
- -
    -
  • Re-evaluate dc.contributor.corporate and it seems we will move it to dc.contributor.author as this is more in line with how editors are actually using it
  • -
- -

2016-06-29

- -
    -
  • Test run of migrate-fields.sh with the following re-mappings:
  • -
- -
72  55  #dc.source
-86  230 #cg.contributor.crp
-91  211 #cg.contributor.affiliation
-94  212 #cg.species
-107 231 #cg.coverage.subregion
-126 3   #dc.contributor.author
-73  219 #cg.identifier.url
-74  220 #cg.identifier.doi
-79  222 #cg.identifier.googleurl
-89  223 #cg.identifier.dataurl
-
- -
    -
  • Run all cleanups and deletions of dc.contributor.corporate on CGSpace:
  • -
- -
$ ./fix-metadata-values.py -i Corporate-Authors-Fix-121.csv -f dc.contributor.corporate -t 'Correct style' -m 126 -d cgspace -u cgspace -p 'fuuu'
-$ ./fix-metadata-values.py -i Corporate-Authors-Fix-PB.csv -f dc.contributor.corporate -t 'should be' -m 126 -d cgspace -u cgspace -p 'fuuu'
-$ ./delete-metadata-values.py -f dc.contributor.corporate -i Corporate-Authors-Delete-13.csv -m 126 -u cgspace -d cgspace -p 'fuuu'
-
- -
    -
  • Re-deploy CGSpace and DSpace Test with latest June changes
  • -
  • Now the sharing and Altmetric bits are more prominent:
  • -
- -

DSpace 5.1 XMLUI With Altmetric Badge

- -
    -
  • Run all system updates on the servers and reboot
  • -
  • Start working on config changes for phase three of the metadata migrations
  • -
- -

2016-06-30

- -
    -
  • Wow, there are 95 authors in the database who have ‘,’ at the end of their name:
  • -
- -
# select text_value from  metadatavalue where metadata_field_id=3 and text_value like '%,';
-
- -
    -
  • We need to use something like this to fix them, need to write a proper regex later:
  • -
- -
# update metadatavalue set text_value = regexp_replace(text_value, '(Poole, J),', '\1') where metadata_field_id=3 and text_value = 'Poole, J,';
-
diff --git a/public/2016-07/index.html b/public/2016-07/index.html deleted file mode 100644 index 9efa3dfc2..000000000 --- a/public/2016-07/index.html +++ /dev/null @@ -1,468 +0,0 @@
July, 2016 | CGSpace Notes

July, 2016

- -
-

2016-07-01

- -
    -
  • Add dc.description.sponsorship to Discovery sidebar facets and make investors clickable in item view (#232)
  • -
  • I think this query should find and replace all authors that have “,” at the end of their names:
  • -
- -
dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
-UPDATE 95
-dspacetest=# select text_value from  metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
- text_value
-------------
-(0 rows)
-
- -
    -
  • In this case the select query was showing 95 results before the update
  • -
- -

- -

2016-07-02

- -
    -
  • Comment on DSpace Jira ticket about author lookup search text (DS-2329)
  • -
- -

2016-07-04

- -
    -
  • Seems the database’s author authority values mean nothing without the authority Solr core from the host where they were created!
  • -
- -

2016-07-05

- -
    -
  • Amend backup-solr.sh script so it backs up the entire Solr folder
  • -
  • We really only need statistics and authority but meh
  • -
  • Fix metadata for species on DSpace Test:
  • -
- -
$ ./fix-metadata-values.py -i /tmp/Species-Peter-Fix.csv -f dc.Species -t CORRECT -m 94 -d dspacetest -u dspacetest -p 'fuuu'
-
- -
    -
  • Will run later on CGSpace
  • -
  • A user is still having problems with Sherpa/Romeo causing crashes during the submission process when the journal is “ungraded”
  • -
  • I tested the patch for DS-2740 that I had found last month and it seems to work
  • -
  • I will merge it to 5_x-prod
  • -
- -

2016-07-06

- -
    -
  • Delete 23 blank metadata values from CGSpace:
  • -
- -
cgspace=# delete from metadatavalue where resource_type_id=2 and text_value='';
-DELETE 23
-
- -
    -
  • Complete phase three of metadata migration, for the following fields: - -
      -
    • dc.title.jtitle → dc.source
    • -
    • dc.crsubject.crpsubject → cg.contributor.crp
    • -
    • dc.contributor.affiliation → cg.contributor.affiliation
    • -
    • dc.Species → cg.species
    • -
    • dc.srplace.subregion → cg.coverage.subregion
    • -
    • dc.contributor.corporate → dc.contributor.author
    • -
    • dc.identifier.url → cg.identifier.url
    • -
    • dc.identifier.doi → cg.identifier.doi
    • -
    • dc.identifier.googleurl → cg.identifier.googleurl
    • -
    • dc.identifier.dataurl → cg.identifier.dataurl
    • -
  • -
  • Also, run fixes and deletes for species and author affiliations (over 1000 corrections!)
  • -
- -
$ ./fix-metadata-values.py -i Species-Peter-Fix.csv -f dc.Species -t CORRECT -m 212 -d dspace -u dspace -p 'fuuu'
-$ ./fix-metadata-values.py -i Affiliations-Fix-1045-Peter-Abenet.csv -f dc.contributor.affiliation -t Correct -m 211 -d dspace -u dspace -p 'fuuu'
-$ ./delete-metadata-values.py -f dc.contributor.affiliation -i Affiliations-Delete-Peter-Abenet.csv -m 211 -u dspace -d dspace -p 'fuuu'
-
- -
    -
  • I then ran all server updates and rebooted the server
  • -
- -

2016-07-11

- -
    -
  • Doing some author cleanups from Peter and Abenet:
  • -
- -
$ ./fix-metadata-values.py -i /tmp/Authors-Fix-205-UTF8.csv -f dc.contributor.author -t correct -m 3 -d dspacetest -u dspacetest -p fuuu
-$ ./delete-metadata-values.py -f dc.contributor.author -i /tmp/Authors-Delete-UTF8.csv -m 3 -u dspacetest -d dspacetest -p fuuu
-
- -

2016-07-13

- -
    -
  • Run the author cleanups on CGSpace and start a full Discovery re-index
  • -
- -

2016-07-14

- -
    -
  • Test LDAP settings for new root LDAP
  • -
  • Seems to work when binding as a top-level user
  • -
- -

2016-07-18

- -
    -
  • Adjust identifiers in XMLUI item display to be more prominent
  • -
  • Add species and breed to the XMLUI item display
  • -
  • CGSpace crashed late at night and the DSpace logs were showing:
  • -
- -
2016-07-18 20:26:30,941 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error - 
-org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
-...
-
- -
    -
  • I suspect it’s someone hitting REST too much:
  • -
- -
# awk '{print $1}' /var/log/nginx/rest.log  | sort -n | uniq -c | sort -h | tail -n 3
-    710 66.249.78.38
-   1781 181.118.144.29
-  24904 70.32.99.142
-
- -
    -
  • I just blocked access to /rest for that last IP for now:
  • -
- -
     # log rest requests
-     location /rest {
-         access_log /var/log/nginx/rest.log;
-         proxy_pass http://127.0.0.1:8443;
-         deny 70.32.99.142;
-     }
-
- -

2016-07-21

- - - -

2016-07-22

- -
    -
  • Help Paola from CCAFS with thumbnails for batch uploads
  • -
  • She has been struggling to get the dimensions right, and manually enlarging smaller thumbnails, renaming PNGs to JPG, etc
  • -
  • Altmetric reports having an issue with some of our authors being doubled…
  • -
  • This is related to authority and confidence!
  • -
  • We might need to use index.authority.ignore-prefered=true to tell the Discovery index to prefer the variation that exists in the metadatavalue rather than what it finds in the authority cache.
  • -
  • Trying these on DSpace Test after a discussion by Daniel Scharon on the dspace-tech mailing list:
  • -
- -
index.authority.ignore-prefered.dc.contributor.author=true
-index.authority.ignore-variants.dc.contributor.author=false
-
- -
    -
  • After reindexing I don’t see any change in Discovery’s display of authors, and still have entries like:
  • -
- -
Grace, D. (464)
-Grace, D. (62)
-
- -
    -
  • I asked for clarification of the following options on the DSpace mailing list:
  • -
- -
index.authority.ignore
-index.authority.ignore-prefered
-index.authority.ignore-variants
-
- -
    -
  • In the mean time, I will try these on DSpace Test (plus a reindex):
  • -
- -
index.authority.ignore=true
-index.authority.ignore-prefered=true
-index.authority.ignore-variants=true
-
- -
    -
  • Enabled usage of X-Forwarded-For in DSpace admin control panel (#255)
  • -
  • It was misconfigured and disabled, but already working for some reason sigh
  • -
  • … no luck. Trying with just:
  • -
- -
index.authority.ignore=true
-
- -
    -
  • After re-indexing and clearing the XMLUI cache nothing has changed
  • -
- -

2016-07-25

- -
    -
  • Trying a few more settings (plus reindex) for Discovery on DSpace Test:
  • -
- -
index.authority.ignore-prefered.dc.contributor.author=true
-index.authority.ignore-variants=true
-
- -
    -
  • Run all OS updates and reboot DSpace Test server
  • -
  • No changes to Discovery after reindexing… hmm.
  • -
  • Integrate and massively clean up About page (#256)
  • -
- -

About page

- -
    -
  • The DSpace source code mentions the configuration key discovery.index.authority.ignore-prefered.* (with prefix of discovery, despite the docs saying otherwise), so I’m trying the following on DSpace Test:
  • -
- -
discovery.index.authority.ignore-prefered.dc.contributor.author=true
-discovery.index.authority.ignore-variants=true
-
- -
    -
  • Still no change!
  • -
  • Deploy species, breed, and identifier changes to CGSpace, as well as About page
  • -
  • Run Linode RAM upgrade (8→12GB)
  • -
  • Re-sync DSpace Test with CGSpace
  • -
  • I noticed that our backup scripts don’t send Solr cores to S3 so I amended the script
  • -
- -

2016-07-31

- -
    -
  • Work on removing Dryland Systems and Humidtropics subjects from Discovery sidebar and Browse by
  • -
  • Also change “Subjects” to “AGROVOC keywords” in Discovery sidebar/search and Browse by (#257)
  • -
diff --git a/public/2016-08/index.html b/public/2016-08/index.html deleted file mode 100644 index b2b92202c..000000000 --- a/public/2016-08/index.html +++ /dev/null @@ -1,541 +0,0 @@
August, 2016 | CGSpace Notes

August, 2016

- -
-

2016-08-01

- -
    -
  • Add updated distribution license from Sisay (#259)
  • -
  • Play with upgrading Mirage 2 dependencies in bower.json because most are several versions of out date
  • -
  • Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more
  • -
  • bower stuff is a dead end, waste of time, too many issues
  • -
  • Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of fonts)
  • -
  • Start working on DSpace 5.1 → 5.5 port:
  • -
- -
$ git checkout -b 55new 5_x-prod
-$ git reset --hard ilri/5_x-prod
-$ git rebase -i dspace-5.5
-
- -

- -
    -
  • Lots of conflicts that don’t make sense (ie, shouldn’t conflict!)
  • -
  • This file in particular conflicts almost 10 times: dspace/modules/xmlui-mirage2/src/main/webapp/themes/CGIAR/styles/_style.scss
  • -
  • Checking out a clean branch at 5.5 and cherry-picking our commits works where that file would normally have a conflict
  • -
  • Seems to be related to merge commits
  • -
  • git rebase --preserve-merges doesn’t seem to help
  • -
  • Eventually I just turned on git rerere and solved the conflicts and completed the 403 commit rebase
  • -
  • The 5.5 code now builds but doesn’t run (white page in Tomcat)
  • -
- -
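  • For reference, enabling rerere so git records and replays conflict resolutions during a long rebase:

$ git config rerere.enabled true
$ git rebase -i dspace-5.5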

2016-08-02

- -
    -
  • Ask Atmire for help with DSpace 5.5 issue
  • -
  • Vanilla DSpace 5.5 deploys and runs fine
  • -
  • Playing with DSpace in Ubuntu 16.04 and Tomcat 7
  • -
  • Everything is still fucked up, even vanilla DSpace 5.5
  • -
- -

2016-08-04

- -
    -
  • Ask on DSpace mailing list about duplicate authors, Discovery and author text values
  • -
  • Atmire responded with some new DSpace 5.5 ready versions to try for their modules
  • -
- -

2016-08-05

- -
    -
  • Fix item display incorrectly displaying Species when Breeds were present (#260)
  • -
  • Experiment with fixing more authors, like Delia Grace:
  • -
- -
dspacetest=# update metadatavalue set authority='0b4fcbc1-d930-4319-9b4d-ea1553cca70b', confidence=600 where metadata_field_id=3 and text_value='Grace, D.';
-
- -

2016-08-06

- -
    -
  • Finally figured out how to remove “View/Open” and “Bitstreams” from the item view
  • -
- -

2016-08-07

- -
    -
  • Start working on Ubuntu 16.04 Ansible playbook for Tomcat 8, PostgreSQL 9.5, Oracle 8, etc
  • -
- -

2016-08-08

- -
    -
  • Still troubleshooting Atmire modules on DSpace 5.5
  • -
  • Vanilla DSpace 5.5 works on Tomcat 7…
  • -
  • Ooh, and vanilla DSpace 5.5 works on Tomcat 8 with Java 8!
  • -
  • Some notes about setting up Tomcat 8, since it’s new on this machine…
  • -
  • Install latest Oracle Java 8 JDK
  • -
  • Create setenv.sh in Tomcat 8 libexec/bin directory: -
  • -
- -
CATALINA_OPTS="-Djava.awt.headless=true -Xms3072m -Xmx3072m -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -Dfile.encoding=UTF-8"
-CATALINA_OPTS="$CATALINA_OPTS -Djava.library.path=/opt/brew/Cellar/tomcat-native/1.2.8/lib"
-
-JRE_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_102.jdk/Contents/Home
-
- -
    -
  • Edit Tomcat 8 server.xml to add regular HTTP listener for solr (a connector sketch follows after the symlink commands below)
  • -
  • Symlink webapps:
  • -
- -
$ rm -rf /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/ROOT
-$ ln -sv ~/dspace/webapps/xmlui /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/ROOT
-$ ln -sv ~/dspace/webapps/oai /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/oai
-$ ln -sv ~/dspace/webapps/jspui /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/jspui
-$ ln -sv ~/dspace/webapps/rest /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/rest
-$ ln -sv ~/dspace/webapps/solr /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/solr
-
- -
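  • The extra HTTP listener for Solr mentioned above is just a plain connector in server.xml; a sketch (the port number is an assumption):

<!-- plain HTTP connector so Solr can be reached locally without AJP/SSL -->
<Connector port="8081" protocol="HTTP/1.1"
           connectionTimeout="20000"
           URIEncoding="UTF-8" />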

2016-08-09

- - - -

2016-08-10

- -
    -
  • Turns out DSpace 5.x isn’t ready for Tomcat 8: https://jira.duraspace.org/browse/DS-3092
  • -
  • So we’ll need to use Tomcat 7 + Java 8 on Ubuntu 16.04
  • -
  • More work on the Ansible stuff for this, allowing Tomcat 7 to use Java 8
  • -
  • Merge pull request for fixing the type Discovery index to use dc.type (#262)
  • -
  • Merge pull request for removing “Bitstream” text from item display, as it confuses users and isn’t necessary (#263)
  • -
- -

2016-08-11

- -
    -
  • Finally got DSpace (5.5) running on Ubuntu 16.04, Tomcat 7, Java 8, PostgreSQL 9.5 via the updated Ansible stuff
  • -
- -

DSpace 5.5 on Ubuntu 16.04, Tomcat 7, Java 8, PostgreSQL 9.5

- -

2016-08-14

- - - -

2016-08-15

- - - -

ExpressJS running behind nginx

- -

2016-08-16

- -
    -
  • Troubleshoot Paramiko connection issues with Ansible on ILRI servers: #37
  • -
  • Turns out we need to add some MACs to our sshd_config: hmac-sha2-512,hmac-sha2-256
  • -
  • Update DSpace Test’s Java to version 8 to start testing this configuration (seeing as Solr recommends it)
  • -
- -
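  • The sshd_config change is just adding the SHA-2 MACs, something like this (the full list in the playbook may include more algorithms):

# /etc/ssh/sshd_config
MACs hmac-sha2-512,hmac-sha2-256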

2016-08-17

- -
    -
  • More work on Let’s Encrypt stuff for Ansible roles
  • -
  • Yesterday Atmire responded about DSpace 5.5 issues and asked me to try the dspace database repair command to fix Flyway issues
  • -
  • The dspace database command doesn’t even run: https://gist.github.com/alanorth/c43c8d89e8df346d32c0ee938be90cd5
  • -
  • Oops, it looks like the missing classes causing dspace database to fail were coming from the old ~/dspace/config/spring folder
  • -
  • After removing the spring folder and running ant install again, dspace database works
  • -
  • I see there are missing and pending Flyway migrations, but running dspace database repair and dspace database migrate does nothing: https://gist.github.com/alanorth/41ed5abf2ff32d8ac9eedd1c3d015d70
  • -
- -

2016-08-18

- -
    -
  • Fix “CONGO,DR” country name in input-forms.xml (#264)
  • -
  • Also need to fix existing records using the incorrect form in the database:
  • -
- -
dspace=# update metadatavalue set text_value='CONGO, DR' where resource_type_id=2 and metadata_field_id=228 and text_value='CONGO,DR';
-
- -
    -
  • I asked a question on the DSpace mailing list about updating “preferred” forms of author names from ORCID
  • -
- -

2016-08-21

- -
    -
  • A few days ago someone on the DSpace mailing list suggested I try dspace dsrun org.dspace.authority.UpdateAuthorities to update preferred author names from ORCID
  • -
  • If you set auto-update-items=true in dspace/config/modules/solrauthority.cfg it is supposed to update records it finds automatically (see the snippet after this list)
  • -
  • I updated my name format on ORCID and I’ve been running that script a few times per day since then but nothing has changed
  • -
  • Still troubleshooting Atmire modules on DSpace 5.5
  • -
  • I sent them some new verbose logs: https://gist.github.com/alanorth/700748995649688148ceba89d760253e
  • -
- -
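  • The relevant configuration and command, as described above:

# dspace/config/modules/solrauthority.cfg
auto-update-items = true

$ ~/dspace/bin/dspace dsrun org.dspace.authority.UpdateAuthorities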

2016-08-22

- -
    -
  • Database migrations are fine on DSpace 5.1:
  • -
- -
$ ~/dspace/bin/dspace database info
-
-Database URL: jdbc:postgresql://localhost:5432/dspacetest
-Database Schema: public
-Database Software: PostgreSQL version 9.3.14
-Database Driver: PostgreSQL Native Driver version PostgreSQL 9.1 JDBC4 (build 901)
-
-+----------------+----------------------------+---------------------+---------+
-| Version        | Description                | Installed on        | State   |
-+----------------+----------------------------+---------------------+---------+
-| 1.1            | Initial DSpace 1.1 databas |                     | PreInit |
-| 1.2            | Upgrade to DSpace 1.2 sche |                     | PreInit |
-| 1.3            | Upgrade to DSpace 1.3 sche |                     | PreInit |
-| 1.3.9          | Drop constraint for DSpace |                     | PreInit |
-| 1.4            | Upgrade to DSpace 1.4 sche |                     | PreInit |
-| 1.5            | Upgrade to DSpace 1.5 sche |                     | PreInit |
-| 1.5.9          | Drop constraint for DSpace |                     | PreInit |
-| 1.6            | Upgrade to DSpace 1.6 sche |                     | PreInit |
-| 1.7            | Upgrade to DSpace 1.7 sche |                     | PreInit |
-| 1.8            | Upgrade to DSpace 1.8 sche |                     | PreInit |
-| 3.0            | Upgrade to DSpace 3.x sche |                     | PreInit |
-| 4.0            | Initializing from DSpace 4 | 2015-11-20 12:42:52 | Success |
-| 5.0.2014.08.08 | DS-1945 Helpdesk Request a | 2015-11-20 12:42:53 | Success |
-| 5.0.2014.09.25 | DS 1582 Metadata For All O | 2015-11-20 12:42:55 | Success |
-| 5.0.2014.09.26 | DS-1582 Metadata For All O | 2015-11-20 12:42:55 | Success |
-| 5.0.2015.01.27 | MigrateAtmireExtraMetadata | 2015-11-20 12:43:29 | Success |
-| 5.1.2015.12.03 | Atmire CUA 4 migration     | 2016-03-21 17:10:41 | Success |
-| 5.1.2015.12.03 | Atmire MQM migration       | 2016-03-21 17:10:42 | Success |
-+----------------+----------------------------+---------------------+---------+
-
- -
    -
  • So I’m not sure why they have problems when we move to DSpace 5.5 (even the 5.1 migrations themselves show as “Missing”)
  • -
- -

2016-08-23

- -
    -
  • Help Paola from CCAFS with her thumbnails again
  • -
  • Talk to Atmire about the DSpace 5.5 issue, and it seems to be caused by a bug in FlywayDB
  • -
  • They said I should delete the Atmire migrations -
  • -
- -
dspacetest=# delete from schema_version where description =  'Atmire CUA 4 migration' and version='5.1.2015.12.03.2';
-dspacetest=# delete from schema_version where description =  'Atmire MQM migration' and version='5.1.2015.12.03.3';
-
- -
    -
  • After that DSpace starts up, but XMLUI now has unrelated issues that I need to solve!
  • -
- -
org.apache.avalon.framework.configuration.ConfigurationException: Type 'ThemeResourceReader' does not exist for 'map:read' at jndi:/localhost/themes/0_CGIAR/sitemap.xmap:136:77
-context:/jndi:/localhost/themes/0_CGIAR/sitemap.xmap - 136:77
-
- -
    -
  • Looks like we’re missing some stuff in the XMLUI module’s sitemap.xmap, as well as in each of our XMLUI themes
  • -
  • Diff them with these to get the ThemeResourceReader changes: - -
      -
    • dspace-xmlui/src/main/webapp/sitemap.xmap
    • -
    • dspace-xmlui-mirage2/src/main/webapp/sitemap.xmap
    • -
  • -
  • Then we got a NullPointerException from the SolrLogger class, which is apparently part of Atmire’s CUA module
  • -
  • I tried with a small version bump to CUA but it didn’t work (version 5.5-4.1.1-0)
  • -
  • Also, I started looking into huge pages to prepare for PostgreSQL 9.5, but it seems Linode’s kernels don’t enable them
  • -
- -
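  • A quick way to check whether the running kernel exposes huge pages at all (missing or zero HugePages_* lines mean they are not available or not configured):

$ grep -i huge /proc/meminfo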

2016-08-24

- -
    -
  • Clean up and import 48 CCAFS records into DSpace Test
  • -
  • SQL to get all journal titles from dc.source (55), since it’s apparently used for internal DSpace filename shit, but we moved all our journal titles there a few months ago:
  • -
- -
dspacetest=# select distinct text_value from metadatavalue where metadata_field_id=55 and text_value !~ '.*(\.pdf|\.png|\.PDF|\.Pdf|\.JPEG|\.jpg|\.JPG|\.jpeg|\.xls|\.rtf|\.docx?|\.potx|\.dotx|\.eqa|\.tiff|\.mp4|\.mp3|\.gif|\.zip|\.txt|\.pptx|\.indd|\.PNG|\.bmp|\.exe|org\.dspace\.app\.mediafilter).*';
-
- -

2016-08-25

- -
    -
  • Atmire suggested adding a missing bean to dspace/config/spring/api/atmire-cua.xml but it doesn’t help:
  • -
- -
...
-Error creating bean with name 'MetadataStorageInfoService'
-...
-
- -
    -
  • Atmire sent an updated version of dspace/config/spring/api/atmire-cua.xml and now XMLUI starts but gives a null pointer exception:
  • -
- -
Java stacktrace: java.lang.NullPointerException
-    at org.dspace.app.xmlui.aspect.statistics.Navigation.addOptions(Navigation.java:129)
-    at org.dspace.app.xmlui.wing.AbstractWingTransformer.startElement(AbstractWingTransformer.java:228)
-    at sun.reflect.GeneratedMethodAccessor126.invoke(Unknown Source)
-    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
-    at java.lang.reflect.Method.invoke(Method.java:606)
-    at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
-    at com.sun.proxy.$Proxy103.startElement(Unknown Source)
-    at org.apache.cocoon.environment.internal.EnvironmentChanger.startElement(EnvironmentStack.java:140)
-    at org.apache.cocoon.environment.internal.EnvironmentChanger.startElement(EnvironmentStack.java:140)
-    at org.apache.cocoon.xml.AbstractXMLPipe.startElement(AbstractXMLPipe.java:94)
-...
-
- -
    -
  • Import the 47 CCAFS records to CGSpace, creating the SimpleArchiveFormat bundles and importing like:
  • -
- -
$ ./safbuilder.sh -c /tmp/Thumbnails\ to\ Upload\ to\ CGSpace/3546.csv
-$ JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx512m" /home/cgspace.cgiar.org/bin/dspace import -a -e aorth@mjanja.ch -c 10568/3546 -s /tmp/Thumbnails\ to\ Upload\ to\ CGSpace/SimpleArchiveFormat -m 3546.map
-
- -
    -
  • Finally got DSpace 5.5 working with the Atmire modules after a few rounds of back and forth with Atmire devs
  • -
- -

2016-08-26

- -
    -
  • CGSpace had issues tonight, not entirely crashing, but becoming unresponsive
  • -
  • The dspace log had this:
  • -
- -
2016-08-26 20:48:05,040 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -                                                               org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
-
- -
    -
  • Related to /rest no doubt
  • -
- -

2016-08-27

- -
    -
  • Run corrections for Delia Grace and CONGO, DR, and deploy August changes to CGSpace
  • -
  • Run all system updates and reboot the server
  • -
diff --git a/public/2016-09/index.html b/public/2016-09/index.html deleted file mode 100644 index 26d0d46bf..000000000 --- a/public/2016-09/index.html +++ /dev/null @@ -1,841 +0,0 @@
September, 2016 | CGSpace Notes

September, 2016

- -
-

2016-09-01

- -
    -
  • Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors
  • -
  • Discuss how the migration of CGIAR’s Active Directory to a flat structure will break our LDAP groups in DSpace
  • -
  • We had been using DC=ILRI to determine whether a user was ILRI or not
  • -
  • It looks like we might be able to use OUs now, instead of DCs:
  • -
- -
$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=org" -D "admigration1@cgiarad.org" -W "(sAMAccountName=admigration1)"
-
- -

- -
    -
  • User who has been migrated to the root vs user still in the hierarchical structure:
  • -
- -
distinguishedName: CN=Last\, First (ILRI),OU=ILRI Kenya Employees,OU=ILRI Kenya,OU=ILRIHUB,DC=CGIARAD,DC=ORG
-distinguishedName: CN=Last\, First (ILRI),OU=ILRI Ethiopia Employees,OU=ILRI Ethiopia,DC=ILRI,DC=CGIARAD,DC=ORG
-
- -
    -
  • Changing the DSpace LDAP config to use OU=ILRIHUB seems to work:
  • -
- -

DSpace groups based on LDAP DN

- -
    -
  • Notes for local PostgreSQL database recreation from production snapshot:
  • -
- -
$ dropdb dspacetest
-$ createdb -O dspacetest --encoding=UNICODE dspacetest
-$ psql dspacetest -c 'alter user dspacetest createuser;'
-$ pg_restore -O -U dspacetest -d dspacetest ~/Downloads/cgspace_2016-09-01.backup
-$ psql dspacetest -c 'alter user dspacetest nocreateuser;'
-$ psql -U dspacetest -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest -h localhost
-$ vacuumdb dspacetest
-
- -
    -
  • Some names that I thought I fixed in July seem not to be:
  • -
- -
dspacetest=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value like 'Poole, %';
-      text_value       |              authority               | confidence
------------------------+--------------------------------------+------------
- Poole, Elizabeth Jane | b6efa27f-8829-4b92-80fe-bc63e03e3ccb |        600
- Poole, Elizabeth Jane | 41628f42-fc38-4b38-b473-93aec9196326 |        600
- Poole, Elizabeth Jane | 83b82da0-f652-4ebc-babc-591af1697919 |        600
- Poole, Elizabeth Jane | c3a22456-8d6a-41f9-bba0-de51ef564d45 |        600
- Poole, E.J.           | c3a22456-8d6a-41f9-bba0-de51ef564d45 |        600
- Poole, E.J.           | 0fbd91b9-1b71-4504-8828-e26885bf8b84 |        600
-(6 rows)
-
- -
    -
  • At least a few of these actually have the correct ORCID, but I will unify the authority to be c3a22456-8d6a-41f9-bba0-de51ef564d45
  • -
- -
dspacetest=# update metadatavalue set authority='c3a22456-8d6a-41f9-bba0-de51ef564d45', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Poole, %';
-UPDATE 69
-
- -
    -
  • And for Peter Ballantyne:
  • -
- -
dspacetest=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value like 'Ballantyne, %';
-    text_value     |              authority               | confidence
--------------------+--------------------------------------+------------
- Ballantyne, Peter | 2dcbcc7b-47b0-4fd7-bef9-39d554494081 |        600
- Ballantyne, Peter | 4f04ca06-9a76-4206-bd9c-917ca75d278e |        600
- Ballantyne, P.G.  | 4f04ca06-9a76-4206-bd9c-917ca75d278e |        600
- Ballantyne, Peter | ba5f205b-b78b-43e5-8e80-0c9a1e1ad2ca |        600
- Ballantyne, Peter | 20f21160-414c-4ecf-89ca-5f2cb64e75c1 |        600
-(5 rows)
-
- -
    -
  • Again, a few have the correct ORCID, but there should only be one authority…
  • -
- -
dspacetest=# update metadatavalue set authority='4f04ca06-9a76-4206-bd9c-917ca75d278e', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Ballantyne, %';
-UPDATE 58
-
- -
    -
  • And for me:
  • -
- -
dspacetest=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value like 'Orth, A%';
- text_value |              authority               | confidence
-------------+--------------------------------------+------------
- Orth, Alan | 4884def0-4d7e-4256-9dd4-018cd60a5871 |        600
- Orth, A.   | 4884def0-4d7e-4256-9dd4-018cd60a5871 |        600
- Orth, A.   | 1a1943a0-3f87-402f-9afe-e52fb46a513e |        600
-(3 rows)
-dspacetest=# update metadatavalue set authority='1a1943a0-3f87-402f-9afe-e52fb46a513e', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Orth, %';
-UPDATE 11
-
- -
    -
  • And for CCAFS author Bruce Campbell that I had discussed with CCAFS earlier this week:
  • -
- -
dspacetest=# update metadatavalue set authority='0e414b4c-4671-4a23-b570-6077aca647d8', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Campbell, B%';
-UPDATE 166
-dspacetest=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value like 'Campbell, B%';
-       text_value       |              authority               | confidence
-------------------------+--------------------------------------+------------
- Campbell, Bruce        | 0e414b4c-4671-4a23-b570-6077aca647d8 |        600
- Campbell, Bruce Morgan | 0e414b4c-4671-4a23-b570-6077aca647d8 |        600
- Campbell, B.           | 0e414b4c-4671-4a23-b570-6077aca647d8 |        600
- Campbell, B.M.         | 0e414b4c-4671-4a23-b570-6077aca647d8 |        600
-(4 rows)
-
- -
    -
  • After updating the Authority indexes (bin/dspace index-authority) everything looks good
  • -
  • Run authority updates on CGSpace
  • -
- -

2016-09-05

- -
    -
  • After one week of logging TLS connections on CGSpace:
  • -
- -
# zgrep "DES-CBC3" /var/log/nginx/cgspace.cgiar.org-access-ssl.log* | wc -l
-217
-# zcat -f -- /var/log/nginx/cgspace.cgiar.org-access-ssl.log* | wc -l
-1164376
-# zgrep "DES-CBC3" /var/log/nginx/cgspace.cgiar.org-access-ssl.log* | awk '{print $6}' | sort | uniq
-TLSv1/DES-CBC3-SHA
-TLSv1/EDH-RSA-DES-CBC3-SHA
-
- -
    -
  • So this represents 0.02% of 1.16M connections over a one-week period
  • -
  • Transforming some filenames in OpenRefine so they can have a useful description for SAFBuilder:
  • -
- -
value + "__description:" + cells["dc.type"].value
-
- -
    -
  • This gives you, for example: Mainstreaming gender in agricultural R&D.pdf__description:Brief
  • -
- -

2016-09-06

- -
    -
  • Trying to import the records for CIAT from yesterday, but having filename encoding issues from their zip file
  • -
  • Create a zip on Mac OS X from a SAF bundle containing only one record with one PDF: - -
      -
    • Filename: Complementing Farmers Genetic Knowledge Farmer Breeding Workshop in Turipaná, Colombia.pdf
    • -
    • Imports fine on DSpace running on Mac OS X
    • -
    • Fails to import on DSpace running on Linux with error No such file or directory
    • -
  • -
  • Change diacritic in file name from á to a and re-create SAF bundle and zip - -
      -
    • Success on both Mac OS X and Linux…
    • -
  • -
  • Looks like on the Mac OS X file system the file names store á in decomposed (NFD) form: a (U+0061) + a combining acute accent (U+0301)
  • -
  • See: http://www.fileformat.info/info/unicode/char/e1/index.htm
  • -
  • See: http://demo.icu-project.org/icu-bin/nbrowser?t=%C3%A1&s=&uv=0
  • -
  • If I unzip the original zip from CIAT on Windows, re-zip it with 7zip on Windows, and then unzip it on Linux directly, the file names seem to be proper UTF-8
  • -
  • We should definitely clean filenames so they don’t use characters that are tricky to process in CSV and shell scripts, like commas (,), apostrophes ('), and double quotes (")
  • -
- -
value.replace("'","").replace(",","").replace('"','')
-
- -
    -
  • I need to write a Python script to match that for renaming files in the file system (a rough sketch is at the end of this section, after the import commands)
  • -
  • When importing SAF bundles it seems you can specify the target collection on the command line using -c 10568/4003 or in the collections file inside each item in the bundle
  • -
  • Seems that the latter method causes a null pointer exception, so I will just have to use the former method
  • -
  • In the end I was able to import the files after unzipping them ONLY on Linux - -
      -
    • The CSV file was giving file names in UTF-8, and unzipping the zip on Mac OS X and transferring it was converting the file names to Unicode equivalence like I saw above
    • -
  • -
  • Import CIAT Gender Network records to CGSpace, first creating the SAF bundles as my user, then importing as the tomcat7 user, and deleting the bundle, for each collection’s items:
  • -
- -
$ ./safbuilder.sh -c /home/aorth/ciat-gender-2016-09-06/66601.csv
-$ JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx512m" /home/cgspace.cgiar.org/bin/dspace import -a -e aorth@mjanja.ch -c 10568/66601 -s /home/aorth/ciat-gender-2016-09-06/SimpleArchiveFormat -m 66601.map
-$ rm -rf ~/ciat-gender-2016-09-06/SimpleArchiveFormat/
-
- -
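    -
  • A rough sketch of that renaming script (Python 3; the file name is illustrative) — it just composes the Unicode and strips the awkward characters:
  • -
- -
$ cat rename-saf-files.py
-#!/usr/bin/env python3
-# Sketch: normalize file names to composed Unicode (NFC) and strip characters
-# that are awkward in CSV and shell scripts, then rename the files in place.
-# Run it from inside the SAF bundle directory.
-import os
-import unicodedata
-for root, dirs, files in os.walk('.'):
-    for name in files:
-        clean = unicodedata.normalize('NFC', name)
-        for char in (',', "'", '"'):
-            clean = clean.replace(char, '')
-        if clean != name:
-            os.rename(os.path.join(root, name), os.path.join(root, clean))
-
- -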

2016-09-07

- -
    -
  • Erase and rebuild DSpace Test based on latest Ubuntu 16.04, PostgreSQL 9.5, and Java 8 stuff
  • -
  • Reading about PostgreSQL maintenance and it seems manual vacuuming is only for certain workloads, such as heavy update/write loads
  • -
  • I suggest we disable our nightly manual vacuum task, as we’re a mostly read workload, and I’d rather stick as close to the documentation as possible since we haven’t done any testing/observation of PostgreSQL
  • -
  • See: https://www.postgresql.org/docs/9.3/static/routine-vacuuming.html
  • -
  • CGSpace went down and the error seems to be the same as always (lately):
  • -
- -
2016-09-07 11:39:23,162 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
-org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
-...
-
- -
    -
  • Since CGSpace had crashed I quickly deployed the new LDAP settings before restarting Tomcat
  • -
- -

2016-09-13

- -
    -
  • CGSpace crashed twice today, errors from catalina.out:
  • -
- -
org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
-        at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:114)
-
- -
    -
  • I enabled logging of requests to /rest again
  • -
- -

2016-09-14

- -
    -
  • CGSpace crashed again, errors from catalina.out:
  • -
- -
org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
-        at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:114)
-
- -
    -
  • I restarted Tomcat and it was ok again
  • -
  • CGSpace crashed a few hours later, errors from catalina.out:
  • -
- -
Exception in thread "http-bio-127.0.0.1-8081-exec-25" java.lang.OutOfMemoryError: Java heap space
-        at java.lang.StringCoding.decode(StringCoding.java:215)
-
- -
    -
  • We haven’t seen that in quite a while…
  • -
  • Indeed, in a month of logs it only occurs 15 times:
  • -
- -
# grep -rsI "OutOfMemoryError" /var/log/tomcat7/catalina.* | wc -l
-15
-
- -
    -
  • I also see a bunch of errors from dspace.log:
  • -
- -
2016-09-14 12:23:07,981 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
-org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
-
- -
    -
  • Looking at REST requests, it seems there is one IP hitting us nonstop:
  • -
- -
# awk '{print $1}' /var/log/nginx/rest.log  | sort -n | uniq -c | sort -h | tail -n 3
-    820 50.87.54.15
-  12872 70.32.99.142
-  25744 70.32.83.92
-# awk '{print $1}' /var/log/nginx/rest.log.1  | sort -n | uniq -c | sort -h | tail -n 3
-   7966 181.118.144.29
-  54706 70.32.99.142
- 109412 70.32.83.92
-
- -
    -
  • Those are the same IPs that were hitting us heavily in July, 2016 as well…
  • -
  • I think the stability issues are definitely from REST
  • -
  • Crashed AGAIN, errors from dspace.log:
  • -
- -
2016-09-14 14:31:43,069 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
-org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
-
- -
    -
  • And more heap space errors:
  • -
- -
# grep -rsI "OutOfMemoryError" /var/log/tomcat7/catalina.* | wc -l
-19
-
- -
    -
  • There are no more rest requests since the last crash, so maybe there are other things causing this.
  • -
  • Hmm, I noticed a shitload of IPs from 180.76.0.0/16 are connecting to both CGSpace and DSpace Test (58 unique IPs concurrently!)
  • -
  • They seem to be coming from Baidu, and so far today alone they account for about 1 of every 6 connections:
  • -
- -
# grep -c ip_addr= /home/cgspace.cgiar.org/log/dspace.log.2016-09-14
-29084
-# grep -c ip_addr=180.76.15 /home/cgspace.cgiar.org/log/dspace.log.2016-09-14
-5192
-
- -
    -
  • Other recent days are the same… hmmm.
  • -
  • From the activity control panel I can see 58 unique IPs hitting the site concurrently, which has GOT to hurt our stability
  • -
  • A list of all 2000 unique IPs from CGSpace logs today:
  • -
- -
# grep ip_addr= /home/cgspace.cgiar.org/log/dspace.log.2016-09-11 | awk -F: '{print $5}' | sort -n | uniq -c | sort -h | tail -n 100
-
- -
    -
  • Looking at the top 20 IPs or so, most are Yahoo, MSN, Google, Baidu, TurnitIn (iParadigm), etc… do we have any real users?
  • -
  • Generate a list of all author affiliations for Peter Ballantyne to go through, make corrections, and create a lookup list from:
  • -
- -
dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=211 group by text_value order by count desc) to /tmp/affiliations.csv with csv;
-
- -
    -
  • Looking into the Catalina logs again around the time of the first crash, I see:
  • -
- -
Wed Sep 14 09:47:27 UTC 2016 | Query:id: 78581 AND type:2
-Wed Sep 14 09:47:28 UTC 2016 | Updating : 6/6 docs.
-Commit
-Commit done
-dn:CN=Haman\, Magdalena  (CIAT-CCAFS),OU=Standard,OU=Users,OU=HQ,OU=CIATHUB,dc=cgiarad,dc=org
-Exception in thread "http-bio-127.0.0.1-8081-exec-193" java.lang.OutOfMemoryError: Java heap space
-
- -
    -
  • And after that I see a bunch of “pool error Timeout waiting for idle object”
  • -
  • Later, near the time of the next crash I see:
  • -
- -
dn:CN=Haman\, Magdalena  (CIAT-CCAFS),OU=Standard,OU=Users,OU=HQ,OU=CIATHUB,dc=cgiarad,dc=org
-Wed Sep 14 11:29:55 UTC 2016 | Query:id: 79078 AND type:2
-Wed Sep 14 11:30:20 UTC 2016 | Updating : 6/6 docs.
-Commit
-Commit done
-Sep 14, 2016 11:32:22 AM com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator buildModelAndSchemas
-SEVERE: Failed to generate the schema for the JAX-B elements
-com.sun.xml.bind.v2.runtime.IllegalAnnotationsException: 2 counts of IllegalAnnotationExceptions
-java.util.Map is an interface, and JAXB can't handle interfaces.
-        this problem is related to the following location:
-                at java.util.Map
-                at public java.util.Map com.atmire.dspace.rest.common.Statlet.getRender()
-                at com.atmire.dspace.rest.common.Statlet
-java.util.Map does not have a no-arg default constructor.
-        this problem is related to the following location:
-                at java.util.Map
-                at public java.util.Map com.atmire.dspace.rest.common.Statlet.getRender()
-                at com.atmire.dspace.rest.common.Statlet
-
- -
    -
  • Then 20 minutes later another outOfMemoryError:
  • -
- -
Exception in thread "http-bio-127.0.0.1-8081-exec-25" java.lang.OutOfMemoryError: Java heap space
-        at java.lang.StringCoding.decode(StringCoding.java:215)
-
- -
    -
  • Perhaps these particular issues are memory issues, the munin graphs definitely show some weird purging/allocating behavior starting this week
  • -
- -

Tomcat JVM usage day -Tomcat JVM usage week -Tomcat JVM usage month

- -
    -
  • And really, we did reduce the memory of CGSpace in late 2015, so maybe we should just increase it again, now that our usage is higher and we are having memory errors in the logs
  • -
  • Oh great, the configuration on the actual server is different than in configuration management!
  • -
  • Seems we added a bunch of settings to the /etc/default/tomcat7 in December, 2015 and never updated our ansible repository:
  • -
- -
JAVA_OPTS="-Djava.awt.headless=true -Xms3584m -Xmx3584m -XX:MaxPermSize=256m -XX:+UseConcMarkSweepGC -Dfile.encoding=UTF-8 -XX:-UseGCOverheadLimit -XX:MaxGCPauseMillis=250 -XX:GCTimeRatio=9 -XX:+PerfDisableSharedMem -XX:+ParallelRefProcEnabled -XX:G1HeapRegionSize=8m -XX:InitiatingHeapOccupancyPercent=75 -XX:+UseLargePages -XX:+AggressiveOpts"
-
- -
    -
  • So I’m going to bump the heap +512m and remove all the other experimental shit (and update ansible!); the simplified line is sketched below
  • -
  • Increased JVM heap to 4096m on CGSpace (linode01)
  • -
- -
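    -
  • Keeping only the sane parts, the simplified line should look roughly like this (exact options to be confirmed when I update the ansible templates):
  • -
- -
JAVA_OPTS="-Djava.awt.headless=true -Xms4096m -Xmx4096m -XX:MaxPermSize=256m -Dfile.encoding=UTF-8"
-
- -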

2016-09-15

- -
    -
  • Looking at Google Webmaster Tools again, it seems the work I did on URL query parameters and blocking via the X-Robots-Tag HTTP header in March, 2016 seem to have had a positive effect on Google’s index for CGSpace
  • -
- -

Google Webmaster Tools for CGSpace

- -

2016-09-16

- -
    -
  • CGSpace crashed again, and there are TONS of heap space errors but the datestamps aren’t on those lines so I’m not sure if they were yesterday:
  • -
- -
dn:CN=Orentlicher\, Natalie (CIAT),OU=Standard,OU=Users,OU=HQ,OU=CIATHUB,dc=cgiarad,dc=org
-Thu Sep 15 18:45:25 UTC 2016 | Query:id: 55785 AND type:2
-Thu Sep 15 18:45:26 UTC 2016 | Updating : 100/218 docs.
-Thu Sep 15 18:45:26 UTC 2016 | Updating : 200/218 docs.
-Thu Sep 15 18:45:27 UTC 2016 | Updating : 218/218 docs.
-Commit
-Commit done
-Exception in thread "http-bio-127.0.0.1-8081-exec-247" java.lang.OutOfMemoryError: Java heap space
-Exception in thread "http-bio-127.0.0.1-8081-exec-241" java.lang.OutOfMemoryError: Java heap space
-Exception in thread "http-bio-127.0.0.1-8081-exec-243" java.lang.OutOfMemoryError: Java heap space
-Exception in thread "http-bio-127.0.0.1-8081-exec-258" java.lang.OutOfMemoryError: Java heap space
-Exception in thread "http-bio-127.0.0.1-8081-exec-268" java.lang.OutOfMemoryError: Java heap space
-Exception in thread "http-bio-127.0.0.1-8081-exec-263" java.lang.OutOfMemoryError: Java heap space
-Exception in thread "http-bio-127.0.0.1-8081-exec-280" java.lang.OutOfMemoryError: Java heap space
-Exception in thread "Thread-54216" org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Exception writing document id 7feaa95d-8e1f-4f45-80bb-e14ef82ee224 to the index; possible analysis error.
-        at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:552)
-        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
-        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
-        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
-        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
-        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)
-        at com.atmire.statistics.SolrLogThread.run(SourceFile:25)
-
- -
    -
  • I bumped the heap space from 4096m to 5120m to see if this is really about heap space or not.
  • -
  • Looking into some of these errors that I’ve seen this week but haven’t noticed before:
  • -
- -
# zcat -f -- /var/log/tomcat7/catalina.* | grep -c 'Failed to generate the schema for the JAX-B elements'
-113
-
- -
    -
  • I’ve sent a message to Atmire about the Solr error to see if it’s related to their batch update module
  • -
- -

2016-09-19

- -
    -
  • Work on cleanups for author affiliations after Peter sent me his list of corrections/deletions:
  • -
- -
$ ./fix-metadata-values.py -i affiliations_pb-322-corrections.csv -f cg.contributor.affiliation -t correct -m 211 -d dspace -u dspace -p fuuu
-$ ./delete-metadata-values.py -f cg.contributor.affiliation -i affiliations_pb-2-deletions.csv -m 211 -u dspace -d dspace -p fuuu
-
- -
    -
  • After that we need to take the top ~300 and make a controlled vocabulary for it
  • -
  • I dumped a list of the top 300 affiliations from the database (with a query like the one sketched below), sorted it alphabetically in OpenRefine, and created a controlled vocabulary for it (#267)
  • -
- -
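    -
  • That was probably just the earlier affiliation query with a limit, something like this (output path illustrative):
  • -
- -
dspace=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=211 group by text_value order by count desc limit 300) to /tmp/top-affiliations.csv with csv;
-
- -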

2016-09-20

- -
    -
  • Run all system updates on DSpace Test and reboot the server
  • -
  • Merge changes for sponsorship and affiliation controlled vocabularies (#267, #268)
  • -
  • Merge minor changes to messages.xml to reconcile it with the stock DSpace 5.1 one (#269)
  • -
  • Peter asked about adding title search to Discovery
  • -
  • The index was already defined, so I just added it to the search filters (the relevant discovery.xml bean is sketched below)
  • -
  • It works but CGSpace apparently uses OR for search terms, which makes the search results basically useless
  • -
  • I need to read the docs and ask on the mailing list to see if we can tweak that
  • -
  • Generate a new list of sponsors from the database for Peter Ballantyne so we can clean them up and update the controlled vocabulary
  • -
- -
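    -
  • For reference, a search filter in dspace/config/spring/api/discovery.xml is a bean roughly like this (the id and index name here are illustrative), and it also has to be added to the configuration’s searchFilters list:
  • -
- -
<bean id="searchFilterTitle" class="org.dspace.discovery.configuration.DiscoverySearchFilter">
-    <property name="indexFieldName" value="title"/>
-    <property name="metadataFields">
-        <list>
-            <value>dc.title</value>
-        </list>
-    </property>
-</bean>
-
- -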

2016-09-21

- -
    -
  • Turns out the Solr search logic switched from OR to AND in DSpace 6.0 and the change is easy to backport: https://jira.duraspace.org/browse/DS-2809
  • -
  • We just need to set this in dspace/solr/search/conf/schema.xml:
  • -
- -
<solrQueryParser defaultOperator="AND"/>
-
- -
    -
  • It actually works really well, and search results return far fewer hits now (before, after):
  • -
- -

CGSpace search with "OR" boolean logic -DSpace Test search with "AND" boolean logic

- -
    -
  • Found a way to improve the configuration of Atmire’s Content and Usage Analysis (CUA) module for date fields
  • -
- -
-content.analysis.dataset.option.8=metadata:dateAccessioned:discovery
-+content.analysis.dataset.option.8=metadata:dc.date.accessioned:date(month)
-
- -
    -
  • This allows the module to treat the field as a date rather than a text string, so we can interrogate it more intelligently
  • -
  • Add dc.date.accessioned to XMLUI Discovery search filters
  • -
  • Major CGSpace crash because ILRI forgot to pay the Linode bill
  • -
  • 45 minutes of downtime!
  • -
  • Start processing the fixes to dc.description.sponsorship from Peter Ballantyne:
  • -
- -
$ ./fix-metadata-values.py -i sponsors-fix-23.csv -f dc.description.sponsorship -t correct -m 29 -d dspace -u dspace -p fuuu
-$ ./delete-metadata-values.py -i sponsors-delete-8.csv -f dc.description.sponsorship -m 29 -d dspace -u dspace -p fuuu
-
- -
    -
  • I need to run these and the others from a few days ago on CGSpace the next time we run updates
  • -
  • Also, I need to update the controlled vocab for sponsors based on these
  • -
- -

2016-09-22

- -
    -
  • Update controlled vocabulary for sponsorship based on the latest corrected values from the database
  • -
- -

2016-09-25

- -
    -
  • Merge accession date improvements for CUA module (#275)
  • -
  • Merge addition of accession date to Discovery search filters (#276)
  • -
  • Merge updates to sponsorship controlled vocabulary (#277)
  • -
  • I’ve been trying to add a search filter for dc.description so the IITA people can search for some tags they use there, but for some reason the filter never shows up in Atmire’s CUA
  • -
  • Not sure if it’s something like we already have too many filters there (30), or the filter name is reserved, etc…
  • -
  • Generate a list of ILRI subjects for Peter and Abenet to look through/fix:
  • -
- -
dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where resource_type_id=2 and metadata_field_id=203 group by text_value order by count desc) to /tmp/ilrisubjects.csv with csv;
-
- -
    -
  • Regenerate Discovery indexes a few times after playing with discovery.xml index definitions (syntax, parameters, etc).
  • -
  • Merge changes to boolean logic in Solr search (#274)
  • -
  • Run all sponsorship and affiliation fixes on CGSpace, deploy latest 5_x-prod branch, and re-index Discovery on CGSpace
  • -
  • Tested OCSP stapling on DSpace Test’s nginx and it works (the relevant directives are sketched after the output):
  • -
- -
$ openssl s_client -connect dspacetest.cgiar.org:443 -servername dspacetest.cgiar.org -tls1_2 -tlsextdebug -status
-...
-OCSP response:
-======================================
-OCSP Response Data:
-...
-    Cert Status: good
-
- - - -
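    -
  • The relevant nginx directives are roughly these (the chain path and resolver are illustrative):
  • -
- -
ssl_stapling on;
-ssl_stapling_verify on;
-ssl_trusted_certificate /etc/ssl/certs/chain.pem;
-resolver 8.8.8.8 8.8.4.4 valid=300s;
-
- -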

2016-09-27

- -
    -
  • Discuss fixing some ORCIDs for CCAFS author Sonja Vermeulen with Magdalena Haman
  • -
  • This author has a few variations:
  • -
- -
dspacetest=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value like 'Vermeulen, S%';
-
- -
    -
  • And it looks like fe4b719f-6cc4-4d65-8504-7a83130b9f83 is the authority with the correct ORCID linked
  • -
- -
dspacetest=# update metadatavalue set authority='fe4b719f-6cc4-4d65-8504-7a83130b9f83w', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Vermeulen, S%';
-UPDATE 101
-
- -
    -
  • Hmm, now her name is missing from the authors facet and only shows the authority ID
  • -
  • On the production server there is an item with her ORCID but it is using a different authority: f01f7b7b-be3f-4df7-a61d-b73c067de88d
  • -
  • Maybe I used the wrong one… I need to look again at the production database
  • -
  • On a clean snapshot of the database I see the correct authority should be f01f7b7b-be3f-4df7-a61d-b73c067de88d, not fe4b719f-6cc4-4d65-8504-7a83130b9f83
  • -
  • Updating her authorities again and reindexing:
  • -
- -
dspacetest=# update metadatavalue set authority='f01f7b7b-be3f-4df7-a61d-b73c067de88d', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Vermeulen, S%';
-UPDATE 101
-
- -
    -
  • Use GitHub icon from Font Awesome instead of a PNG to save one extra network request
  • -
  • We can also replace the RSS and mail icons in community text!
  • -
  • Fix reference to dc.type.* in Atmire CUA module, as we now only index dc.type for “Output type”
  • -
- -

2016-09-28

- -
    -
  • Make a placeholder pull request for discovery.xml changes (#278), as I still need to test their effect on Atmire content analysis module
  • -
  • Make a placeholder pull request for Font Awesome changes (#279), which replaces the GitHub image in the footer with an icon, and add style for RSS and @ icons that I will start replacing in community/collection HTML intros
  • -
  • Had some issues with local test server after messing with Solr too much, had to blow everything away and re-install from CGSpace
  • -
  • Going to try to update Sonja Vermeulen’s authority to 2b4166b7-6e4d-4f66-9d8b-ddfbec9a6ae0, as that seems to be one of her authorities that has an ORCID
  • -
  • Merge Font Awesome changes (#279)
  • -
  • Minor fix to a string in Atmire’s CUA module (#280)
  • -
  • This seems to be what I’ll need to do for Sonja Vermeulen (but with 2b4166b7-6e4d-4f66-9d8b-ddfbec9a6ae0 instead on the live site):
  • -
- -
dspacetest=# update metadatavalue set authority='09e4da69-33a3-45ca-b110-7d3f82d2d6d2', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Vermeulen, S%';
-dspacetest=# update metadatavalue set authority='09e4da69-33a3-45ca-b110-7d3f82d2d6d2', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Vermeulen SJ%';
-
- -
    -
  • And then update Discovery and Authority indexes
  • -
  • Minor fix for “Subject” string in Discovery search and Atmire modules (#281)
  • -
  • Start testing batch fixes for ILRI subject from Peter:
  • -
- -
$ ./fix-metadata-values.py -i ilrisubjects-fix-32.csv -f cg.subject.ilri -t correct -m 203 -d dspace -u dspace -p fuuuu
-$ ./delete-metadata-values.py -i ilrisubjects-delete-13.csv -f cg.subject.ilri -m 203 -d dspace -u dspace -p fuuu
-
- -

2016-09-29

- -
    -
  • Add cg.identifier.ciatproject to metadata registry in preparation for CIAT project tag
  • -
  • Merge changes for CIAT project tag (#282)
  • -
  • DSpace Test (linode02) became unresponsive for some reason, I had to hard reboot it from the Linode console
  • -
  • People on DSpace mailing list gave me a query to get authors from certain collections:
  • -
- -
dspacetest=# select distinct text_value from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/5472', '10568/5473')));
-
- -

2016-09-30

- -
    -
  • Deny access to REST API’s find-by-metadata-field endpoint to protect against an upstream security issue (DS-3250); one possible nginx rule is sketched below
  • -
  • There is a patch but it is only for 5.5 and doesn’t apply cleanly to 5.1
  • -
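  • A blunt option until we can patch properly would be an nginx rule like this (just a sketch, not necessarily what we end up deploying):
  • -
- -
location = /rest/items/find-by-metadata-field {
-    return 403;
-}
-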
- - - - - -
- - - -
- - - - -
-
- - - - - - - - - diff --git a/public/2016-10/index.html b/public/2016-10/index.html deleted file mode 100644 index a45928621..000000000 --- a/public/2016-10/index.html +++ /dev/null @@ -1,527 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - October, 2016 | CGSpace Notes - - - - - - - - - - - - - - - - - -
-
- -
-
- - - -
-
-

CGSpace Notes

-

Documenting day-to-day work on the CGSpace repository.

-
-
- - - -
-
-
- - - - -
-
-

October, 2016

- -
-

2016-10-03

- -
    -
  • Testing adding ORCIDs to a CSV file for a single item to see if the author orders get messed up
  • -
  • Need to test the following scenarios to see how author order is affected: - -
      -
    • ORCIDs only
    • -
    • ORCIDs plus normal authors
    • -
  • -
  • I exported a random item’s metadata as CSV, deleted all columns except id and collection, and made a new column called ORCID:dc.contributor.author with the following random ORCIDs from the ORCID registry:
  • -
- -
0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
-
- -

- -
    -
  • Hmm, with the dc.contributor.author column removed, DSpace doesn’t detect any changes
  • -
  • With a blank dc.contributor.author column, DSpace wants to remove all non-ORCID authors and add the new ORCID authors
  • -
  • I added the disclaimer text to the About page, then added a footer link to the disclaimer’s ID, but there is a Bootstrap issue that causes the page content to disappear when using in-page anchors: https://github.com/twbs/bootstrap/issues/1768
  • -
- -

Bootstrap issue with in-page anchors

- -
    -
  • Looks like we’ll just have to add the text to the About page (without a link) or add a separate page
  • -
- -

2016-10-04

- -
    -
  • Start testing cleanups of authors that Peter sent last week
  • -
  • Out of 40,000+ rows, Peter had indicated corrections for ~3,200 of them—too many to look through carefully, so I did some basic quality checking: - -
      -
    • Trim leading/trailing whitespace
    • -
    • Find invalid characters
    • -
    • Cluster values to merge obvious authors
    • -
  • -
  • That left us with 3,180 valid corrections and 3 deletions:
  • -
- -
$ ./fix-metadata-values.py -i authors-fix-3180.csv -f dc.contributor.author -t correct -m 3 -d dspacetest -u dspacetest -p fuuu
-$ ./delete-metadata-values.py -i authors-delete-3.csv -f dc.contributor.author -m 3 -d dspacetest -u dspacetest -p fuuu
-
- -
    -
  • Remove old about page (#284)
  • -
  • CGSpace crashed a few times today
  • -
  • Generate list of unique authors in CCAFS collections:
  • -
- -
dspacetest=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/32729', '10568/5472', '10568/5473', '10568/10288', '10568/70974', '10568/3547', '10568/3549', '10568/3531','10568/16890','10568/5470','10568/3546', '10568/36024', '10568/66581', '10568/21789', '10568/5469', '10568/5468', '10568/3548', '10568/71053', '10568/25167'))) group by text_value order by count desc) to /tmp/ccafs-authors.csv with csv;
-
- -

2016-10-05

- -
    -
  • Work on more infrastructure cleanups for Ansible DSpace role
  • -
  • Clean up Let’s Encrypt plumbing and submit pull request for rmg-ansible-public (#60)
  • -
- -

2016-10-06

- -
    -
  • Nice! DSpace Test (linode02) is now having java.lang.OutOfMemoryError: Java heap space errors…
  • -
  • Heap space is 2048m, and we have 5GB of RAM being used for OS cache (Solr!) so let’s just bump the memory to 3072m
  • -
  • Magdalena from CCAFS asked why the colors in the thumbnails for these two items look different, even though they are the same in the PDF itself
  • -
- -

CMYK vs sRGB colors

- -
    -
  • Turns out the first PDF was exported from InDesign using CMYK and the second one was using sRGB
  • -
  • Run all system updates on DSpace Test and reboot it
  • -
- -

2016-10-08

- -
    -
  • Re-deploy CGSpace with latest changes from late September and early October
  • -
  • Run fixes for ILRI subjects and delete blank metadata values:
  • -
- -
dspace=# delete from metadatavalue where resource_type_id=2 and text_value='';
-DELETE 11
-
- -
    -
  • Run all system updates and reboot CGSpace
  • -
  • Delete ten gigs of old 2015 Tomcat logs that never got rotated (WTF?):
  • -
- -
root@linode01:~# ls -lh /var/log/tomcat7/localhost_access_log.2015* | wc -l
-47
-
- -
    -
  • Delete 2GB cron-filter-media.log file, as it is just a log from a cron job and it doesn’t get rotated like normal log files (almost a year now maybe)
  • -
- -

2016-10-14

- -
    -
  • Run all system updates on DSpace Test and reboot server
  • -
  • Looking into some issues with Discovery filters in Atmire’s content and usage analysis module after adjusting the filter class
  • -
  • Looks like changing the filters from configuration.DiscoverySearchFilterFacet to configuration.DiscoverySearchFilter breaks them in Atmire CUA module
  • -
- -

2016-10-17

- -
    -
  • A bit more cleanup on the CCAFS authors, and run the corrections on DSpace Test:
  • -
- -
$ ./fix-metadata-values.py -i ccafs-authors-oct-16.csv -f dc.contributor.author -t 'correct name' -m 3 -d dspace -u dspace -p fuuu
-
- -
    -
  • One observation is that there are still some old versions of names in the author lookup because authors appear in other communities (as we only corrected authors from CCAFS for this round)
  • -
- -

2016-10-18

- -
    -
  • Start working on DSpace 5.5 porting work again:

    - -

    $ git checkout -b 5_x-55 5_x-prod
    -$ git rebase -i dspace-5.5

  • - -
  • Have to fix about ten merge conflicts, mostly in the SCSS for the CGIAR theme

  • - -
  • Skip 1e34751b8cf17021f45d4cf2b9a5800c93fb4cb2 in lieu of upstream’s 55e623d1c2b8b7b1fa45db6728e172e06bfa8598 (fixes X-Forwarded-For header) because I had made the same fix myself and it’s better to use the upstream one

  • - -
  • I notice this rebase gets rid of GitHub merge commits… which actually might be fine because merges are fucking annoying to deal with when remote people merge without pulling and rebasing their branch first

  • - -
  • Finished up applying the 5.5 sitemap changes to all themes

  • - -
  • Merge the discovery.xml cleanups (#278)

  • - -
  • Merge some minor edits to the distribution license (#285)

  • -
- -

2016-10-19

- -
    -
  • When we move to DSpace 5.5 we should also cherry pick some patches from the 5.6 branch (a command sketch follows this list): - -
      -
    • memory cleanup: 9f0f5940e7921765c6a22e85337331656b18a403
    • -
    • sql injection: c6fda557f731dbc200d7d58b8b61563f86fe6d06
    • -
    • pdfbox security issue: b5330b78153b2052ed3dc2fd65917ccdbfcc0439
    • -
  • -
- -
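    -
  • When the time comes, that should be roughly:
  • -
- -
$ git checkout 5_x-55
-$ git cherry-pick 9f0f5940e7921765c6a22e85337331656b18a403 c6fda557f731dbc200d7d58b8b61563f86fe6d06 b5330b78153b2052ed3dc2fd65917ccdbfcc0439
-
- -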

2016-10-20

- -
    -
  • Run CCAFS author corrections on CGSpace
  • -
  • Discovery reindexing took forever and kinda caused CGSpace to crash, so I ran all system updates and rebooted the server
  • -
- -

2016-10-25

- -
    -
  • Move the LIVES community from the top level to the ILRI projects community
  • -
- -
$ /home/cgspace.cgiar.org/bin/dspace community-filiator --set --parent=10568/27629 --child=10568/25101
-
- -
    -
  • Start testing some things for DSpace 5.5, like command line metadata import, PDF media filter, and Atmire CUA
  • -
  • Start looking at batch fixing of “old” ILRI website links without www or https, for example:
  • -
- -
dspace=# select * from metadatavalue where resource_type_id=2 and text_value like 'http://ilri.org%';
-
- -
    -
  • Also CCAFS has HTTPS and their links should use it where possible:
  • -
- -
dspace=# select * from metadatavalue where resource_type_id=2 and text_value like 'http://ccafs.cgiar.org%';
-
- -
    -
  • And this will find community and collection HTML text that is using the old style PNG/JPG icons for RSS and email (we should be using Font Awesome icons instead):
  • -
- -
dspace=# select text_value from metadatavalue where resource_type_id in (3,4) and text_value like '%Iconrss2.png%';
-
- -
    -
  • Turns out there are shit tons of varieties of this, like with http, https, www, separate </img> tags, alignments, etc
  • -
  • Had to find all variations and replace them individually (a possible single-statement alternative is sketched after this block):
  • -
- -
dspace=# update metadatavalue set text_value = regexp_replace(text_value, '<img align="left" src="https://www.ilri.org/images/Iconrss2.png"/>','<span class="fa fa-rss fa-2x" aria-hidden="true"></span>') where resource_type_id in (3,4) and text_value like '%<img align="left" src="https://www.ilri.org/images/Iconrss2.png"/>%';
-dspace=# update metadatavalue set text_value = regexp_replace(text_value, '<img align="left" src="https://www.ilri.org/images/email.jpg"/>', '<span class="fa fa-at fa-2x" aria-hidden="true"></span>') where resource_type_id in (3,4) and text_value like '%<img align="left" src="https://www.ilri.org/images/email.jpg"/>%';
-dspace=# update metadatavalue set text_value = regexp_replace(text_value, '<img align="left" src="http://www.ilri.org/images/Iconrss2.png"/>', '<span class="fa fa-rss fa-2x" aria-hidden="true"></span>') where resource_type_id in (3,4) and text_value like '%<img align="left" src="http://www.ilri.org/images/Iconrss2.png"/>%';
-dspace=# update metadatavalue set text_value = regexp_replace(text_value, '<img align="left" src="http://www.ilri.org/images/email.jpg"/>', '<span class="fa fa-at fa-2x" aria-hidden="true"></span>') where resource_type_id in (3,4) and text_value like '%<img align="left" src="http://www.ilri.org/images/email.jpg"/>%';
-dspace=# update metadatavalue set text_value = regexp_replace(text_value, '<img align="left" src="http://www.ilri.org/images/Iconrss2.png"></img>', '<span class="fa fa-rss fa-2x" aria-hidden="true"></span>') where resource_type_id in (3,4) and text_value like '%<img align="left" src="http://www.ilri.org/images/Iconrss2.png"></img>%';
-dspace=# update metadatavalue set text_value = regexp_replace(text_value, '<img align="left" src="http://www.ilri.org/images/email.jpg"></img>', '<span class="fa fa-at fa-2x" aria-hidden="true"></span>') where resource_type_id in (3,4) and text_value like '%<img align="left" src="http://www.ilri.org/images/email.jpg"></img>%';
-dspace=# update metadatavalue set text_value = regexp_replace(text_value, '<img align="left" src="https://ilri.org/images/Iconrss2.png"></img>', '<span class="fa fa-rss fa-2x" aria-hidden="true"></span>') where resource_type_id in (3,4) and text_value like '%<img align="left" src="https://ilri.org/images/Iconrss2.png"></img>%';
-dspace=# update metadatavalue set text_value = regexp_replace(text_value, '<img align="left" src="https://ilri.org/images/email.jpg"></img>', '<span class="fa fa-at fa-2x" aria-hidden="true"></span>') where resource_type_id in (3,4) and text_value like '%<img align="left" src="https://ilri.org/images/email.jpg"></img>%';
-dspace=# update metadatavalue set text_value = regexp_replace(text_value, '<img align="left" src="https://www.ilri.org/images/Iconrss2.png"></img>', '<span class="fa fa-rss fa-2x" aria-hidden="true"></span>') where resource_type_id in (3,4) and text_value like '%<img align="left" src="https://www.ilri.org/images/Iconrss2.png"></img>%';
-dspace=# update metadatavalue set text_value = regexp_replace(text_value, '<img align="left" src="https://www.ilri.org/images/email.jpg"></img>', '<span class="fa fa-at fa-2x" aria-hidden="true"></span>') where resource_type_id in (3,4) and text_value like '%<img align="left" src="https://www.ilri.org/images/email.jpg"></img>%';
-dspace=# update metadatavalue set text_value = regexp_replace(text_value, '<img align="left" src="https://ilri.org/images/Iconrss2.png"/>', '<span class="fa fa-rss fa-2x" aria-hidden="true"></span>') where resource_type_id in (3,4) and text_value like '%<img align="left" src="https://ilri.org/images/Iconrss2.png"/>%';
-dspace=# update metadatavalue set text_value = regexp_replace(text_value, '<img align="left" src="https://ilri.org/images/email.jpg"/>', '<span class="fa fa-at fa-2x" aria-hidden="true"></span>') where resource_type_id in (3,4) and text_value like '%<img align="left" src="https://ilri.org/images/email.jpg"/>%';
-dspace=# update metadatavalue set text_value = regexp_replace(text_value, '<img valign="center" align="left" src="https://www.ilri.org/images/Iconrss2.png"/>', '<span class="fa fa-rss fa-2x" aria-hidden="true"></span>') where resource_type_id in (3,4) and text_value like '%<img valign="center" align="left" src="https://www.ilri.org/images/Iconrss2.png"/>%';
-dspace=# update metadatavalue set text_value = regexp_replace(text_value, '<img valign="center" align="left" src="https://www.ilri.org/images/email.jpg"/>', '<span class="fa fa-at fa-2x" aria-hidden="true"></span>') where resource_type_id in (3,4) and text_value like '%<img valign="center" align="left" src="https://www.ilri.org/images/email.jpg"/>%';
-dspace=# update metadatavalue set text_value = regexp_replace(text_value, '<img valign="center" align="left" src="http://www.ilri.org/images/Iconrss2.png"/>', '<span class="fa fa-rss fa-2x" aria-hidden="true"></span>') where resource_type_id in (3,4) and text_value like '%<img valign="center" align="left" src="http://www.ilri.org/images/Iconrss2.png"/>%';
-dspace=# update metadatavalue set text_value = regexp_replace(text_value, '<img valign="center" align="left" src="http://www.ilri.org/images/email.jpg"/>', '<span class="fa fa-at fa-2x" aria-hidden="true"></span>') where resource_type_id in (3,4) and text_value like '%<img valign="center" align="left" src="http://www.ilri.org/images/email.jpg"/>%';
-
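- -
    -
  • In hindsight, a single statement per icon using a regex over the variations and the 'g' flag might have done the same thing (untested sketch):
  • -
- -
dspace=# update metadatavalue set text_value = regexp_replace(text_value, '<img [^>]*src="https?://(www\.)?ilri\.org/images/Iconrss2\.png"[^>]*>(</img>)?', '<span class="fa fa-rss fa-2x" aria-hidden="true"></span>', 'g') where resource_type_id in (3,4) and text_value like '%Iconrss2.png%';
-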
- -
    -
  • Getting rid of these reduces the number of network requests each client makes on community/collection pages, and makes use of Font Awesome icons (which they are already loading anyways!)
  • -
  • And now that I start looking, I want to fix a bunch of links to popular sites that should be using HTTPS, like Twitter, Facebook, Google, Feed Burner, DOI, etc
  • -
  • I should look to see if any of those domains is sending an HTTP 301 or setting HSTS headers to their HTTPS domains, then just replace them (a quick check is sketched below)
  • -
- -
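    -
  • A quick check for a few of them (domains here are just examples):
  • -
- -
$ for d in twitter.com www.facebook.com feeds.feedburner.com dx.doi.org; do echo "== $d"; curl -sI "http://$d/" | grep -iE '^(HTTP|Location)'; curl -sI "https://$d/" | grep -i '^Strict-Transport-Security'; done
-
- -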

2016-10-27

- -
    -
  • Run Font Awesome fixes on DSpace Test:
  • -
- -
dspace=# \i /tmp/font-awesome-text-replace.sql
-UPDATE 17
-UPDATE 17
-UPDATE 3
-UPDATE 3
-UPDATE 30
-UPDATE 30
-UPDATE 1
-UPDATE 1
-UPDATE 7
-UPDATE 7
-UPDATE 1
-UPDATE 1
-UPDATE 1
-UPDATE 1
-UPDATE 1
-UPDATE 1
-UPDATE 0
-
- -
    -
  • Looks much better now:
  • -
- -

CGSpace with old icons -DSpace Test with Font Awesome icons

- -
    -
  • Run the same replacements on CGSpace
  • -
- -

2016-10-30

- -
    -
  • Fix some messed up authors on CGSpace:
  • -
- -
dspace=# update metadatavalue set authority='799da1d8-22f3-43f5-8233-3d2ef5ebf8a8', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Charleston, B.%';
-UPDATE 10
-dspace=# update metadatavalue set authority='e936f5c5-343d-4c46-aa91-7a1fff6277ed', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Knight-Jones%';
-UPDATE 36
-
- -
    -
  • I updated the authority index but nothing seemed to change, so I’ll wait and do it again after I update Discovery below
  • -
  • Skype chat with Tsega about the IFPRI contentdm bridge
  • -
  • We tested harvesting OAI in an example collection to see how it works
  • -
  • Talk to Carlos Quiros about CG Core metadata in CGSpace
  • -
  • Get a list of countries from CGSpace so I can do some batch corrections:
  • -
- -
dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id=228 group by text_value order by count desc) to /tmp/countries.csv with csv;
-
- -
    -
  • Fix a bunch of countries in Open Refine and run the corrections on CGSpace:
  • -
- -
$ ./fix-metadata-values.py -i countries-fix-18.csv -f dc.coverage.country -t 'correct' -m 228 -d dspace -u dspace -p fuuu
-$ ./delete-metadata-values.py -i countries-delete-2.csv -f dc.coverage.country -m 228 -d dspace -u dspace -p fuuu
-
- -
    -
  • Run a shit ton of author fixes from Peter Ballantyne that we’ve been cleaning up for two months:
  • -
- -
$ ./fix-metadata-values.py -i /tmp/authors-fix-pb2.csv -f dc.contributor.author -t correct -m 3 -u dspace -d dspace -p fuuu
-
- -
    -
  • Run a few URL corrections for ilri.org and doi.org, etc:
  • -
- -
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://www.ilri.org','https://www.ilri.org') where resource_type_id=2 and text_value like '%http://www.ilri.org%';
-dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://mahider.ilri.org', 'https://cgspace.cgiar.org') where resource_type_id=2 and text_value like '%http://mahider.%.org%' and metadata_field_id not in (28);
-dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://dx.doi.org', 'https://dx.doi.org') where resource_type_id=2 and text_value like '%http://dx.doi.org%' and metadata_field_id not in (18,26,28,111);
-dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://doi.org', 'https://dx.doi.org') where resource_type_id=2 and text_value like '%http://doi.org%' and metadata_field_id not in (18,26,28,111);
-
- -
    -
  • I skipped metadata fields like citation and description
  • -
- - - - - -
- - - -
- - - - -
-
- - - - - - - - - diff --git a/public/2016-11/index.html b/public/2016-11/index.html deleted file mode 100644 index eb6609996..000000000 --- a/public/2016-11/index.html +++ /dev/null @@ -1,743 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - November, 2016 | CGSpace Notes - - - - - - - - - - - - - - - - - -
-
- -
-
- - - -
-
-

CGSpace Notes

-

Documenting day-to-day work on the CGSpace repository.

-
-
- - - -
-
-
- - - - -
-
-

November, 2016

- -
-

2016-11-01

- -
    -
  • Add dc.type to the output options for Atmire’s Listings and Reports module (#286)
  • -
- -

Listings and Reports with output type

- -

- -

2016-11-02

- -
    -
  • Migrate DSpace Test to DSpace 5.5 (notes)
  • -
  • Run all updates on DSpace Test and reboot the server
  • -
  • Looks like the OAI bug from DSpace 5.1 that caused validation at Base Search to fail is now fixed and DSpace Test passes validation! (#63)
  • -
  • Indexing Discovery on DSpace Test took 332 minutes, which is like five times as long as it usually takes
  • -
  • At the end it appeared to finish correctly but there were lots of errors right after it finished:
  • -
- -
2016-11-02 15:09:48,578 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Wrote Collection: 10568/76454 to Index
-2016-11-02 15:09:48,584 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Wrote Community: 10568/3202 to Index
-2016-11-02 15:09:48,589 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Wrote Collection: 10568/76455 to Index
-2016-11-02 15:09:48,590 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Wrote Community: 10568/51693 to Index
-2016-11-02 15:09:48,590 INFO  org.dspace.discovery.IndexClient @ Done with indexing
-2016-11-02 15:09:48,600 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Wrote Collection: 10568/76456 to Index
-2016-11-02 15:09:48,613 INFO  org.dspace.discovery.SolrServiceImpl @ Wrote Item: 10568/55536 to Index
-2016-11-02 15:09:48,616 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Wrote Collection: 10568/76457 to Index
-2016-11-02 15:09:48,634 ERROR com.atmire.dspace.discovery.AtmireSolrService @
-java.lang.NullPointerException
-        at org.dspace.discovery.SearchUtils.getDiscoveryConfiguration(SourceFile:57)
-        at org.dspace.discovery.SolrServiceImpl.buildDocument(SolrServiceImpl.java:824)
-        at com.atmire.dspace.discovery.AtmireSolrService.indexContent(AtmireSolrService.java:821)
-        at com.atmire.dspace.discovery.AtmireSolrService.updateIndex(AtmireSolrService.java:898)
-        at org.dspace.discovery.SolrServiceImpl.createIndex(SolrServiceImpl.java:370)
-        at org.dspace.storage.rdbms.DatabaseUtils$ReindexerThread.run(DatabaseUtils.java:945)
-
- -
    -
  • DSpace is still up, and a few minutes later I see the default DSpace indexer is still running
  • -
  • Sure enough, looking back before the first one finished, I see output from both indexers interleaved in the log:
  • -
- -
2016-11-02 15:09:28,545 INFO  org.dspace.discovery.SolrServiceImpl @ Wrote Item: 10568/47242 to Index
-2016-11-02 15:09:28,633 INFO  org.dspace.discovery.SolrServiceImpl @ Wrote Item: 10568/60785 to Index
-2016-11-02 15:09:28,678 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Processing (55695 of 55722): 43557
-2016-11-02 15:09:28,688 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Processing (55703 of 55722): 34476
-
- -
    -
  • I will raise a ticket with Atmire to ask them
  • -
- -

2016-11-06

- -
    -
  • After re-deploying and re-indexing I didn’t see the same issue, and the indexing completed in 85 minutes, which is about how long it is supposed to take
  • -
- -

2016-11-07

- -
    -
  • Horrible one liner to get Linode ID from certain Ansible host vars:
  • -
- -
$ grep -A 3 contact_info * | grep -E "(Orth|Sisay|Peter|Daniel|Tsega)" | awk -F'-' '{print $1}' | grep linode | uniq | xargs grep linode_id
-
- -
    -
  • I noticed some weird CRPs in the database, and they don’t show up in Discovery for some reason, perhaps the :
  • -
  • I’ll export these and fix them in batch:
  • -
- -
dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id=230 group by text_value order by count desc) to /tmp/crp.csv with csv;
-COPY 22
-
- -
    -
  • Test running the replacements:
  • -
- -
$ ./fix-metadata-values.py -i /tmp/CRPs.csv -f cg.contributor.crp -t correct -m 230 -d dspace -u dspace -p 'fuuu'
-
- -
    -
  • Add AMR to ILRI subjects and remove one duplicate instance of IITA in author affiliations controlled vocabulary (#288)
  • -
- -

2016-11-08

- -
    -
  • Atmire’s Listings and Reports module seems to be broken on DSpace 5.5
  • -
- -

Listings and Reports broken in DSpace 5.5

- -
    -
  • I’ve filed a ticket with Atmire
  • -
  • Thinking about batch updates for ORCIDs and authors
  • -
  • Playing with SolrClient in Python to query Solr
  • -
  • All records in the authority core are either authority_type:orcid or authority_type:person (easy to verify with a quick query against the core, sketched below)
  • -
  • There is a deleted field and all items seem to be false, but it might be an important sanity check to remember
  • -
  • The way to go is probably to have a CSV of author names and authority IDs, then to batch update them in PostgreSQL
  • -
  • Dump of the top ~200 authors in CGSpace:
  • -
- -
dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id=3 group by text_value order by count desc limit 210) to /tmp/210-authors.csv with csv;
-
- -
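    -
  • A quick way to poke at the authority core from the command line (assuming Solr’s authority core is at localhost:8081/solr; adjust the port as needed):
  • -
- -
$ curl -s 'http://localhost:8081/solr/authority/select?q=authority_type:orcid&rows=0&wt=json' | jq '.response.numFound'
-$ curl -s 'http://localhost:8081/solr/authority/select?q=deleted:true&rows=0&wt=json' | jq '.response.numFound'
-
- -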

2016-11-09

- -
    -
  • CGSpace crashed so I quickly ran system updates, applied one or two of the waiting changes from the 5_x-prod branch, and rebooted the server
  • -
  • The error was Timeout waiting for idle object but I haven’t looked into the Tomcat logs to see what happened
  • -
  • Also, I ran the corrections for CRPs from earlier this week
  • -
- -

2016-11-10

- -
    -
  • Helping Megan Zandstra and CIAT with some questions about the REST API
  • -
  • Playing with find-by-metadata-field, this works:
  • -
- -
$ curl -s -H "accept: application/json" -H "Content-Type: application/json" -X POST "http://localhost:8080/rest/items/find-by-metadata-field" -d '{"key": "cg.subject.ilri","value": "SEEDS"}'
-
- -
    -
  • But the results are deceiving because metadata fields can have text languages and your query must match exactly!
  • -
- -
dspace=# select distinct text_value, text_lang from metadatavalue where resource_type_id=2 and metadata_field_id=203 and text_value='SEEDS';
- text_value | text_lang
-------------+-----------
- SEEDS      |
- SEEDS      |
- SEEDS      | en_US
-(3 rows)
-
- -
    -
  • So basically, the text language here could be null, blank, or en_US
  • -
  • To query metadata with these properties, you can do:
  • -
- -
$ curl -s -H "accept: application/json" -H "Content-Type: application/json" -X POST "http://localhost:8080/rest/items/find-by-metadata-field" -d '{"key": "cg.subject.ilri","value": "SEEDS"}' | jq length
-55
-$ curl -s -H "accept: application/json" -H "Content-Type: application/json" -X POST "http://localhost:8080/rest/items/find-by-metadata-field" -d '{"key": "cg.subject.ilri","value": "SEEDS", "language":""}' | jq length
-34
-$ curl -s -H "accept: application/json" -H "Content-Type: application/json" -X POST "http://localhost:8080/rest/items/find-by-metadata-field" -d '{"key": "cg.subject.ilri","value": "SEEDS", "language":"en_US"}' | jq length
-
- -
    -
  • The results (55+34=89) don’t seem to match those from the database:
  • -
- -
dspace=# select count(text_value) from metadatavalue where resource_type_id=2 and metadata_field_id=203 and text_value='SEEDS' and text_lang is null;
- count
--------
-    15
-dspace=# select count(text_value) from metadatavalue where resource_type_id=2 and metadata_field_id=203 and text_value='SEEDS' and text_lang='';
- count
--------
-     4
-dspace=# select count(text_value) from metadatavalue where resource_type_id=2 and metadata_field_id=203 and text_value='SEEDS' and text_lang='en_US';
- count
--------
-    66
-
- -
    -
  • So, querying from the API I get 55 + 34 = 89 results, but the database actually only has 85…
  • -
  • And the find-by-metadata-field endpoint doesn’t seem to have a way to get all items with the field, or a wildcard value
  • -
  • I’ll ask a question on the dspace-tech mailing list
  • -
  • And speaking of text_lang, this is interesting:
  • -
- -
dspacetest=# select distinct text_lang from metadatavalue where resource_type_id=2;
- text_lang
------------
-
- ethnob
- en
- spa
- EN
- es
- frn
- en_
- en_US
-
- EN_US
- eng
- en_U
- fr
-(14 rows)
-
- -
    -
  • Generate a list of all these so I can maybe fix them in batch:
  • -
- -
dspace=# \copy (select distinct text_lang, count(*) from metadatavalue where resource_type_id=2 group by text_lang order by count desc) to /tmp/text-langs.csv with csv;
-COPY 14
-
- -
    -
  • Perhaps we need to fix them all in batch, or experiment with fixing only certain metadatavalues:
  • -
- -
dspace=# update metadatavalue set text_lang='en_US' where resource_type_id=2 and metadata_field_id=203 and text_value='SEEDS';
-UPDATE 85
-
- -
    -
  • The fix-metadata.py script I have is meant for specific metadata values, so if I want to update some text_lang values I should just do it directly in the database
  • -
  • For example, on a limited set:
  • -
- -
dspace=# update metadatavalue set text_lang=NULL where resource_type_id=2 and metadata_field_id=203 and text_value='LIVESTOCK' and text_lang='';
-UPDATE 420
-
- -
    -
  • And assuming I want to do it for all fields:
  • -
- -
dspacetest=# update metadatavalue set text_lang=NULL where resource_type_id=2 and text_lang='';
-UPDATE 183726
-
- -
    -
  • After that restarted Tomcat and PostgreSQL (because I’m superstitious about caches) and now I see the following in REST API query:
  • -
- -
$ curl -s -H "accept: application/json" -H "Content-Type: application/json" -X POST "http://localhost:8080/rest/items/find-by-metadata-field" -d '{"key": "cg.subject.ilri","value": "SEEDS"}' | jq length
-71
-$ curl -s -H "accept: application/json" -H "Content-Type: application/json" -X POST "http://localhost:8080/rest/items/find-by-metadata-field" -d '{"key": "cg.subject.ilri","value": "SEEDS", "language":""}' | jq length
-0
-$ curl -s -H "accept: application/json" -H "Content-Type: application/json" -X POST "http://localhost:8080/rest/items/find-by-metadata-field" -d '{"key": "cg.subject.ilri","value": "SEEDS", "language":"en_US"}' | jq length
-
- -
    -
  • Not sure what’s going on, but Discovery shows 83 values, and database shows 85, so I’m going to reindex Discovery just in case (command below)
  • -
- -
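    -
  • The reindex is just this (the -b flag rebuilds the index from scratch):
  • -
- -
$ [dspace]/bin/dspace index-discovery -b
-
- -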

2016-11-14

- -
    -
  • I applied Atmire’s suggestions to fix Listings and Reports for DSpace 5.5 and now it works
  • -
  • There were some issues with the dspace/modules/jspui/pom.xml, which is annoying because all I did was rebase our working 5.1 code on top of 5.5, meaning Atmire’s installation procedure must have changed
  • -
  • So there is apparently this Tomcat native way to limit web crawlers to one session: Crawler Session Manager
  • -
  • After adding that to server.xml bots matching the pattern in the configuration will all use ONE session, just like normal users:
  • -
- -
$ http --print h https://dspacetest.cgiar.org 'User-Agent:Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
-HTTP/1.1 200 OK
-Connection: keep-alive
-Content-Encoding: gzip
-Content-Language: en-US
-Content-Type: text/html;charset=utf-8
-Date: Mon, 14 Nov 2016 19:47:29 GMT
-Server: nginx
-Set-Cookie: JSESSIONID=323694E079A53D5D024F839290EDD7E8; Path=/; Secure; HttpOnly
-Transfer-Encoding: chunked
-Vary: Accept-Encoding
-X-Cocoon-Version: 2.2.0
-X-Robots-Tag: none
-
-$ http --print h https://dspacetest.cgiar.org 'User-Agent:Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
-HTTP/1.1 200 OK
-Connection: keep-alive
-Content-Encoding: gzip
-Content-Language: en-US
-Content-Type: text/html;charset=utf-8
-Date: Mon, 14 Nov 2016 19:47:35 GMT
-Server: nginx
-Transfer-Encoding: chunked
-Vary: Accept-Encoding
-X-Cocoon-Version: 2.2.0
-
- -
    -
  • The first one gets a session, and any after that — within 60 seconds — will be internally mapped to the same session by Tomcat
  • -
  • This means that when Google or Baidu slam you with tens of concurrent connections they will all map to ONE internal session, which saves RAM!
  • -
- -

2016-11-15

- -
    -
  • The Tomcat JVM heap looks really good after applying the Crawler Session Manager fix on DSpace Test last night:
  • -
- -

Tomcat JVM heap (day) after setting up the Crawler Session Manager -Tomcat JVM heap (week) after setting up the Crawler Session Manager

- -
    -
  • Seems the default regex doesn’t catch Baidu, though:
  • -
- -
$ http --print h https://dspacetest.cgiar.org 'User-Agent:Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)'
-HTTP/1.1 200 OK
-Connection: keep-alive
-Content-Encoding: gzip
-Content-Language: en-US
-Content-Type: text/html;charset=utf-8
-Date: Tue, 15 Nov 2016 08:49:54 GMT
-Server: nginx
-Set-Cookie: JSESSIONID=131409D143E8C01DE145C50FC748256E; Path=/; Secure; HttpOnly
-Transfer-Encoding: chunked
-Vary: Accept-Encoding
-X-Cocoon-Version: 2.2.0
-
-$ http --print h https://dspacetest.cgiar.org 'User-Agent:Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)'
-HTTP/1.1 200 OK
-Connection: keep-alive
-Content-Encoding: gzip
-Content-Language: en-US
-Content-Type: text/html;charset=utf-8
-Date: Tue, 15 Nov 2016 08:49:59 GMT
-Server: nginx
-Set-Cookie: JSESSIONID=F6403C084480F765ED787E41D2521903; Path=/; Secure; HttpOnly
-Transfer-Encoding: chunked
-Vary: Accept-Encoding
-X-Cocoon-Version: 2.2.0
-
- -
    -
  • Adding Baiduspider to the list of user agents seems to work, and the final configuration should be:
  • -
- -
<!-- Crawler Session Manager Valve helps mitigate damage done by web crawlers -->
-<Valve className="org.apache.catalina.valves.CrawlerSessionManagerValve"
-       crawlerUserAgents=".*[bB]ot.*|.*Yahoo! Slurp.*|.*Feedfetcher-Google.*|.*Baiduspider.*" />
-
- -
    -
  • Looking at the bots that were active yesterday it seems the above regex should be sufficient:
  • -
- -
$ grep -o -E 'Mozilla/5\.0 \(compatible;.*\"' /var/log/nginx/access.log | sort | uniq
-Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)" "-"
-Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" "-"
-Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
-Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" "-"
-Mozilla/5.0 (compatible; YandexImages/3.0; +http://yandex.com/bots)" "-"
-
- -
    -
  • Neat Maven trick to exclude some modules from being built (note on shell quoting after the command):
  • -
- -
$ mvn -U -Dmirage2.on=true -Dmirage2.deps.included=false -Denv=localhost -P \!dspace-lni,\!dspace-rdf,\!dspace-sword,\!dspace-swordv2 clean package
-
- -
    -
  • We absolutely don’t use those modules, so we shouldn’t build them in the first place
  • -
- -
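  • One gotcha with that command: the ! characters need to be escaped (as above) or the whole profile list single-quoted so bash doesn’t try history expansion on them; an equivalent invocation would be:

$ mvn -U -Dmirage2.on=true -Dmirage2.deps.included=false -Denv=localhost -P '!dspace-lni,!dspace-rdf,!dspace-sword,!dspace-swordv2' clean package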

2016-11-17

- -
    -
  • Generate a list of journal titles for Peter and Abenet to look through so we can make a controlled vocabulary out of them:
  • -
- -
dspace=# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=55 group by text_value order by count desc) to /tmp/journal-titles.csv with csv;
-COPY 2515
-
- -
    -
  • Send a message to users of the CGSpace REST API to notify them of the upcoming upgrade so they can test their apps against DSpace Test
  • -
  • Test updating old, non-HTTPS links to the CCAFS website in CGSpace metadata:
  • -
- -
dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, 'http://ccafs.cgiar.org','https://ccafs.cgiar.org') where resource_type_id=2 and text_value like '%http://ccafs.cgiar.org%';
-UPDATE 164
-dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://ccafs.cgiar.org','https://ccafs.cgiar.org') where resource_type_id=2 and text_value like '%http://ccafs.cgiar.org%';
-UPDATE 7
-
- -
    -
  • Had to run it twice to get them all; regexp_replace apparently only replaces the first match unless you pass the 'g' flag (see the sketch after the filter-media command below)
  • -
  • Run the updates on CGSpace as well
  • -
  • Run through some collections and manually regenerate some PDF thumbnails for items from before 2016 on DSpace Test to compare with CGSpace
  • -
  • I’m debating forcing the re-generation of ALL thumbnails, since some come from DSpace 3 and 4 when the thumbnailing wasn’t as good
  • -
  • The results were very good, so I think that after we upgrade to 5.5 I will do it, perhaps one community or collection at a time:
  • -
- -
$ [dspace]/bin/dspace filter-media -f -i 10568/67156 -p "ImageMagick PDF Thumbnail"
-
- -
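  • Going back to the regexp_replace updates above: PostgreSQL’s regexp_replace only replaces the first match in each value unless you pass the 'g' (global) flag, which would explain having to run it twice; something like this should do it in one pass (an untested sketch, database name as used above):

$ psql -d dspacetest -c "update metadatavalue set text_value = regexp_replace(text_value, 'http://ccafs.cgiar.org', 'https://ccafs.cgiar.org', 'g') where resource_type_id=2 and text_value like '%http://ccafs.cgiar.org%';"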
    -
  • In related news, I’m looking at thumbnails of thumbnails (the ones we uploaded manually before, and now DSpace’s media filter has made thumbnails of THEM):
  • -
- -
dspace=# select text_value from metadatavalue where text_value like '%.jpg.jpg';
-
- -
    -
  • I’m not sure if there’s anything we can do, actually, because we would have to remove those from the thumbnail bundles, and replace them with the regular JPGs from the content bundle, and then remove them from the assetstore…
  • -
- -

2016-11-18

- -
    -
  • Enable Tomcat Crawler Session Manager on CGSpace
  • -
- -

2016-11-21

- -
    -
  • More work on Ansible playbooks for the PostgreSQL 9.3→9.5 and Java 7→8 upgrades
  • -
  • CGSpace virtual managers meeting
  • -
  • I need to look into making the item thumbnail clickable
  • -
  • Macaroni Bros said they tested the DSpace Test (DSpace 5.5) REST API for CCAFS and WLE sites and it works as expected
  • -
- -

2016-11-23

- -
    -
  • Upgrade Java from 7 to 8 on CGSpace
  • -
  • I had started planning the in-place PostgreSQL 9.3→9.5 upgrade but decided that I will have to pg_dump and pg_restore when I move to the new server soon anyway, so there’s no need to upgrade the database right now
  • -
  • Chat with Carlos about CGCore and the CGSpace metadata registry
  • -
  • Dump CGSpace metadata field registry for Carlos: https://gist.github.com/alanorth/8cbd0bb2704d4bbec78025b4742f8e70
  • -
  • Send some feedback to Carlos on CG Core so they can better understand how DSpace/CGSpace uses metadata
  • -
  • Notes about PostgreSQL tuning from James: https://paste.fedoraproject.org/488776/14798952/
  • -
  • Play with Creative Commons stuff in DSpace submission step
  • -
  • It seems to work but it doesn’t let you choose a version of CC (like 4.0), and we would need to customize the XMLUI item display so it doesn’t display the gross CC badges
  • -
- -

2016-11-24

- -
    -
  • Bizuwork was testing DSpace Test on DSpace 5.5 and noticed that the Listings and Reports module seems to be case sensitive, whereas CGSpace’s Listings and Reports isn’t (ie, a search for “orth, alan” vs “Orth, Alan” returns the same results on CGSpace, but different on DSpace Test)
  • -
  • I have raised a ticket with Atmire
  • -
  • Looks like this issue is actually the new Listings and Reports module honoring the Solr search queries more correctly
  • -
- -

2016-11-27

- -
    -
  • Run system updates on DSpace Test and reboot the server
  • -
  • Deploy DSpace 5.5 on CGSpace (rough commands sketched after this list): - -
      -
    • maven package
    • -
    • stop tomcat
    • -
    • backup postgresql
    • -
    • run Atmire 5.5 schema deletions
    • -
    • delete the deployed spring folder
    • -
    • ant update
    • -
    • run system updates
    • -
    • reboot server
    • -
  • -
  • Need to do updates for the Ansible infrastructure role defaults, and switch the GitHub branch to the new 5.5 one
  • -
  • Testing DSpace 5.5 on CGSpace, it seems CUA’s export as XLS works for Usage statistics, but not Content statistics
  • -
  • I will raise a bug with Atmire
  • -
- -
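  • Roughly, those deployment steps translate to something like the following (paths, database name, and the Atmire schema step are from memory, so treat this as a sketch rather than the exact commands):

$ cd ~/src/DSpace && mvn -U -Dmirage2.on=true -Dmirage2.deps.included=false clean package
$ sudo service tomcat7 stop
$ pg_dump -U postgres -Fc cgspace > /tmp/cgspace-pre-5.5.dump
$ # apply the Atmire 5.5 schema deletions here (SQL provided by Atmire) with psql
$ rm -rf /home/cgspace.cgiar.org/config/spring    # hypothetical path to the deployed spring folder
$ cd ~/src/DSpace/dspace/target/dspace-installer && ant update
$ sudo apt-get update && sudo apt-get dist-upgrade
$ sudo reboot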

2016-11-28

- -
    -
  • One user says he is still getting a blank page when he logs in (just the CGSpace header, but no community list)
  • -
  • Looking at the Catalina logs I see there is some super long-running indexing process going on:
  • -
- -
INFO: FrameworkServlet 'oai': initialization completed in 2600 ms
-[>                                                  ] 0% time remaining: Calculating... timestamp: 2016-11-28 03:00:18
-[>                                                  ] 0% time remaining: 11 hour(s) 57 minute(s) 46 seconds. timestamp: 2016-11-28 03:00:19
-[>                                                  ] 0% time remaining: 23 hour(s) 4 minute(s) 28 seconds. timestamp: 2016-11-28 03:00:19
-[>                                                  ] 0% time remaining: 15 hour(s) 35 minute(s) 23 seconds. timestamp: 2016-11-28 03:00:19
-[>                                                  ] 0% time remaining: 14 hour(s) 5 minute(s) 56 seconds. timestamp: 2016-11-28 03:00:19
-[>                                                  ] 0% time remaining: 11 hour(s) 23 minute(s) 49 seconds. timestamp: 2016-11-28 03:00:19
-[>                                                  ] 0% time remaining: 11 hour(s) 21 minute(s) 57 seconds. timestamp: 2016-11-28 03:00:20
-
- -
    -
  • It says OAI, and seems to start at 3:00 AM, but I only see the filter-media cron job set to start then (see the crontab check sketched below)
  • -
  • Double checking the DSpace 5.x upgrade notes for anything I missed, or troubleshooting tips
  • -
  • Running some manual processes just in case:
  • -
- -
$ /home/dspacetest.cgiar.org/bin/dspace registry-loader -metadata /home/dspacetest.cgiar.org/config/registries/dcterms-types.xml
-$ /home/dspacetest.cgiar.org/bin/dspace registry-loader -metadata /home/dspacetest.cgiar.org/config/registries/dublin-core-types.xml
-$ /home/dspacetest.cgiar.org/bin/dspace registry-loader -metadata /home/dspacetest.cgiar.org/config/registries/eperson-types.xml
-$ /home/dspacetest.cgiar.org/bin/dspace registry-loader -metadata /home/dspacetest.cgiar.org/config/registries/workflow-types.xml
-
- - - -
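  • Also, to figure out which job actually kicks off at 03:00 I could dump all the user crontabs and the system cron directories, something like (the user list is a guess):

$ for u in root dspace tomcat7; do echo "== $u =="; sudo crontab -l -u $u; done
$ grep -r '^[^#]' /etc/crontab /etc/cron.d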

2016-11-29

- -
    -
  • Sisay tried deleting and re-creating Goshu’s account but he still can’t see any communities on the homepage after he logs in
  • -
  • Around the time of his login I see this in the DSpace logs:
  • -
- -
2016-11-29 07:56:36,350 INFO  org.dspace.authenticate.LDAPAuthentication @ g.cherinet@cgiar.org:session_id=F628E13AB4EF2BA949198A99EFD8EBE4:ip_addr=213.55.99.121:failed_login:no DN found for user g.cherinet@cgiar.org
-2016-11-29 07:56:36,350 INFO  org.dspace.authenticate.PasswordAuthentication @ g.cherinet@cgiar.org:session_id=F628E13AB4EF2BA949198A99EFD8EBE4:ip_addr=213.55.99.121:authenticate:attempting password auth of user=g.cherinet@cgiar.org
-2016-11-29 07:56:36,352 INFO  org.dspace.app.xmlui.utils.AuthenticationUtil @ g.cherinet@cgiar.org:session_id=F628E13AB4EF2BA949198A99EFD8EBE4:ip_addr=213.55.99.121:failed_login:email=g.cherinet@cgiar.org, realm=null, result=2
-2016-11-29 07:56:36,545 INFO  com.atmire.utils.UpdateSolrStatsMetadata @ Start processing item 10568/50391 id:51744
-2016-11-29 07:56:36,545 INFO  com.atmire.utils.UpdateSolrStatsMetadata @ Processing item stats
-2016-11-29 07:56:36,583 INFO  com.atmire.utils.UpdateSolrStatsMetadata @ Solr metadata up-to-date
-2016-11-29 07:56:36,583 INFO  com.atmire.utils.UpdateSolrStatsMetadata @ Processing item's bitstream stats
-2016-11-29 07:56:36,608 INFO  com.atmire.utils.UpdateSolrStatsMetadata @ Solr metadata up-to-date
-2016-11-29 07:56:36,701 INFO  org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ facets for scope, null: 23
-2016-11-29 07:56:36,747 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
-org.dspace.discovery.SearchServiceException: Error executing query
-        at org.dspace.discovery.SolrServiceImpl.search(SolrServiceImpl.java:1618)
-        at org.dspace.discovery.SolrServiceImpl.search(SolrServiceImpl.java:1600)
-        at org.dspace.discovery.SolrServiceImpl.search(SolrServiceImpl.java:1583)
-        at org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer.performSearch(SidebarFacetsTransformer.java:165)
-        at org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer.addOptions(SidebarFacetsTransformer.java:174)
-        at org.dspace.app.xmlui.wing.AbstractWingTransformer.startElement(AbstractWingTransformer.java:228)
-        at sun.reflect.GeneratedMethodAccessor277.invoke(Unknown Source)
-...
-
- -
    -
  • At about the same time in the solr log I see a super long query:
  • -
- -
2016-11-29 07:56:36,734 INFO  org.apache.solr.core.SolrCore @ [search] webapp=/solr path=/select params={q=*:*&fl=dateIssued.year,handle,search.resourcetype,search.resourceid,search.uniqueid&start=0&fq=NOT(withdrawn:true)&fq=NOT(discoverable:false)&fq=dateIssued.year:[*+TO+*]&fq=read:(g0+OR+e574+OR+g0+OR+g3+OR+g9+OR+g10+OR+g14+OR+g16+OR+g18+OR+g20+OR+g23+OR+g24+OR+g2072+OR+g2074+OR+g28+OR+g2076+OR+g29+OR+g2078+OR+g2080+OR+g34+OR+g2082+OR+g2084+OR+g38+OR+g2086+OR+g2088+OR+g2091+OR+g43+OR+g2092+OR+g2093+OR+g2095+OR+g2097+OR+g50+OR+g2099+OR+g51+OR+g2103+OR+g62+OR+g65+OR+g2115+OR+g2117+OR+g2119+OR+g2121+OR+g2123+OR+g2125+OR+g77+OR+g78+OR+g79+OR+g2127+OR+g80+OR+g2129+OR+g2131+OR+g2133+OR+g2134+OR+g2135+OR+g2136+OR+g2137+OR+g2138+OR+g2139+OR+g2140+OR+g2141+OR+g2142+OR+g2148+OR+g2149+OR+g2150+OR+g2151+OR+g2152+OR+g2153+OR+g2154+OR+g2156+OR+g2165+OR+g2167+OR+g2171+OR+g2174+OR+g2175+OR+g129+OR+g2182+OR+g2186+OR+g2189+OR+g153+OR+g158+OR+g166+OR+g167+OR+g168+OR+g169+OR+g2225+OR+g179+OR+g2227+OR+g2229+OR+g183+OR+g2231+OR+g184+OR+g2233+OR+g186+OR+g2235+OR+g2237+OR+g191+OR+g192+OR+g193+OR+g202+OR+g203+OR+g204+OR+g205+OR+g207+OR+g208+OR+g218+OR+g219+OR+g222+OR+g223+OR+g230+OR+g231+OR+g238+OR+g241+OR+g244+OR+g254+OR+g255+OR+g262+OR+g265+OR+g268+OR+g269+OR+g273+OR+g276+OR+g277+OR+g279+OR+g282+OR+g2332+OR+g2335+OR+g2338+OR+g292+OR+g293+OR+g2341+OR+g296+OR+g2344+OR+g297+OR+g2347+OR+g301+OR+g2350+OR+g303+OR+g305+OR+g2356+OR+g310+OR+g311+OR+g2359+OR+g313+OR+g2362+OR+g2365+OR+g2368+OR+g321+OR+g2371+OR+g325+OR+g2374+OR+g328+OR+g2377+OR+g2380+OR+g333+OR+g2383+OR+g2386+OR+g2389+OR+g342+OR+g343+OR+g2392+OR+g345+OR+g2395+OR+g348+OR+g2398+OR+g2401+OR+g2404+OR+g2407+OR+g364+OR+g366+OR+g2425+OR+g2427+OR+g385+OR+g387+OR+g388+OR+g389+OR+g2442+OR+g395+OR+g2443+OR+g2444+OR+g401+OR+g403+OR+g405+OR+g408+OR+g2457+OR+g2458+OR+g411+OR+g2459+OR+g414+OR+g2463+OR+g417+OR+g2465+OR+g2467+OR+g421+OR+g2469+OR+g2471+OR+g424+OR+g2473+OR+g2475+OR+g2476+OR+g429+OR+g433+OR+g2481+OR+g2482+OR+g2483+OR+g443+OR+g444+OR+g445+OR+g446+OR+g448+OR+g453+OR+g455+OR+g456+OR+g457+OR+g458+OR+g459+OR+g461+OR+g462+OR+g463+OR+g464+OR+g465+OR+g467+OR+g468+OR+g469+OR+g474+OR+g476+OR+g477+OR+g480+OR+g483+OR+g484+OR+g493+OR+g496+OR+g497+OR+g498+OR+g500+OR+g502+OR+g504+OR+g505+OR+g2559+OR+g2560+OR+g513+OR+g2561+OR+g515+OR+g516+OR+g518+OR+g519+OR+g2567+OR+g520+OR+g521+OR+g522+OR+g2570+OR+g523+OR+g2571+OR+g524+OR+g525+OR+g2573+OR+g526+OR+g2574+OR+g527+OR+g528+OR+g2576+OR+g529+OR+g531+OR+g2579+OR+g533+OR+g534+OR+g2582+OR+g535+OR+g2584+OR+g538+OR+g2586+OR+g540+OR+g2588+OR+g541+OR+g543+OR+g544+OR+g545+OR+g546+OR+g548+OR+g2596+OR+g549+OR+g551+OR+g555+OR+g556+OR+g558+OR+g561+OR+g569+OR+g570+OR+g571+OR+g2619+OR+g572+OR+g2620+OR+g573+OR+g2621+OR+g2622+OR+g575+OR+g578+OR+g581+OR+g582+OR+g584+OR+g585+OR+g586+OR+g587+OR+g588+OR+g590+OR+g591+OR+g593+OR+g595+OR+g596+OR+g598+OR+g599+OR+g601+OR+g602+OR+g603+OR+g604+OR+g605+OR+g606+OR+g608+OR+g609+OR+g610+OR+g612+OR+g614+OR+g616+OR+g620+OR+g621+OR+g623+OR+g630+OR+g635+OR+g636+OR+g646+OR+g649+OR+g683+OR+g684+OR+g687+OR+g689+OR+g691+OR+g695+OR+g697+OR+g698+OR+g699+OR+g700+OR+g701+OR+g707+OR+g708+OR+g709+OR+g710+OR+g711+OR+g712+OR+g713+OR+g714+OR+g715+OR+g716+OR+g717+OR+g719+OR+g720+OR+g729+OR+g732+OR+g733+OR+g734+OR+g736+OR+g737+OR+g738+OR+g2786+OR+g752+OR+g754+OR+g2804+OR+g757+OR+g2805+OR+g2806+OR+g760+OR+g761+OR+g2810+OR+g2815+OR+g769+OR+g771+OR+g773+OR+g776+OR+g786+OR+g787+OR+g788+OR+g789+OR+g791+OR+g792+OR+g793+OR+g794+OR+g795+OR+g796+OR+g798+OR+g800+OR+g802+OR+g803+OR+g806+OR+g808+OR+g810+OR+g814+OR+g815+OR+g817+OR+g829+OR+g8
30+OR+g849+OR+g893+OR+g895+OR+g898+OR+g902+OR+g903+OR+g917+OR+g919+OR+g921+OR+g922+OR+g923+OR+g924+OR+g925+OR+g926+OR+g927+OR+g928+OR+g929+OR+g930+OR+g932+OR+g933+OR+g934+OR+g938+OR+g939+OR+g944+OR+g945+OR+g946+OR+g947+OR+g948+OR+g949+OR+g950+OR+g951+OR+g953+OR+g954+OR+g955+OR+g956+OR+g958+OR+g959+OR+g960+OR+g963+OR+g964+OR+g965+OR+g968+OR+g969+OR+g970+OR+g971+OR+g972+OR+g973+OR+g974+OR+g976+OR+g978+OR+g979+OR+g984+OR+g985+OR+g987+OR+g988+OR+g991+OR+g993+OR+g994+OR+g999+OR+g1000+OR+g1003+OR+g1005+OR+g1006+OR+g1007+OR+g1012+OR+g1013+OR+g1015+OR+g1016+OR+g1018+OR+g1023+OR+g1024+OR+g1026+OR+g1028+OR+g1030+OR+g1032+OR+g1033+OR+g1035+OR+g1036+OR+g1038+OR+g1039+OR+g1041+OR+g1042+OR+g1044+OR+g1045+OR+g1047+OR+g1048+OR+g1050+OR+g1051+OR+g1053+OR+g1054+OR+g1056+OR+g1057+OR+g1058+OR+g1059+OR+g1060+OR+g1061+OR+g1062+OR+g1063+OR+g1064+OR+g1065+OR+g1066+OR+g1068+OR+g1071+OR+g1072+OR+g1074+OR+g1075+OR+g1076+OR+g1077+OR+g1078+OR+g1080+OR+g1081+OR+g1082+OR+g1084+OR+g1085+OR+g1087+OR+g1088+OR+g1089+OR+g1090+OR+g1091+OR+g1092+OR+g1093+OR+g1094+OR+g1095+OR+g1096+OR+g1097+OR+g1106+OR+g1108+OR+g1110+OR+g1112+OR+g1114+OR+g1117+OR+g1120+OR+g1121+OR+g1126+OR+g1128+OR+g1129+OR+g1131+OR+g1136+OR+g1138+OR+g1140+OR+g1141+OR+g1143+OR+g1145+OR+g1146+OR+g1148+OR+g1152+OR+g1154+OR+g1156+OR+g1158+OR+g1159+OR+g1160+OR+g1162+OR+g1163+OR+g1165+OR+g1166+OR+g1168+OR+g1170+OR+g1172+OR+g1175+OR+g1177+OR+g1179+OR+g1181+OR+g1185+OR+g1191+OR+g1193+OR+g1197+OR+g1199+OR+g1201+OR+g1203+OR+g1204+OR+g1215+OR+g1217+OR+g1219+OR+g1221+OR+g1224+OR+g1226+OR+g1227+OR+g1228+OR+g1230+OR+g1231+OR+g1232+OR+g1233+OR+g1234+OR+g1235+OR+g1236+OR+g1237+OR+g1238+OR+g1240+OR+g1241+OR+g1242+OR+g1243+OR+g1244+OR+g1246+OR+g1248+OR+g1250+OR+g1252+OR+g1254+OR+g1256+OR+g1257+OR+g1259+OR+g1261+OR+g1263+OR+g1275+OR+g1276+OR+g1277+OR+g1278+OR+g1279+OR+g1282+OR+g1284+OR+g1288+OR+g1290+OR+g1293+OR+g1296+OR+g1297+OR+g1299+OR+g1303+OR+g1304+OR+g1306+OR+g1309+OR+g1310+OR+g1311+OR+g1312+OR+g1313+OR+g1316+OR+g1318+OR+g1320+OR+g1322+OR+g1323+OR+g1324+OR+g1325+OR+g1326+OR+g1329+OR+g1331+OR+g1347+OR+g1348+OR+g1361+OR+g1362+OR+g1363+OR+g1364+OR+g1367+OR+g1368+OR+g1369+OR+g1370+OR+g1371+OR+g1374+OR+g1376+OR+g1377+OR+g1378+OR+g1380+OR+g1381+OR+g1386+OR+g1389+OR+g1391+OR+g1392+OR+g1393+OR+g1395+OR+g1396+OR+g1397+OR+g1400+OR+g1402+OR+g1406+OR+g1408+OR+g1415+OR+g1417+OR+g1433+OR+g1435+OR+g1441+OR+g1442+OR+g1443+OR+g1444+OR+g1446+OR+g1448+OR+g1450+OR+g1451+OR+g1452+OR+g1453+OR+g1454+OR+g1456+OR+g1458+OR+g1460+OR+g1462+OR+g1464+OR+g1466+OR+g1468+OR+g1470+OR+g1471+OR+g1475+OR+g1476+OR+g1477+OR+g1478+OR+g1479+OR+g1481+OR+g1482+OR+g1483+OR+g1484+OR+g1485+OR+g1486+OR+g1487+OR+g1488+OR+g1489+OR+g1490+OR+g1491+OR+g1492+OR+g1493+OR+g1495+OR+g1497+OR+g1499+OR+g1501+OR+g1503+OR+g1504+OR+g1506+OR+g1508+OR+g1511+OR+g1512+OR+g1513+OR+g1516+OR+g1522+OR+g1535+OR+g1536+OR+g1537+OR+g1539+OR+g1540+OR+g1541+OR+g1542+OR+g1547+OR+g1549+OR+g1551+OR+g1553+OR+g1555+OR+g1557+OR+g1559+OR+g1561+OR+g1563+OR+g1565+OR+g1567+OR+g1569+OR+g1571+OR+g1573+OR+g1580+OR+g1583+OR+g1588+OR+g1590+OR+g1592+OR+g1594+OR+g1595+OR+g1596+OR+g1598+OR+g1599+OR+g1600+OR+g1601+OR+g1602+OR+g1604+OR+g1606+OR+g1610+OR+g1611+OR+g1612+OR+g1613+OR+g1616+OR+g1619+OR+g1622+OR+g1624+OR+g1625+OR+g1626+OR+g1628+OR+g1629+OR+g1631+OR+g1632+OR+g1692+OR+g1694+OR+g1695+OR+g1697+OR+g1705+OR+g1706+OR+g1707+OR+g1708+OR+g1711+OR+g1715+OR+g1717+OR+g1719+OR+g1721+OR+g1722+OR+g1723+OR+g1724+OR+g1725+OR+g1726+OR+g1727+OR+g1731+OR+g1732+OR+g1736+OR+g1737+OR+g1738+OR+g1740+OR+g1742+OR+g1743+OR+g1753+OR+g1755+OR+g1758+OR+g1759+OR+g1764+OR+g1766+OR+g1769+OR
+g1774+OR+g1782+OR+g1794+OR+g1796+OR+g1797+OR+g1814+OR+g1818+OR+g1826+OR+g1853+OR+g1855+OR+g1857+OR+g1858+OR+g1859+OR+g1860+OR+g1861+OR+g1863+OR+g1864+OR+g1865+OR+g1867+OR+g1869+OR+g1871+OR+g1873+OR+g1875+OR+g1877+OR+g1879+OR+g1881+OR+g1883+OR+g1884+OR+g1885+OR+g1887+OR+g1889+OR+g1891+OR+g1892+OR+g1894+OR+g1896+OR+g1898+OR+g1900+OR+g1902+OR+g1907+OR+g1910+OR+g1915+OR+g1916+OR+g1917+OR+g1918+OR+g1929+OR+g1931+OR+g1932+OR+g1933+OR+g1934+OR+g1936+OR+g1937+OR+g1938+OR+g1939+OR+g1940+OR+g1942+OR+g1944+OR+g1945+OR+g1948+OR+g1950+OR+g1955+OR+g1961+OR+g1962+OR+g1964+OR+g1966+OR+g1968+OR+g1970+OR+g1972+OR+g1974+OR+g1976+OR+g1979+OR+g1982+OR+g1984+OR+g1985+OR+g1986+OR+g1987+OR+g1989+OR+g1991+OR+g1996+OR+g2003+OR+g2007+OR+g2011+OR+g2019+OR+g2020+OR+g2046)&sort=dateIssued.year_sort+desc&rows=1&wt=javabin&version=2} hits=56080 status=0 QTime=3
-
- -
    -
  • Which, according to some old threads on DSpace Tech, means that the user has a lot of permissions (from groups or on the individual eperson) which increases the Solr query size / query URL
  • -
  • It might be fixed by increasing the Tomcat maxHttpHeaderSize, which is 8192 (or 8KB) by default
  • -
  • I’ve increased the maxHttpHeaderSize to 16384 on DSpace Test and the user said he is now able to see the communities on the homepage
  • -
  • I will make the changes on CGSpace soon (a sketch of the server.xml change is below)
  • -
  • A few users are reporting having issues with their workflows, they get the following message: “You are not allowed to perform this task”
  • -
  • Might be the same as DS-2920 on the bug tracker
  • -
- -
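  • For reference, the maxHttpHeaderSize change is just an attribute on the Connector element in Tomcat’s server.xml; assuming the stock Ubuntu tomcat7 paths and that the attribute is already present, something like:

$ sudo sed -i 's/maxHttpHeaderSize="8192"/maxHttpHeaderSize="16384"/' /etc/tomcat7/server.xml
$ sudo service tomcat7 restart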

2016-11-30

- -
    -
  • The maxHttpHeaderSize fix worked on CGSpace (user is able to see the community list on the homepage)
  • -
  • The “take task” cache fix worked on DSpace Test but it’s not an official patch, so I’ll have to report the bug to DSpace people and try to get advice
  • -
  • More work on the KM4Dev Journal article
  • -
- - - - - - - - - diff --git a/public/2016-12/index.html b/public/2016-12/index.html deleted file mode 100644 index 90263bb7f..000000000 --- a/public/2016-12/index.html +++ /dev/null @@ -1,979 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - December, 2016 | CGSpace Notes - - - - - - - - - - - - - - - - - -

December, 2016

- -
-

2016-12-02

- -
    -
  • CGSpace was down for five hours in the morning while I was sleeping
  • -
  • While looking in the logs for errors, I see tons of warnings about Atmire MQM:
  • -
- -
2016-12-02 03:00:32,352 WARN  com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
-2016-12-02 03:00:32,353 WARN  com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail="dc.title", transactionID="TX157907838689377964651674089851855413607")
-2016-12-02 03:00:32,353 WARN  com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, Object Type=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail="THUMBNAIL", transactionID="TX157907838689377964651674089851855413607")
-2016-12-02 03:00:32,353 WARN  com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail="-1", transactionID="TX157907838689377964651674089851855413607")
-2016-12-02 03:00:32,353 WARN  com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
-
- -
    -
  • I see thousands of them in the logs for the last few months, so it’s not related to the DSpace 5.5 upgrade
  • -
  • I’ve raised a ticket with Atmire to ask
  • -
  • Another worrying error from dspace.log is:
  • -
- -

- -
org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceObjectDatasetGenerator.toDatasetQuery(Lorg/dspace/core/Context;)Lcom/atmire/statistics/content/DatasetQuery;
-        at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:972)
-        at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
-        at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882)
-        at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:789)
-        at javax.servlet.http.HttpServlet.service(HttpServlet.java:646)
-        at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
-        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
-        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
-        at org.dspace.app.xmlui.cocoon.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:111)
-        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
-        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
-        at org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter.doFilter(DSpaceCocoonServletFilter.java:274)
-        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
-        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
-        at org.dspace.app.xmlui.cocoon.servlet.multipart.DSpaceMultipartFilter.doFilter(DSpaceMultipartFilter.java:119)
-        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
-        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
-        at org.dspace.utils.servlet.DSpaceWebappServletFilter.doFilter(DSpaceWebappServletFilter.java:78)
-        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
-        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
-        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
-        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
-        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:501)
-        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
-        at com.googlecode.psiprobe.Tomcat70AgentValve.invoke(Tomcat70AgentValve.java:44)
-        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
-        at org.apache.catalina.valves.CrawlerSessionManagerValve.invoke(CrawlerSessionManagerValve.java:180)
-        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
-        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
-        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
-        at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1041)
-        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
-        at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:313)
-        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
-        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
-        at java.lang.Thread.run(Thread.java:745)
-Caused by: java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceObjectDatasetGenerator.toDatasetQuery(Lorg/dspace/core/Context;)Lcom/atmire/statistics/content/DatasetQuery;
-        at com.atmire.statistics.generator.TopNDSODatasetGenerator.toDatasetQuery(SourceFile:39)
-        at com.atmire.statistics.display.StatisticsDataVisitsMultidata.createDataset(SourceFile:108)
-        at org.dspace.statistics.content.StatisticsDisplay.createDataset(SourceFile:384)
-        at org.dspace.statistics.content.StatisticsDisplay.getDataset(SourceFile:404)
-        at com.atmire.statistics.mostpopular.JSONStatsMostPopularGenerator.generateJsonData(SourceFile:170)
-        at com.atmire.statistics.mostpopular.JSONStatsMostPopularGenerator.generate(SourceFile:246)
-        at com.atmire.app.xmlui.aspect.statistics.JSONStatsMostPopular.generate(JSONStatsMostPopular.java:145)
-        at sun.reflect.GeneratedMethodAccessor296.invoke(Unknown Source)
-        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
-        at java.lang.reflect.Method.invoke(Method.java:498)
-        at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
-        at com.sun.proxy.$Proxy96.process(Unknown Source)
-        at org.apache.cocoon.components.treeprocessor.sitemap.ReadNode.invoke(ReadNode.java:94)
-        at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:55)
-        at org.apache.cocoon.components.treeprocessor.sitemap.MatchNode.invoke(MatchNode.java:87)
-        at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:55)
-        at org.apache.cocoon.components.treeprocessor.sitemap.MatchNode.invoke(MatchNode.java:87)
-        at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
-        at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:143)
-        at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
-        at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:81)
-        at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:239)
-        at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:171)
-        at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:247)
-        at org.apache.cocoon.components.treeprocessor.sitemap.MountNode.invoke(MountNode.java:117)
-        at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:55)
-        at org.apache.cocoon.components.treeprocessor.sitemap.MatchNode.invoke(MatchNode.java:87)
-        at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
-        at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:143)
-        at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
-        at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:81)
-        at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:239)
-        at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:171)
-        at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:247)
-        at org.apache.cocoon.servlet.RequestProcessor.process(RequestProcessor.java:351)
-        at org.apache.cocoon.servlet.RequestProcessor.service(RequestProcessor.java:169)
-        at org.apache.cocoon.sitemap.SitemapServlet.service(SitemapServlet.java:84)
-        at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
-        at org.apache.cocoon.servletservice.ServletServiceContext$PathDispatcher.forward(ServletServiceContext.java:468)
-        at org.apache.cocoon.servletservice.ServletServiceContext$PathDispatcher.forward(ServletServiceContext.java:443)
-        at org.apache.cocoon.servletservice.spring.ServletFactoryBean$ServiceInterceptor.invoke(ServletFactoryBean.java:264)
-        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
-        at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
-        at com.sun.proxy.$Proxy89.service(Unknown Source)
-        at org.dspace.springmvc.CocoonView.render(CocoonView.java:113)
-        at org.springframework.web.servlet.DispatcherServlet.render(DispatcherServlet.java:1180)
-        at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:950)
-        ... 35 more
-
- -
    -
  • The first error I see in dspace.log this morning is:
  • -
- -
2016-12-02 03:00:46,656 ERROR org.dspace.authority.AuthorityValueFinder @ anonymous::Error while retrieving AuthorityValue from solr:query\colon; id\colon;"b0b541c1-ec15-48bf-9209-6dbe8e338cdc"
-org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://localhost:8081/solr/authority
-
- -
    -
  • Looking through DSpace’s solr log I see that about 20 seconds before this, there were a few 30+ KiB solr queries
  • -
  • The last logs here right before Solr became unresponsive (and right after I restarted it five hours later) were:
  • -
- -
2016-12-02 03:00:42,606 INFO  org.apache.solr.core.SolrCore @ [statistics] webapp=/solr path=/select params={q=containerItem:72828+AND+type:0&shards=localhost:8081/solr/statistics-2010,localhost:8081/solr/statistics&fq=-isInternal:true&fq=-(author_mtdt:"CGIAR\+Institutional\+Learning\+and\+Change\+Initiative"++AND+subject_mtdt:"PARTNERSHIPS"+AND+subject_mtdt:"RESEARCH"+AND+subject_mtdt:"AGRICULTURE"+AND+subject_mtdt:"DEVELOPMENT"++AND+iso_mtdt:"en"+)&rows=0&wt=javabin&version=2} hits=0 status=0 QTime=19
-2016-12-02 08:28:23,908 INFO  org.apache.solr.servlet.SolrDispatchFilter @ SolrDispatchFilter.init()
-
- -
    -
  • DSpace’s own Solr logs don’t give IP addresses, so I will have to enable Nginx’s logging of /solr so I can see where this request came from
  • -
  • I enabled logging of /rest/ and I think I’ll leave it on for good
  • -
  • Also, the disk is nearly full because of log file issues, so I’m running some compression on DSpace logs
  • -
  • Normally these stay uncompressed for a month just in case we need to look at them, so now I’ve just compressed anything older than 2 weeks so we can get some disk space back (sketch below)
  • -
- -
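  • The compression itself is basically a find one-liner over the log directory (a sketch; the path and the two-week cutoff would need adjusting):

$ find /home/cgspace.cgiar.org/log -iname '*.log*' ! -iname '*.xz' -mtime +14 -exec xz {} \;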

2016-12-04

- -
    -
  • I got a weird report from the CGSpace checksum checker this morning
  • -
  • It says 732 bitstreams have potential issues, for example:
  • -
- -
------------------------------------------------ 
-Bitstream Id = 6
-Process Start Date = Dec 4, 2016
-Process End Date = Dec 4, 2016
-Checksum Expected = a1d9eef5e2d85f50f67ce04d0329e96a
-Checksum Calculated = a1d9eef5e2d85f50f67ce04d0329e96a
-Result = Bitstream marked deleted in bitstream table
------------------------------------------------ 
-...
------------------------------------------------- 
-Bitstream Id = 77581
-Process Start Date = Dec 4, 2016
-Process End Date = Dec 4, 2016
-Checksum Expected = 9959301aa4ca808d00957dff88214e38
-Checksum Calculated = 
-Result = The bitstream could not be found
------------------------------------------------ 
-
- -
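  • A quick way to cross-check those bitstream IDs against the database (column names from memory, so treat as a sketch):

$ psql -d cgspace -c 'select bitstream_id, deleted, internal_id, checksum from bitstream where bitstream_id in (6, 77581);'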
    -
  • The first one seems ok, but I don’t know what to make of the second one…
  • -
  • I had a look and there is indeed no file with the second checksum in the assetstore (ie, looking in [dspace-dir]/assetstore/99/59/30/...)
  • -
  • For what it’s worth, there is no item on DSpace Test or S3 backups with that checksum either…
  • -
  • In other news, I’m looking at JVM settings from the Solr 4.10.2 release, from bin/solr.in.sh:
  • -
- -
# These GC settings have shown to work well for a number of common Solr workloads
-GC_TUNE="-XX:-UseSuperWord \
--XX:NewRatio=3 \
--XX:SurvivorRatio=4 \
--XX:TargetSurvivorRatio=90 \
--XX:MaxTenuringThreshold=8 \
--XX:+UseConcMarkSweepGC \
--XX:+UseParNewGC \
--XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 \
--XX:+CMSScavengeBeforeRemark \
--XX:PretenureSizeThreshold=64m \
--XX:CMSFullGCsBeforeCompaction=1 \
--XX:+UseCMSInitiatingOccupancyOnly \
--XX:CMSInitiatingOccupancyFraction=50 \
--XX:CMSTriggerPermRatio=80 \
--XX:CMSMaxAbortablePrecleanTime=6000 \
--XX:+CMSParallelRemarkEnabled \
--XX:+ParallelRefProcEnabled \
--XX:+AggressiveOpts"
-
- - - -
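  • If I want to try those on DSpace Test the easiest place is probably Tomcat’s JAVA_OPTS (e.g. in /etc/default/tomcat7), appending the relevant flags to whatever heap settings are already there; a trimmed-down sketch:

JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:NewRatio=3 \
    -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 \
    -XX:+CMSScavengeBeforeRemark -XX:+UseCMSInitiatingOccupancyOnly \
    -XX:CMSInitiatingOccupancyFraction=50 -XX:+ParallelRefProcEnabled"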

2016-12-05

- -
    -
  • I did some basic benchmarking on a local DSpace before and after the JVM settings above, but there wasn’t anything amazingly obvious
  • -
  • I want to make the changes on DSpace Test and monitor the JVM heap graphs for a few days to see if they change the JVM GC patterns or anything (munin graphs)
  • -
  • Spin up new CGSpace server on Linode
  • -
  • I did a few traceroutes from Jordan and Kenya and it seems that Linode’s Frankfurt datacenter is a few hops closer and perhaps has less packet loss than the London one, so I put the new server in Frankfurt
  • -
  • Do initial provisioning
  • -
  • Atmire responded about the MQM warnings in the DSpace logs
  • -
  • Apparently we need to change the batch edit consumers in dspace/config/dspace.cfg:
  • -
- -
event.consumer.batchedit.filters = Community|Collection+Create
-
- -
    -
  • I haven’t tested it yet, but I created a pull request: #289
  • -
- -

2016-12-06

- -
    -
  • Some author authority corrections and name standardizations for Peter:
  • -
- -
dspace=# update metadatavalue set authority='b041f2f4-19e7-4113-b774-0439baabd197', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Mora Benard%';
-UPDATE 11
-dspace=# update metadatavalue set text_value = 'Hoek, Rein van der', authority='4d6cbce2-6fd5-4b43-9363-58d18e7952c9', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Hoek, R%';
-UPDATE 36
-dspace=# update metadatavalue set text_value = 'Hoek, Rein van der', authority='4d6cbce2-6fd5-4b43-9363-58d18e7952c9', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like '%an der Hoek%' and text_value !~ '^.*W\.?$';
-UPDATE 14
-dspace=# update metadatavalue set authority='18349f29-61b1-44d7-ac60-89e55546e812', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thorne, P%';
-UPDATE 42
-dspace=# update metadatavalue set authority='0d8369bb-57f7-4b2f-92aa-af820b183aca', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thornton, P%';
-UPDATE 360
-dspace=# update metadatavalue set text_value='Grace, Delia', authority='0b4fcbc1-d930-4319-9b4d-ea1553cca70b', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
-UPDATE 561
-
- -
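  • Before running these on CGSpace it’s worth previewing what each pattern will match, for example (same where clause, just counting):

$ psql -d dspacetest -c "select count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';"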
    -
  • Pay attention to the regex to prevent false positives in tricky cases with Dutch names!
  • -
  • I will run these updates on DSpace Test and then force a Discovery reindex, and then run them on CGSpace next week
  • -
  • More work on the KM4Dev Journal article
  • -
  • In other news, it seems the batch edit patch is working: there are no more WARN errors in the logs and batch editing works as expected
  • -
  • I need to check the CGSpace logs to see if there are still errors there, and then deploy/monitor it there
  • -
  • Paola from CCAFS mentioned she also has the “take task” bug on CGSpace
  • -
  • Reading about shared_buffers in PostgreSQL configuration (default is 128MB)
  • -
  • Looks like we have ~5GB of memory used by caches on the test server (after OS and JVM heap!), so we might as well bump up the buffers for Postgres
  • -
  • The docs say a good starting point for a dedicated server is 25% of the system RAM, and our server isn’t dedicated (also runs Solr, which can benefit from OS cache) so let’s try 1024MB
  • -
  • In other news, the authority reindexing keeps crashing (I was manually running it after the author updates above):
  • -
- -
$ time JAVA_OPTS="-Xms768m -Xmx768m -Dfile.encoding=UTF-8" /home/dspacetest.cgiar.org/bin/dspace index-authority
-Retrieving all data
-Initialize org.dspace.authority.indexer.DSpaceAuthorityIndexer
-Exception: null
-java.lang.NullPointerException
-        at org.dspace.authority.AuthorityValueGenerator.generateRaw(AuthorityValueGenerator.java:82)
-        at org.dspace.authority.AuthorityValueGenerator.generate(AuthorityValueGenerator.java:39)
-        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.prepareNextValue(DSpaceAuthorityIndexer.java:201)
-        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:132)
-        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
-        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
-        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:159)
-        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
-        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
-        at org.dspace.authority.indexer.AuthorityIndexClient.main(AuthorityIndexClient.java:61)
-        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
-        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
-        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
-        at java.lang.reflect.Method.invoke(Method.java:498)
-        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
-        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
-
-real    8m39.913s
-user    1m54.190s
-sys     0m22.647s
-
- -

2016-12-07

- -
    -
  • For what it’s worth, after running the same SQL updates on my local test server, index-authority runs and completes just fine
  • -
  • I will have to test more
  • -
  • Anyway, I noticed that some of the authority values I set actually have versions of author names we don’t want, ie “Grace, D.”
  • -
  • For example, do a Solr query for “first_name:Grace” and look at the results
  • -
  • Querying that ID shows the fields that need to be changed:
  • -
- -
{
-  "responseHeader": {
-    "status": 0,
-    "QTime": 1,
-    "params": {
-      "q": "id:0b4fcbc1-d930-4319-9b4d-ea1553cca70b",
-      "indent": "true",
-      "wt": "json",
-      "_": "1481102189244"
-    }
-  },
-  "response": {
-    "numFound": 1,
-    "start": 0,
-    "docs": [
-      {
-        "id": "0b4fcbc1-d930-4319-9b4d-ea1553cca70b",
-        "field": "dc_contributor_author",
-        "value": "Grace, D.",
-        "deleted": false,
-        "creation_date": "2016-11-10T15:13:40.318Z",
-        "last_modified_date": "2016-11-10T15:13:40.318Z",
-        "authority_type": "person",
-        "first_name": "D.",
-        "last_name": "Grace"
-      }
-    ]
-  }
-}
-
- -
    -
  • I think I can just update the value, first_name, and last_name fields…
  • -
  • The update syntax should be something like this, but I’m getting errors from Solr:
  • -
- -
$ curl 'localhost:8081/solr/authority/update?commit=true&wt=json&indent=true' -H 'Content-type:application/json' -d '[{"id":"1","price":{"set":100}}]'
-{
-  "responseHeader":{
-    "status":400,
-    "QTime":0},
-  "error":{
-    "msg":"Unexpected character '[' (code 91) in prolog; expected '<'\n at [row,col {unknown-source}]: [1,1]",
-    "code":400}}
-
- -
    -
  • When I try using the XML format I get an error that the updateLog needs to be configured for that core
  • -
  • Maybe I can just remove the authority UUID from the records, run the indexing again so it creates a new one for each name variant, then match them correctly?
  • -
- -
dspace=# update metadatavalue set authority=null, confidence=-1 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
-UPDATE 561
-
- -
    -
  • Then I’ll reindex discovery and authority and see how the authority Solr core looks
  • -
  • After this, now there are authorities for some of the “Grace, D.” and “Grace, Delia” text_values in the database (the first version is actually the same authority that already exists in the core, so it was just added back to some text_values, but the second one is new):
  • -
- -
$ curl 'localhost:8081/solr/authority/select?q=id%3A18ea1525-2513-430a-8817-a834cd733fbc&wt=json&indent=true'
-{
-  "responseHeader":{
-    "status":0,
-    "QTime":0,
-    "params":{
-      "q":"id:18ea1525-2513-430a-8817-a834cd733fbc",
-      "indent":"true",
-      "wt":"json"}},
-  "response":{"numFound":1,"start":0,"docs":[
-      {
-        "id":"18ea1525-2513-430a-8817-a834cd733fbc",
-        "field":"dc_contributor_author",
-        "value":"Grace, Delia",
-        "deleted":false,
-        "creation_date":"2016-12-07T10:54:34.356Z",
-        "last_modified_date":"2016-12-07T10:54:34.356Z",
-        "authority_type":"person",
-        "first_name":"Delia",
-        "last_name":"Grace"}]
-  }}
-
- -
    -
  • So now I could set them all to this ID and the name would be ok, but there has to be a better way!
  • -
  • In this case it seems that since there were also two different IDs in the original database, I just picked the wrong one!
  • -
  • Better to use:
  • -
- -
dspace#= update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
-
- -
    -
  • This proves that unifying author name varieties in authorities is easy, but fixing the name in the authority is tricky!
  • -
  • Perhaps another way is to just add our own UUID to the authority field for the text_value we like, then re-index authority so they get synced from PostgreSQL to Solr, then set the other text_values to use that authority ID
  • -
  • Deploy MQM WARN fix on CGSpace (#289)
  • -
  • Deploy “take task” hack/fix on CGSpace (#290)
  • -
  • I ran the following author corrections and then reindexed discovery:
  • -
- -
update metadatavalue set authority='b041f2f4-19e7-4113-b774-0439baabd197', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Mora Benard%';
-update metadatavalue set text_value = 'Hoek, Rein van der', authority='4d6cbce2-6fd5-4b43-9363-58d18e7952c9', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Hoek, R%';
-update metadatavalue set text_value = 'Hoek, Rein van der', authority='4d6cbce2-6fd5-4b43-9363-58d18e7952c9', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like '%an der Hoek%' and text_value !~ '^.*W\.?$';
-update metadatavalue set authority='18349f29-61b1-44d7-ac60-89e55546e812', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thorne, P%';
-update metadatavalue set authority='0d8369bb-57f7-4b2f-92aa-af820b183aca', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thornton, P%';
-update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
-
- -

2016-12-08

- -
    -
  • Something weird happened and Peter Thorne’s names all ended up as “Thorne”, I guess because the original authority had that as its name value:
  • -
- -
dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thorne%';
-    text_value    |              authority               | confidence
-------------------+--------------------------------------+------------
- Thorne, P.J.     | 18349f29-61b1-44d7-ac60-89e55546e812 |        600
- Thorne           | 18349f29-61b1-44d7-ac60-89e55546e812 |        600
- Thorne-Lyman, A. | 0781e13a-1dc8-4e3f-82e8-5c422b44a344 |         -1
- Thorne, M. D.    | 54c52649-cefd-438d-893f-3bcef3702f07 |         -1
- Thorne, P.J      | 18349f29-61b1-44d7-ac60-89e55546e812 |        600
- Thorne, P.       | 18349f29-61b1-44d7-ac60-89e55546e812 |        600
-(6 rows)
-
- -
    -
  • I generated a new UUID using uuidgen | tr [A-Z] [a-z] and set it along with correct name variation for all records:
  • -
- -
dspace=# update metadatavalue set authority='b2f7603d-2fb5-4018-923a-c4ec8d85b3bb', text_value='Thorne, P.J.' where resource_type_id=2 and metadata_field_id=3 and authority='18349f29-61b1-44d7-ac60-89e55546e812';
-UPDATE 43
-
- -
    -
  • Apparently we also need to normalize Phil Thornton’s names to Thornton, Philip K.: -
  • -
- -
dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^Thornton[,\.]? P.*';
-     text_value      |              authority               | confidence
----------------------+--------------------------------------+------------
- Thornton, P         | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
- Thornton, P K.      | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
- Thornton, P K       | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
- Thornton. P.K.      | 3e1e6639-d4fb-449e-9fce-ce06b5b0f702 |         -1
- Thornton, P K .     | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
- Thornton, P.K.      | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
- Thornton, P.K       | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
- Thornton, Philip K  | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
- Thornton, Philip K. | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
- Thornton, P. K.     | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
-(10 rows)
-
- -
    -
  • Seems his original authorities are using an incorrect version of the name so I need to generate another UUID and tie it to the correct name, then reindex:
  • -
- -
dspace=# update metadatavalue set authority='2df8136e-d8f4-4142-b58c-562337cab764', text_value='Thornton, Philip K.', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^Thornton[,\.]? P.*';
-UPDATE 362
-
- -
    -
  • It seems that, when you are messing with authority and author text values in the database, it is better to run authority reindex first (postgres→solr authority core) and then Discovery reindex (postgres→solr Discovery core)
  • -
  • Everything looks ok after authority and discovery reindex
  • -
  • In other news, I think we should really be using more RAM for PostgreSQL’s shared_buffers
  • -
  • The PostgreSQL documentation recommends using 25% of the system’s RAM on dedicated systems, but we should use a bit less since we also have a massive JVM heap and also benefit from some RAM being used by the OS cache
  • -
- -

2016-12-09

- -
    -
  • More work on finishing rough draft of KM4Dev article
  • -
  • Set PostgreSQL’s shared_buffers on CGSpace to 10% of system RAM (1200MB); a sketch of the change is further below
  • -
  • Run the following author corrections on CGSpace:
  • -
- -
dspace=# update metadatavalue set authority='34df639a-42d8-4867-a3f2-1892075fcb3f', text_value='Thorne, P.J.' where resource_type_id=2 and metadata_field_id=3 and authority='18349f29-61b1-44d7-ac60-89e55546e812' or authority='021cd183-946b-42bb-964e-522ebff02993';
-dspace=# update metadatavalue set authority='2df8136e-d8f4-4142-b58c-562337cab764', text_value='Thornton, Philip K.', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^Thornton[,\.]? P.*';
-
- -
    -
  • The authority IDs are different now from when I was looking a few days ago, so I had to adjust them here
  • -
- -
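  • For reference, the shared_buffers change is a single line in postgresql.conf plus a restart, and it’s easy to verify from psql afterwards (paths assume the stock Ubuntu PostgreSQL 9.3 layout):

$ grep shared_buffers /etc/postgresql/9.3/main/postgresql.conf
$ sudo service postgresql restart
$ psql -c 'show shared_buffers;'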

2016-12-11

- -
    -
  • After enabling a sizable shared_buffers for CGSpace’s PostgreSQL configuration the number of connections to the database dropped significantly
  • -
- -

postgres_bgwriter-week -postgres_connections_ALL-week

- -
    -
  • Looking at CIAT records from last week again, they have a lot of double authors like:
  • -
- -
International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::600
-International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::500
-International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::0
-
- -
    -
  • Some in the same dc.contributor.author field, and some in others like dc.contributor.author[en_US] etc
  • -
  • Removing the duplicates in OpenRefine and uploading a CSV to DSpace says “no changes detected”
  • -
  • Seems like the only way to sort of clean these up would be to start in SQL:
  • -
- -
dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like 'International Center for Tropical Agriculture';
-                  text_value                   |              authority               | confidence
------------------------------------------------+--------------------------------------+------------
- International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 |         -1
- International Center for Tropical Agriculture |                                      |        600
- International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 |        500
- International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 |        600
- International Center for Tropical Agriculture |                                      |         -1
- International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 |        500
- International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 |        600
- International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 |         -1
- International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 |          0
-dspace=# update metadatavalue set authority='3026b1de-9302-4f3e-85ab-ef48da024eb2', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value = 'International Center for Tropical Agriculture';
-UPDATE 1693
-dspace=# update metadatavalue set authority='3026b1de-9302-4f3e-85ab-ef48da024eb2', text_value='International Center for Tropical Agriculture', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like '%CIAT%';
-UPDATE 35
-
- -
    -
  • Work on article for KM4Dev journal
  • -
- -

2016-12-13

- -
    -
  • Checking in on CGSpace postgres stats again, looks like the shared_buffers change from a few days ago really made a big impact:
  • -
- -

postgres_bgwriter-week -postgres_connections_ALL-week

- -
    -
  • Looking at logs, it seems we need to evaluate which logs we keep and for how long
  • -
  • Basically the only ones we need are dspace.log because those are used for legacy statistics (need to keep for 1 month)
  • -
  • Other logs will be an issue because they don’t have date stamps
  • -
  • I will add date stamps to the logs we’re storing from the tomcat7 user’s cron jobs at least, using: $(date --iso-8601) (a sample cron entry is sketched further below)
  • -
  • Would probably be better to make custom logrotate files for them in the future
  • -
  • Clean up some unneeded log files from 2014 (they weren’t large, just don’t need them)
  • -
  • So basically, new cron jobs for logs should look something like this:
  • -
  • Find any file named *.log* that isn’t dspace.log*, isn’t already zipped, and is older than one day, and zip it:
  • -
- -
# find /home/dspacetest.cgiar.org/log -regextype posix-extended -iregex ".*\.log.*" ! -iregex ".*dspace\.log.*" ! -iregex ".*\.(gz|lrz|lzo|xz)" ! -newermt "Yesterday" -exec schedtool -B -e ionice -c2 -n7 xz {} \;
-
- -
    -
  • Since there are xzgrep and xzless we can actually just zip them after one day, so why not?!
  • -
  • We can keep the zipped ones for two weeks just in case we need to look for errors, etc, and delete them after that
  • -
  • I use schedtool -B and ionice -c2 -n7 to set the CPU scheduling to SCHED_BATCH and the IO to best effort which should, in theory, impact important system processes like Tomcat and PostgreSQL less
  • -
  • When the tasks are running you can see that the policies do apply:
  • -
- -
$ schedtool $(ps aux | grep "xz /home" | grep -v grep | awk '{print $2}') && ionice -p $(ps aux | grep "xz /home" | grep -v grep | awk '{print $2}')
-PID 17049: PRIO   0, POLICY B: SCHED_BATCH   , NICE   0, AFFINITY 0xf
-best-effort: prio 7
-
- -
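  • As for the date stamps mentioned above, the tomcat7 user’s cron entries can just redirect their output to dated files, something like this hypothetical filter-media entry:

0 3 * * * /home/cgspace.cgiar.org/bin/dspace filter-media > /home/cgspace.cgiar.org/log/filter-media-$(date --iso-8601).log 2>&1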
    -
  • All in all this should free up a few gigs (we were at 9.3GB free when I started)
  • -
  • Next thing to look at is whether we need Tomcat’s access logs
  • -
  • I just looked and it seems that we saved 10GB by zipping these logs
  • -
  • Some users pointed out issues with the “most popular” stats on a community or collection
  • -
  • This error appears in the logs when you try to view them:
  • -
- -
2016-12-13 21:17:37,486 ERROR org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter @ Serious Error Occurred Processing Request!
-org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceObjectDatasetGenerator.toDatasetQuery(Lorg/dspace/core/Context;)Lcom/atmire/statistics/content/DatasetQuery;
-	at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:972)
-	at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
-	at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882)
-	at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:789)
-	at javax.servlet.http.HttpServlet.service(HttpServlet.java:650)
-	at javax.servlet.http.HttpServlet.service(HttpServlet.java:731)
-	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
-	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
-	at org.dspace.app.xmlui.cocoon.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:111)
-	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
-	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
-	at org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter.doFilter(DSpaceCocoonServletFilter.java:274)
-	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
-	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
-	at org.dspace.app.xmlui.cocoon.servlet.multipart.DSpaceMultipartFilter.doFilter(DSpaceMultipartFilter.java:119)
-	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
-	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
-	at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
-	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
-	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
-	at org.dspace.utils.servlet.DSpaceWebappServletFilter.doFilter(DSpaceWebappServletFilter.java:78)
-	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
-	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
-	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:221)
-	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
-	at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:505)
-	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:169)
-	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
-	at org.apache.catalina.valves.CrawlerSessionManagerValve.invoke(CrawlerSessionManagerValve.java:180)
-	at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:956)
-	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
-	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:436)
-	at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1078)
-	at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:625)
-	at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:316)
-	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
-	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
-	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
-	at java.lang.Thread.run(Thread.java:745)
-Caused by: java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceObjectDatasetGenerator.toDatasetQuery(Lorg/dspace/core/Context;)Lcom/atmire/statistics/content/DatasetQuery;
-	at com.atmire.statistics.generator.TopNDSODatasetGenerator.toDatasetQuery(SourceFile:39)
-	at com.atmire.statistics.display.StatisticsDataVisitsMultidata.createDataset(SourceFile:108)
-	at org.dspace.statistics.content.StatisticsDisplay.createDataset(SourceFile:384)
-	at org.dspace.statistics.content.StatisticsDisplay.getDataset(SourceFile:404)
-	at com.atmire.statistics.mostpopular.JSONStatsMostPopularGenerator.generateJsonData(SourceFile:170)
-	at com.atmire.statistics.mostpopular.JSONStatsMostPopularGenerator.generate(SourceFile:246)
-	at com.atmire.app.xmlui.aspect.statistics.JSONStatsMostPopular.generate(JSONStatsMostPopular.java:145)
-	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
-
- -
    -
  • It happens on development and production, so I will have to ask Atmire
  • -
  • Most likely an issue with installation/configuration
  • -
- -

2016-12-14

- -
    -
  • Atmire sent a quick fix for the last-update.txt file not found error
  • -
  • After applying pull request #291 on DSpace Test I no longer see the error in the logs after the UpdateSolrStorageReports task runs
  • -
  • Also, I’m toying with the idea of moving the tomcat7 user’s cron jobs to /etc/cron.d so we can manage them in Ansible
  • -
  • Made a pull request with a template for the cron jobs (#75)
  • -
  • Testing SMTP from the new CGSpace server and it’s not working, I’ll have to tell James
  • -
- -

2016-12-15

- -
    -
  • Start planning for server migration this weekend, letting users know
  • -
  • I am trying to figure out what the process is to update the server’s IP in the Handle system, and emailing the hdladmin account bounces(!)
  • -
  • I will contact Jane Euler directly as I know I’ve corresponded with her in the past
  • -
  • She said that I should indeed just re-run the [dspace]/bin/dspace make-handle-config command and submit the new sitebndl.zip file to the CNRI website
  • -
  • Also I was troubleshooting some workflow issues from Bizuwork
  • -
  • I re-created the same scenario by adding a non-admin account and submitting an item, but I was able to successfully approve and commit it
  • -
  • So it turns out it’s not a bug, it’s just that Peter was added as a reviewer/admin AFTER the items were submitted
  • -
  • This is how DSpace works, and I need to ask if there is a way to override someone’s submission, as the other reviewer seems to not be paying attention, or has perhaps taken the item from the task pool?
  • -
  • Run a batch edit to add “RANGELANDS” ILRI subject to all items containing the word “RANGELANDS” in their metadata for Peter Ballantyne
  • -
- -

Select all items with "rangelands" in metadata
Add RANGELANDS ILRI subject

- -

2016-12-18

- -
    -
  • Add four new CRP subjects for 2017 and sort the input forms alphabetically (#294)
  • -
  • Test the SMTP on the new server and it’s working
  • -
  • Last week, when we asked CGNET to update the DNS records this weekend, they misunderstood and did it immediately
  • -
  • We quickly told them to undo it, but I just realized they didn’t undo the IPv6 AAAA record!
  • -
  • None of our users in African institutes will have IPv6, but some Europeans might, so I need to check if any submissions have been added since then
  • -
  • Update some names and authorities in the database:
  • -
- -
dspace=# update metadatavalue set authority='5ff35043-942e-4d0a-b377-4daed6e3c1a3', confidence=600, text_value='Duncan, Alan' where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^.*Duncan,? A.*';
-UPDATE 204
-dspace=# update metadatavalue set authority='46804b53-ea30-4a85-9ccf-b79a35816fa9', confidence=600, text_value='Mekonnen, Kindu' where resource_type_id=2 and metadata_field_id=3 and text_value like '%Mekonnen, K%';
-UPDATE 89
-dspace=# update metadatavalue set authority='f840da02-26e7-4a74-b7ba-3e2b723f3684', confidence=600, text_value='Lukuyu, Ben A.' where resource_type_id=2 and metadata_field_id=3 and text_value like '%Lukuyu, B%';
-UPDATE 140
-
- -
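  • To sanity check those updates, a quick verification query like this (just a sketch, reusing the author field metadata_field_id=3 from above) shows whether any variant spellings of a name are left:

$ psql -d dspace -c "select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Lukuyu%';"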
    -
  • Generated a new UUID for Ben using uuidgen | tr [A-Z] [a-z] as the one in Solr had his ORCID but the name format was incorrect
  • -
  • In theory DSpace should be able to check names from ORCID and update the records in the database, but I find that this doesn’t work (see Jira bug DS-3302)
  • -
  • I need to run these updates along with the other one for CIAT that I found last week
  • -
  • Enable OCSP stapling for hosts >= Ubuntu 16.04 in our Ansible playbooks (#76)
  • -
  • Working for DSpace Test on the second response:
  • -
- -
$ openssl s_client -connect dspacetest.cgiar.org:443 -servername dspacetest.cgiar.org -tls1_2 -tlsextdebug -status
-...
-OCSP response: no response sent
-$ openssl s_client -connect dspacetest.cgiar.org:443 -servername dspacetest.cgiar.org -tls1_2 -tlsextdebug -status
-...
-OCSP Response Data:
-...
-    Cert Status: good
-
- -
    -
  • Migrate CGSpace to new server, roughly following these steps:
  • -
  • On old server:
  • -
- -
# service tomcat7 stop
-# /home/backup/scripts/postgres_backup.sh
-
- -
    -
  • On new server:
  • -
- -
# systemctl stop tomcat7
-# rsync -4 -av --delete 178.79.187.182:/home/cgspace.cgiar.org/assetstore/ /home/cgspace.cgiar.org/assetstore/
-# rsync -4 -av --delete 178.79.187.182:/home/backup/ /home/backup/
-# rsync -4 -av --delete 178.79.187.182:/home/cgspace.cgiar.org/solr/ /home/cgspace.cgiar.org/solr
-# su - postgres
-$ dropdb cgspace
-$ createdb -O cgspace --encoding=UNICODE cgspace
-$ psql cgspace -c 'alter user cgspace createuser;'
-$ pg_restore -O -U cgspace -d cgspace -W -h localhost /home/backup/postgres/cgspace_2016-12-18.backup
-$ psql cgspace -c 'alter user cgspace nocreateuser;'
-$ psql -U cgspace -f ~tomcat7/src/git/DSpace/dspace/etc/postgres/update-sequences.sql cgspace -h localhost
-$ vacuumdb cgspace
-$ psql cgspace
-postgres=# \i /tmp/author-authority-updates-2016-12-11.sql
-postgres=# \q
-$ exit
-# chown -R tomcat7:tomcat7 /home/cgspace.cgiar.org
-# rsync -4 -av 178.79.187.182:/home/cgspace.cgiar.org/log/*.dat /home/cgspace.cgiar.org/log/
-# rsync -4 -av 178.79.187.182:/home/cgspace.cgiar.org/log/dspace.log.2016-1[12]* /home/cgspace.cgiar.org/log/
-# su - tomcat7
-$ cd src/git/DSpace/dspace/target/dspace-installer
-$ ant update clean_backups
-$ exit
-# systemctl start tomcat7
-
- -
    -
  • It took about twenty minutes and afterwards I had to check a few things, like: - -
      -
    • check and enable systemd timer for let’s encrypt
    • -
    • enable root cron jobs
    • -
    • disable root cron jobs on old server after!
    • -
    • enable tomcat7 cron jobs
    • -
    • disable tomcat7 cron jobs on old server after!
    • -
    • regenerate sitebndl.zip with new IP for handle server and submit it to Handle.net
    • -
  • -
- -
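  • Roughly, the checklist above translates to commands like these on the new server (only a sketch; the timer name, the handle-server directory, and the job details are assumptions rather than what is actually deployed):

# systemctl list-timers              # confirm the Let's Encrypt renewal timer is present and active
# crontab -u root -e                 # re-enable root cron jobs
# crontab -u tomcat7 -e              # re-enable tomcat7 cron jobs
# su - tomcat7 -c '/home/cgspace.cgiar.org/bin/dspace make-handle-config /home/cgspace.cgiar.org/handle-server'   # regenerate sitebndl.zip for Handle.net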

2016-12-22

- -
    -
  • Abenet wanted a CSV of the IITA community, but the web export doesn’t include the dc.date.accessioned field
  • -
  • I had to export it from the command line using the -a flag:
  • -
- -
$ [dspace]/bin/dspace metadata-export -a -f /tmp/iita.csv -i 10568/68616
-
- -

2016-12-28

- -
    -
  • We’ve been getting two alerts per day about CPU usage on the new server from Linode
  • -
  • These are caused by the batch jobs for Solr etc that run in the early morning hours
  • -
  • The Linode default is to alert at 90% CPU usage for two hours, but I see the old server was at 150%, so maybe we just need to adjust it
  • -
  • Speaking of the old server (linode01), I think we can decommission it now
  • -
  • I checked the S3 logs on the new server (linode18) to make sure the backups have been running and everything looks good
  • -
  • In other news, I was looking at the Munin graphs for PostgreSQL on the new server and it looks slightly worrying:
  • -
- -

munin postgres stats

- -
    -
  • I will have to check later why the size keeps increasing
  • -

January, 2017

- -
-

2017-01-02

- -
    -
  • I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error
  • -
  • I tested on DSpace Test as well and it doesn’t work there either
  • -
  • I asked on the dspace-tech mailing list because it seems to be broken, and actually now I’m not sure if we’ve ever had the sharding task run successfully over all these years
  • -
- -
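  • One quick way to check whether the yearly shard was ever actually created is to list the statistics cores via the cores admin API (a sketch, assuming the local Solr on port 8081 used elsewhere in these notes):

$ curl -s "http://localhost:8081/solr/admin/cores?action=STATUS&wt=json" | grep -o '"name":"statistics[^"]*"'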

- -

2017-01-04

- -
    -
  • I tried to shard my local dev instance and it fails the same way:
  • -
- -
$ JAVA_OPTS="-Xms768m -Xmx768m -Dfile.encoding=UTF-8" ~/dspace/bin/dspace stats-util -s
-Moving: 9318 into core statistics-2016
-Exception: IOException occured when talking to server at: http://localhost:8081/solr//statistics-2016
-org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://localhost:8081/solr//statistics-2016
-        at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:566)
-        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
-        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
-        at org.dspace.statistics.SolrLogger.shardSolrIndex(SourceFile:2291)
-        at org.dspace.statistics.util.StatisticsClient.main(StatisticsClient.java:106)
-        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
-        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
-        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
-        at java.lang.reflect.Method.invoke(Method.java:498)
-        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
-        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
-Caused by: org.apache.http.client.ClientProtocolException
-        at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:867)
-        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
-        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
-        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
-        at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:448)
-        ... 10 more
-Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity.  The cause lists the reason the original request failed.
-        at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:659)
-        at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
-        at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
-        ... 14 more
-Caused by: java.net.SocketException: Broken pipe (Write failed)
-        at java.net.SocketOutputStream.socketWrite0(Native Method)
-        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
-        at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
-        at org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:181)
-        at org.apache.http.impl.io.ChunkedOutputStream.flushCacheWithAppend(ChunkedOutputStream.java:124)
-        at org.apache.http.impl.io.ChunkedOutputStream.write(ChunkedOutputStream.java:181)
-        at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:132)
-        at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:89)
-        at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
-        at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:117)
-        at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:265)
-        at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:203)
-        at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:236)
-        at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:121)
-        at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
-        ... 16 more
-
- -
    -
  • And the DSpace log shows:
  • -
- -
2017-01-04 22:39:05,412 INFO  org.dspace.statistics.SolrLogger @ Created core with name: statistics-2016
-2017-01-04 22:39:05,412 INFO  org.dspace.statistics.SolrLogger @ Moving: 9318 records into core statistics-2016
-2017-01-04 22:39:07,310 INFO  org.apache.http.impl.client.SystemDefaultHttpClient @ I/O exception (java.net.SocketException) caught when processing request to {}->http://localhost:8081: Broken pipe (Write failed)
-2017-01-04 22:39:07,310 INFO  org.apache.http.impl.client.SystemDefaultHttpClient @ Retrying request to {}->http://localhost:8081
-
- -
    -
  • Despite failing instantly, a statistics-2016 directory was created, but it only has a data dir (no conf)
  • -
  • The Tomcat access logs show more:
  • -
- -
127.0.0.1 - - [04/Jan/2017:22:39:05 +0200] "GET /solr/statistics/select?q=type%3A2+AND+id%3A1&wt=javabin&version=2 HTTP/1.1" 200 107
-127.0.0.1 - - [04/Jan/2017:22:39:05 +0200] "GET /solr/statistics/select?q=*%3A*&rows=0&facet=true&facet.range=time&facet.range.start=NOW%2FYEAR-17YEARS&facet.range.end=NOW%2FYEAR%2B0YEARS&facet.range.gap=%2B1YEAR&facet.mincount=1&wt=javabin&version=2 HTTP/1.1" 200 423
-127.0.0.1 - - [04/Jan/2017:22:39:05 +0200] "GET /solr/admin/cores?action=STATUS&core=statistics-2016&indexInfo=true&wt=javabin&version=2 HTTP/1.1" 200 77
-127.0.0.1 - - [04/Jan/2017:22:39:05 +0200] "GET /solr/admin/cores?action=CREATE&name=statistics-2016&instanceDir=statistics&dataDir=%2FUsers%2Faorth%2Fdspace%2Fsolr%2Fstatistics-2016%2Fdata&wt=javabin&version=2 HTTP/1.1" 200 63
-127.0.0.1 - - [04/Jan/2017:22:39:07 +0200] "GET /solr/statistics/select?csv.mv.separator=%7C&q=*%3A*&fq=time%3A%28%5B2016%5C-01%5C-01T00%5C%3A00%5C%3A00Z+TO+2017%5C-01%5C-01T00%5C%3A00%5C%3A00Z%5D+NOT+2017%5C-01%5C-01T00%5C%3A00%5C%3A00Z%29&rows=10000&wt=csv HTTP/1.1" 200 4359517
-127.0.0.1 - - [04/Jan/2017:22:39:07 +0200] "GET /solr/statistics/admin/luke?show=schema&wt=javabin&version=2 HTTP/1.1" 200 16248
-127.0.0.1 - - [04/Jan/2017:22:39:07 +0200] "POST /solr//statistics-2016/update/csv?commit=true&softCommit=false&waitSearcher=true&f.previousWorkflowStep.split=true&f.previousWorkflowStep.separator=%7C&f.previousWorkflowStep.encapsulator=%22&f.actingGroupId.split=true&f.actingGroupId.separator=%7C&f.actingGroupId.encapsulator=%22&f.containerCommunity.split=true&f.containerCommunity.separator=%7C&f.containerCommunity.encapsulator=%22&f.range.split=true&f.range.separator=%7C&f.range.encapsulator=%22&f.containerItem.split=true&f.containerItem.separator=%7C&f.containerItem.encapsulator=%22&f.p_communities_map.split=true&f.p_communities_map.separator=%7C&f.p_communities_map.encapsulator=%22&f.ngram_query_search.split=true&f.ngram_query_search.separator=%7C&f.ngram_query_search.encapsulator=%22&f.containerBitstream.split=true&f.containerBitstream.separator=%7C&f.containerBitstream.encapsulator=%22&f.owningItem.split=true&f.owningItem.separator=%7C&f.owningItem.encapsulator=%22&f.actingGroupParentId.split=true&f.actingGroupParentId.separator=%7C&f.actingGroupParentId.encapsulator=%22&f.text.split=true&f.text.separator=%7C&f.text.encapsulator=%22&f.simple_query_search.split=true&f.simple_query_search.separator=%7C&f.simple_query_search.encapsulator=%22&f.owningComm.split=true&f.owningComm.separator=%7C&f.owningComm.encapsulator=%22&f.owner.split=true&f.owner.separator=%7C&f.owner.encapsulator=%22&f.filterquery.split=true&f.filterquery.separator=%7C&f.filterquery.encapsulator=%22&f.p_group_map.split=true&f.p_group_map.separator=%7C&f.p_group_map.encapsulator=%22&f.actorMemberGroupId.split=true&f.actorMemberGroupId.separator=%7C&f.actorMemberGroupId.encapsulator=%22&f.bitstreamId.split=true&f.bitstreamId.separator=%7C&f.bitstreamId.encapsulator=%22&f.group_name.split=true&f.group_name.separator=%7C&f.group_name.encapsulator=%22&f.p_communities_name.split=true&f.p_communities_name.separator=%7C&f.p_communities_name.encapsulator=%22&f.query.split=true&f.query.separator=%7C&f.query.encapsulator=%22&f.workflowStep.split=true&f.workflowStep.separator=%7C&f.workflowStep.encapsulator=%22&f.containerCollection.split=true&f.containerCollection.separator=%7C&f.containerCollection.encapsulator=%22&f.complete_query_search.split=true&f.complete_query_search.separator=%7C&f.complete_query_search.encapsulator=%22&f.p_communities_id.split=true&f.p_communities_id.separator=%7C&f.p_communities_id.encapsulator=%22&f.rangeDescription.split=true&f.rangeDescription.separator=%7C&f.rangeDescription.encapsulator=%22&f.group_id.split=true&f.group_id.separator=%7C&f.group_id.encapsulator=%22&f.bundleName.split=true&f.bundleName.separator=%7C&f.bundleName.encapsulator=%22&f.ngram_simplequery_search.split=true&f.ngram_simplequery_search.separator=%7C&f.ngram_simplequery_search.encapsulator=%22&f.group_map.split=true&f.group_map.separator=%7C&f.group_map.encapsulator=%22&f.owningColl.split=true&f.owningColl.separator=%7C&f.owningColl.encapsulator=%22&f.p_group_id.split=true&f.p_group_id.separator=%7C&f.p_group_id.encapsulator=%22&f.p_group_name.split=true&f.p_group_name.separator=%7C&f.p_group_name.encapsulator=%22&wt=javabin&version=2 HTTP/1.1" 409 156
-127.0.0.1 - - [04/Jan/2017:22:44:00 +0200] "POST /solr/datatables/update?wt=javabin&version=2 HTTP/1.1" 200 41
-127.0.0.1 - - [04/Jan/2017:22:44:00 +0200] "POST /solr/datatables/update HTTP/1.1" 200 40
-
- -
    -
  • Very interesting… it creates the core and then fails somehow
  • -
- -
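  • Before retrying, the half-created core probably needs to be cleaned up; a sketch, assuming the same local Solr on port 8081 and the data directory shown in the Tomcat access log above:

$ curl "http://localhost:8081/solr/admin/cores?action=UNLOAD&core=statistics-2016"
$ rm -rf ~/dspace/solr/statistics-2016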

2017-01-08

- -
    -
  • Put Sisay’s item-view.xsl code to show mapped collections on CGSpace (#295)
  • -
- -

2017-01-09

- -
    -
  • A user wrote to tell me that the new display of an item’s mappings had a crazy bug for at least one item: https://cgspace.cgiar.org/handle/10568/78596
  • -
  • She said she only mapped it once, but it appears to be mapped 184 times
  • -
- -

Crazy item mapping

- -

2017-01-10

- -
    -
  • I tried to clean up the duplicate mappings by exporting the item’s metadata to CSV, editing, and re-importing, but DSpace said “no changes were detected”
  • -
  • I’ve asked on the dspace-tech mailing list to see if anyone can help
  • -
  • I found an old post on the mailing list discussing a similar issue, and listing some SQL commands that might help
  • -
  • For example, this shows 186 mappings for the item, the first three of which are real:
  • -
- -
dspace=#  select * from collection2item where item_id = '80596';
-
- -
    -
  • Then I deleted the others:
  • -
- -
dspace=# delete from collection2item where item_id = '80596' and id not in (90792, 90806, 90807);
-
- -
    -
  • And in the item view it now shows the correct mappings
  • -
  • I will have to ask the DSpace people if this is a valid approach
  • -
  • Finish looking at the Journal Title corrections of the top 500 Journal Titles so we can make a controlled vocabulary from it
  • -
- -
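  • To see whether other items are affected by duplicate mappings, a query like this should work (just a sketch that counts rows per item/collection pair in collection2item):

$ psql -d dspace -c "select item_id, collection_id, count(*) from collection2item group by item_id, collection_id having count(*) > 1 order by count desc;"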

2017-01-11

- - - -
Traceback (most recent call last):
-  File "./fix-metadata-values.py", line 80, in <module>
-    print("Fixing {} occurences of: {}".format(records_to_fix, record[0]))
-UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 15: ordinal not in range(128)
-
- -
    -
  • Seems we need to encode as UTF-8 before printing to screen, ie:
  • -
- -
print("Fixing {} occurences of: {}".format(records_to_fix, record[0].encode('utf-8')))
-
- -
    -
  • See: http://stackoverflow.com/a/36427358/487333
  • -
  • I’m actually not sure if we need to encode() the strings to UTF-8 before writing them to the database… I’ve never had this issue before
  • -
  • Now back to cleaning up some journal titles so we can make the controlled vocabulary:
  • -
- -
$ ./fix-metadata-values.py -i /tmp/fix-27-journal-titles.csv -f dc.source -t correct -m 55 -d dspace -u dspace -p 'fuuu'
-
- -
    -
  • Now get the top 500 journal titles:
  • -
- -
dspace-# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=55 group by text_value order by count desc limit 500) to /tmp/journal-titles.csv with csv;
-
- -
    -
  • The values are a bit dirty and outdated, since the file I had given to Abenet and Peter was from November
  • -
  • I will have to go through these and fix some more before making the controlled vocabulary
  • -
  • Added 30 more corrections or so, now there are 49 total and I’ll have to get the top 500 after applying them
  • -
- -

2017-01-13

- - - -

2017-01-16

- -
    -
  • Fix the two items Maria found with duplicate mappings with this script:
  • -
- -
/* 184 incorrect mappings: https://cgspace.cgiar.org/handle/10568/78596 */
-delete from collection2item where item_id = '80596' and id not in (90792, 90806, 90807);
-/* 1 incorrect mapping: https://cgspace.cgiar.org/handle/10568/78658 */
-delete from collection2item where id = '91082';
-
- -

2017-01-17

- -
    -
  • Helping clean up some file names in the 232 CIAT records that Sisay worked on last week
  • -
  • There are about 30 files with %20 (space) and Spanish accents in the file name
  • -
  • At first I thought we should fix these, but actually it is prescribed by the W3 working group to convert these to UTF8 and URL encode them!
  • -
  • And the file names don’t really matter either, as long as the SAF Builder tool can read them—after that DSpace renames them with a hash in the assetstore
  • -
  • Seems like the only ones I should replace are the ' apostrophe characters, as %27:
  • -
- -
value.replace("'",'%27')
-
- -
    -
  • Add the item’s Type to the filename column as a hint to SAF Builder so it can set a more useful description field:
  • -
- -
value + "__description:" + cells["dc.type"].value
-
- -
    -
  • Test importing of the new CIAT records (actually there are 232, not 234):
  • -
- -
$ JAVA_OPTS="-Xmx512m -Dfile.encoding=UTF-8" /home/dspacetest.cgiar.org/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/79042 --source /home/aorth/CIAT_234/SimpleArchiveFormat/ --mapfile=/tmp/ciat.map &> /tmp/ciat.log
-
- -
    -
  • Many of the PDFs are 20, 30, 40, 50+ MB, which makes a total of 4GB
  • -
  • These are scanned from paper and likely have no compression, so we should try to test if these compression techniques help without compromising the quality too much:
  • -
- -
$ convert -compress Zip -density 150x150 input.pdf output.pdf
-$ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf
-
- -
    -
  • Somewhere on the Internet I saw a suggestion to use a DPI of 144
  • -
- -
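  • A small loop to compare sizes on a random sample before deciding (a sketch, assuming GNU stat, a sample/ directory of test PDFs, and the Ghostscript invocation from above):

$ for pdf in sample/*.pdf; do
>   gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH -sOutputFile="/tmp/$(basename "$pdf")" "$pdf"
>   echo "$pdf: $(stat -c%s "$pdf") -> $(stat -c%s "/tmp/$(basename "$pdf")")"
> done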

2017-01-19

- -
    -
  • In testing a random sample of CIAT’s PDFs for compressability, it looks like all of these methods generally increase the file size so we will just import them as they are
  • -
  • Import 232 CIAT records into CGSpace:
  • -
- -
$ JAVA_OPTS="-Xmx512m -Dfile.encoding=UTF-8" /home/cgspace.cgiar.org/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/68704 --source /home/aorth/CIAT_232/SimpleArchiveFormat/ --mapfile=/tmp/ciat.map &> /tmp/ciat.log
-
- -

2017-01-22

- -
    -
  • Looking at some records that Sisay is having problems importing into DSpace Test (seems to be because of copious whitespace and carriage return characters from Excel’s CSV exporter)
  • -
  • There were also some issues with an invalid dc.date.issued field, and I trimmed leading / trailing whitespace and cleaned up some URLs with unneeded parameters like ?show=full
  • -
- -

2017-01-23

- -
    -
  • I merged Atmire’s pull request into the development branch so they can deploy it on DSpace Test
  • -
  • Move some old ILRI Program communities to a new subcommunity for former programs (10568/79164):
  • -
- -
$ for community in 10568/171 10568/27868 10568/231 10568/27869 10568/150 10568/230 10568/32724 10568/172; do /home/cgspace.cgiar.org/bin/dspace community-filiator --remove --parent=10568/27866 --child="$community" && /home/cgspace.cgiar.org/bin/dspace community-filiator --set --parent=10568/79164 --child="$community"; done
-
- - - -
10568/42161 10568/171 10568/79341
-10568/41914 10568/171 10568/79340
-
- -

2017-01-24

- -
    -
  • Run all updates on DSpace Test and reboot the server
  • -
  • Run fixes for Journal titles on CGSpace:
  • -
- -
$ ./fix-metadata-values.py -i /tmp/fix-49-journal-titles.csv -f dc.source -t correct -m 55 -d dspace -u dspace -p 'password'
-
- -
    -
  • Create a new list of the top 500 journal titles from the database:
  • -
- -
dspace-# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=55 group by text_value order by count desc limit 500) to /tmp/journal-titles.csv with csv;
-
- -
    -
  • Then sort them in OpenRefine and create a controlled vocabulary by manually adding the XML markup, pull request (#298)
  • -
  • This would be the last issue remaining to close the meta issue about switching to controlled vocabularies (#69)
  • -
- -

2017-01-25

- -
    -
  • Atmire says the com.atmire.statistics.util.UpdateSolrStorageReports and com.atmire.utils.ReportSender are no longer necessary because they are using a Spring scheduler for these tasks now
  • -
  • Pull request to remove them from the Ansible templates: https://github.com/ilri/rmg-ansible-public/pull/80
  • -
  • Still testing the Atmire modules on DSpace Test, and it looks like a few issues we had reported are now fixed: - -
      -
    • XLS Export from Content statistics
    • -
    • Most popular items
    • -
    • Show statistics on collection pages
    • -
  • -
  • But now we have a new issue with the “Types” in Content statistics not being respected—we only get the defaults, despite having custom settings in dspace/config/modules/atmire-cua.cfg
  • -
- -

2017-01-27

- -
    -
  • Magdalena pointed out that somehow the Anonymous group had been added to the Administrators group on CGSpace (!); see the query sketch after this list
  • -
  • Discuss plans to update CCAFS metadata and communities for their new flagships and phase II project identifiers
  • -
  • The flagships are in cg.subject.ccafs, and we need to probably make a new field for the phase II project identifiers
  • -
- -
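  • To list which groups are members of the Administrator group, something like this should work (a sketch, assuming the stock DSpace 5 epersongroup and group2group tables and the default group names):

$ psql -d dspace -c "select child.name from group2group g2g join epersongroup child on g2g.child_id = child.eperson_group_id join epersongroup parent on g2g.parent_id = parent.eperson_group_id where parent.name = 'Administrator';"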

2017-01-28

- -
    -
  • Merge controlled vocabulary for journal titles (dc.source) into CGSpace (#298)
  • -
  • Merge new CIAT subject into CGSpace (#296)
  • -
- -

2017-01-29

- -
    -
  • Run all system updates on DSpace Test, redeploy DSpace code, and reboot the server
  • -
  • Run all system updates on CGSpace, redeploy DSpace code, and reboot the server
  • -

February, 2017

- -
-

2017-02-07

- -
    -
  • An item was mapped twice erroneously again, so I had to remove one of the mappings manually:
  • -
- -
dspace=# select * from collection2item where item_id = '80278';
-  id   | collection_id | item_id
--------+---------------+---------
- 92551 |           313 |   80278
- 92550 |           313 |   80278
- 90774 |          1051 |   80278
-(3 rows)
-dspace=# delete from collection2item where id = 92551 and item_id = 80278;
-DELETE 1
-
- -
    -
  • Create issue on GitHub to track the addition of CCAFS Phase II project tags (#301)
  • -
  • Looks like we’ll be using cg.identifier.ccafsprojectpii as the field name
  • -
- -

- -

2017-02-08

- -
    -
  • We also need to rename some of the CCAFS Phase I flagships: - -
      -
    • CLIMATE-SMART AGRICULTURAL PRACTICES → CLIMATE-SMART TECHNOLOGIES AND PRACTICES
    • -
    • CLIMATE RISK MANAGEMENT → CLIMATE SERVICES AND SAFETY NETS
    • -
    • LOW EMISSIONS AGRICULTURE → LOW EMISSIONS DEVELOPMENT
    • -
    • POLICIES AND INSTITUTIONS → PRIORITIES AND POLICIES FOR CSA
    • -
  • -
  • The climate risk management one doesn’t exist, so I will have to ask Magdalena if they want me to add it to the input forms
  • -
  • Start testing some nearly 500 author corrections that CCAFS sent me:
  • -
- -
$ ./fix-metadata-values.py -i /tmp/CCAFS-Authors-Feb-7.csv -f dc.contributor.author -t 'correct name' -m 3 -d dspace -u dspace -p fuuu
-
- -

2017-02-09

- -
    -
  • More work on CCAFS Phase II stuff
  • -
  • Looks like simply adding a new metadata field to dspace/config/registries/cgiar-types.xml and restarting DSpace causes the field to get added to the registry
  • -
  • It requires a restart but at least it allows you to manage the registry programmatically
  • -
  • It’s not a very good way to manage the registry, though, as removing a field from the XML file doesn’t cause it to be removed from the registry, and we always restore from database backups so there would never be a scenario where we needed these to be created
  • -
  • Testing some corrections on CCAFS Phase II flagships (cg.subject.ccafs):
  • -
- -
$ ./fix-metadata-values.py -i ccafs-flagships-feb7.csv -f cg.subject.ccafs -t correct -m 210 -d dspace -u dspace -p fuuu
-
- -

2017-02-10

- -
    -
  • CCAFS said they want to wait on the flagship updates (cg.subject.ccafs) on CGSpace, perhaps for a month or so
  • -
  • Help Marianne Gadeberg (WLE) with some user permissions as it seems she had previously been using a personal email account, and is now on a CGIAR one
  • -
  • I manually added her new account to ~25 authorizations that her old user was on
  • -
- -

2017-02-14

- -
    -
  • Add SCALING to ILRI subjects (#304), as Sisay’s attempts were all sloppy
  • -
  • Cherry pick some patches from the DSpace 5.7 branch: - -
      -
    • DS-3363 CSV import error says “row”, means “column”: f7b6c83e991db099003ee4e28ca33d3c7bab48c0
    • -
    • DS-3479 avoid adding empty metadata values during import: 329f3b48a6de7fad074d825fd12118f7e181e151
    • -
    • [DS-3456] 5x Clarify command line options for statisics import/export tools (#1623): 567ec083c8a94eb2bcc1189816eb4f767745b278
    • -
    • [DS-3458]5x Allow Shard Process to Append to an existing repo: 3c8ecb5d1fd69a1dcfee01feed259e80abbb7749
    • -
  • -
  • I still need to test these, especially as the last two which change some stuff with Solr maintenance
  • -
- -

2017-02-15

- - - -

2017-02-16

- -
    -
  • Looking at memory info from munin on CGSpace:
  • -
- -

CGSpace meminfo

- -
    -
  • We are using only ~8GB of RAM for applications, and 16GB for caches!
  • -
  • The Linode machine we’re on has 24GB of RAM but only because that’s the only instance that had enough disk space for us (384GB)…
  • -
  • We should probably look into Google Compute Engine or Digital Ocean where we can get more storage without having to follow a linear increase in instance pricing for CPU/memory as well
  • -
  • Especially because we only use 2 out of 8 CPUs basically:
  • -
- -

CGSpace CPU

- -
    -
  • Fix issue with duplicate declaration of in atmire-dspace-xmlui pom.xml (causing non-fatal warnings during the maven build)
  • -
  • Experiment with making DSpace generate HTTPS handle links, first a change in dspace.cfg or the site’s properties file:
  • -
- -
handle.canonical.prefix = https://hdl.handle.net/
-
- -
    -
  • And then a SQL command to update existing records:
  • -
- -
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://hdl.handle.net', 'https://hdl.handle.net') where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'uri');
-UPDATE 58193
-
- -
    -
  • Seems to work fine!
  • -
  • I noticed a few items that have incorrect DOI links (dc.identifier.doi), and after looking in the database I see there are over 100 that are missing the scheme or are just plain wrong:
  • -
- -
dspace=# select distinct text_value from metadatavalue where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value not like 'http%://%';
-
- -
    -
  • This will replace any that begin with 10. and change them to https://dx.doi.org/10.:
  • -
- -
dspace=# update metadatavalue set text_value = regexp_replace(text_value, '(^10\..+$)', 'https://dx.doi.org/\1') where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like '10.%';
-
- -
    -
  • This will get any that begin with doi:10. and change them to https://dx.doi.org/10.x:
  • -
- -
dspace=# update metadatavalue set text_value = regexp_replace(text_value, '^doi:(10\..+$)', 'https://dx.doi.org/\1') where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like 'doi:10%';
-
- -
    -
  • Fix DOIs like dx.doi.org/10. to be https://dx.doi.org/10.:
  • -
- -
dspace=# update metadatavalue set text_value = regexp_replace(text_value, '(^dx.doi.org/.+$)', 'https://dx.doi.org/\1') where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like 'dx.doi.org/%';
-
- -
    -
  • Fix DOIs like http//:
  • -
- -
dspace=# update metadatavalue set text_value = regexp_replace(text_value, '^http//(dx.doi.org/.+$)', 'https://dx.doi.org/\1') where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like 'http//%';
-
- -
    -
  • Fix DOIs like dx.doi.org./:
  • -
- -
dspace=# update metadatavalue set text_value = regexp_replace(text_value, '(^dx.doi.org\./.+$)', 'https://dx.doi.org/\1') where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like 'dx.doi.org./%'
-
-
- -
    -
  • Delete some invalid DOIs:
  • -
- -
dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value in ('DOI','CPWF Mekong','Bulawayo, Zimbabwe','bb');
-
- -
    -
  • Fix some other random outliers:
  • -
- -
dspace=# update metadatavalue set text_value = 'https://dx.doi.org/10.1016/j.aquaculture.2015.09.003' where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value = 'http:/dx.doi.org/10.1016/j.aquaculture.2015.09.003';
-dspace=# update metadatavalue set text_value = 'https://dx.doi.org/10.5337/2016.200' where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value = 'doi: https://dx.doi.org/10.5337/2016.200';
-dspace=# update metadatavalue set text_value = 'https://dx.doi.org/doi:10.1371/journal.pone.0062898' where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value = 'Http://dx.doi.org/doi:10.1371/journal.pone.0062898';
-dspace=# update metadatavalue set text_value = 'https://dx.doi.10.1016/j.cosust.2013.11.012' where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value = 'http:dx.doi.10.1016/j.cosust.2013.11.012';
-dspace=# update metadatavalue set text_value = 'https://dx.doi.org/10.1080/03632415.2014.883570' where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value = 'org/10.1080/03632415.2014.883570';
-dspace=# update metadatavalue set text_value = 'https://dx.doi.org/10.15446/agron.colomb.v32n3.46052' where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value = 'Doi: 10.15446/agron.colomb.v32n3.46052';
-
- -
    -
  • And do another round of http:// → https:// cleanups:
  • -
- -
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://dx.doi.org', 'https://dx.doi.org') where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like 'http://dx.doi.org%';
-
- -
    -
  • Run all DOI corrections on CGSpace
  • -
  • Something to think about here is to write a Curation Task in Java to do these sanity checks / corrections every night
  • -
  • Then we could add a cron job for them and run them from the command line like:
  • -
- -
[dspace]/bin/dspace curate -t noop -i 10568/79891
-
- -
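  • If we ever write such a curation task, a hypothetical /etc/cron.d entry for it (the fix-dois task name is made up; only the curate invocation pattern and the cron.d user field are real) might look like:

# /etc/cron.d/dspace-curation
0 3 * * *   tomcat7   /home/cgspace.cgiar.org/bin/dspace curate -t fix-dois -i 10568/0 > /dev/null 2>&1

  • The -i 10568/0 site handle is also an assumption; we would need to confirm the right scope before deploying it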

2017-02-20

- -
    -
  • Run all system updates on DSpace Test and reboot the server
  • -
  • Run CCAFS author corrections on DSpace Test and CGSpace and force a full discovery reindex
  • -
  • Fix label of CCAFS subjects in Atmire Listings and Reports module
  • -
  • Help Sisay with SQL commands
  • -
  • Help Paola from CCAFS with the Atmire Listings and Reports module
  • -
  • Testing the fix-metadata-values.py script on macOS and it seems like we don’t need to use .encode('utf-8') anymore when printing strings to the screen
  • -
  • It seems this might have only been a temporary problem, as both Python 3.5.2 and 3.6.0 are able to print the problematic string “Entwicklung & Ländlicher Raum” without the encode() call, but print it as a bytes object when encode() is used:
  • -
- -
$ python
-Python 3.6.0 (default, Dec 25 2016, 17:30:53)
->>> print('Entwicklung & Ländlicher Raum')
-Entwicklung & Ländlicher Raum
->>> print('Entwicklung & Ländlicher Raum'.encode())
-b'Entwicklung & L\xc3\xa4ndlicher Raum'
-
- -
    -
  • So for now I will remove the encode call from the script (though it was never used on the versions on the Linux hosts), leading me to believe it really was a temporary problem, perhaps due to macOS or the Python build I was using.
  • -
- -

2017-02-21

- -
    -
  • Testing regenerating PDF thumbnails, like I started in 2016-11
  • -
  • It seems there is a bug in filter-media that causes it to process formats that aren’t part of its configuration:
  • -
- -
$ [dspace]/bin/dspace filter-media -f -i 10568/16856 -p "ImageMagick PDF Thumbnail"
-File: earlywinproposal_esa_postharvest.pdf.jpg
-FILTERED: bitstream 13787 (item: 10568/16881) and created 'earlywinproposal_esa_postharvest.pdf.jpg'
-File: postHarvest.jpg.jpg
-FILTERED: bitstream 16524 (item: 10568/24655) and created 'postHarvest.jpg.jpg'
-
- -
    -
  • According to dspace.cfg the ImageMagick PDF Thumbnail plugin should only process PDFs:
  • -
- -
filter.org.dspace.app.mediafilter.ImageMagickImageThumbnailFilter.inputFormats = BMP, GIF, image/png, JPG, TIFF, JPEG, JPEG 2000
-filter.org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter.inputFormats = Adobe PDF
-
- -
    -
  • I’ve sent a message to the mailing list and might file a Jira issue
  • -
  • Ask Atmire about the failed interpolation of the dspace.internalUrl variable in atmire-cua.cfg
  • -
- -

2017-02-22

- -
    -
  • Atmire said I can add dspace.internalUrl to my build properties and the error will go away
  • -
  • It should be the local URL for accessing Tomcat from the server’s own perspective, ie: http://localhost:8080
  • -
- -

2017-02-26

- -
    -
  • Find all fields with “http://hdl.handle.net" values (most are in dc.identifier.uri, but some are in other URL-related fields like cg.link.reference, cg.identifier.dataurl, and cg.identifier.url):
  • -
- -
dspace=# select distinct metadata_field_id from metadatavalue where resource_type_id=2 and text_value like 'http://hdl.handle.net%';
-dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://hdl.handle.net', 'https://hdl.handle.net') where resource_type_id=2 and metadata_field_id IN (25, 113, 179, 219, 220, 223) and text_value like 'http://hdl.handle.net%';
-UPDATE 58633
-
- -
    -
  • This works but I’m thinking I’ll wait on the replacement as there are perhaps some other places that rely on http://hdl.handle.net (grep the code, it’s scary how many things are hard coded)
  • -
  • Send message to dspace-tech mailing list with concerns about this
  • -
- -

2017-02-27

- -
    -
  • LDAP users cannot log in today, looks to be an issue with CGIAR’s LDAP server:
  • -
- -
$ openssl s_client -connect svcgroot2.cgiarad.org:3269
-CONNECTED(00000003)
-depth=0 CN = SVCGROOT2.CGIARAD.ORG
-verify error:num=20:unable to get local issuer certificate
-verify return:1
-depth=0 CN = SVCGROOT2.CGIARAD.ORG
-verify error:num=21:unable to verify the first certificate
-verify return:1
----
-Certificate chain
- 0 s:/CN=SVCGROOT2.CGIARAD.ORG
-   i:/CN=CGIARAD-RDWA-CA
----
-
- -
    -
  • For some reason it is now signed by a private certificate authority
  • -
  • This error seems to have started on 2017-02-25:
  • -
- -
$ grep -c "unable to find valid certification path" [dspace]/log/dspace.log.2017-02-*
-[dspace]/log/dspace.log.2017-02-01:0
-[dspace]/log/dspace.log.2017-02-02:0
-[dspace]/log/dspace.log.2017-02-03:0
-[dspace]/log/dspace.log.2017-02-04:0
-[dspace]/log/dspace.log.2017-02-05:0
-[dspace]/log/dspace.log.2017-02-06:0
-[dspace]/log/dspace.log.2017-02-07:0
-[dspace]/log/dspace.log.2017-02-08:0
-[dspace]/log/dspace.log.2017-02-09:0
-[dspace]/log/dspace.log.2017-02-10:0
-[dspace]/log/dspace.log.2017-02-11:0
-[dspace]/log/dspace.log.2017-02-12:0
-[dspace]/log/dspace.log.2017-02-13:0
-[dspace]/log/dspace.log.2017-02-14:0
-[dspace]/log/dspace.log.2017-02-15:0
-[dspace]/log/dspace.log.2017-02-16:0
-[dspace]/log/dspace.log.2017-02-17:0
-[dspace]/log/dspace.log.2017-02-18:0
-[dspace]/log/dspace.log.2017-02-19:0
-[dspace]/log/dspace.log.2017-02-20:0
-[dspace]/log/dspace.log.2017-02-21:0
-[dspace]/log/dspace.log.2017-02-22:0
-[dspace]/log/dspace.log.2017-02-23:0
-[dspace]/log/dspace.log.2017-02-24:0
-[dspace]/log/dspace.log.2017-02-25:7
-[dspace]/log/dspace.log.2017-02-26:8
-[dspace]/log/dspace.log.2017-02-27:90
-
- -
    -
  • Also, it seems that we need to use a different user for LDAP binds, as we’re still using the temporary one from the root migration, so maybe we can go back to the previous user we were using
  • -
  • So it looks like the certificate is invalid AND the bind users we had been using were deleted
  • -
  • Biruk Debebe recreated the bind user and now we are just waiting for CGNET to update their certificates
  • -
  • Regarding the filter-media issue I found earlier, it seems that the ImageMagick PDF plugin will also process JPGs if they are in the “Content Files” (aka ORIGINAL) bundle
  • -
  • The problem likely lies in the logic of ImageMagickThumbnailFilter.java, as ImageMagickPdfThumbnailFilter.java extends it
  • -
  • Run CIAT corrections on CGSpace
  • -
- -
dspace=# update metadatavalue set authority='3026b1de-9302-4f3e-85ab-ef48da024eb2', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value = 'International Center for Tropical Agriculture';
-
- -
    -
  • CGNET has fixed the certificate chain on their LDAP server
  • -
  • Redeploy CGSpace and DSpace Test to on latest 5_x-prod branch with fixes for LDAP bind user
  • -
  • Run all system updates on CGSpace server and reboot
  • -
- -

2017-02-28

- -
    -
  • After running the CIAT corrections and updating the Discovery and authority indexes, there is still no change in the number of items listed for CIAT in Discovery
  • -
  • Ah, this is probably because some items have the International Center for Tropical Agriculture author twice, which I first noticed in 2016-12 but couldn’t figure out how to fix
  • -
  • I think I can do it by first exporting all metadatavalues that have the author International Center for Tropical Agriculture
  • -
- -
dspace=# \copy (select resource_id, metadata_value_id from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value='International Center for Tropical Agriculture') to /tmp/ciat.csv with csv;
-COPY 1968
-
- -
    -
  • And then use awk to print the duplicate lines to a separate file:
  • -
- -
$ awk -F',' 'seen[$1]++' /tmp/ciat.csv > /tmp/ciat-dupes.csv
-
- -
    -
  • From that file I can create a list of 279 deletes and put them in a batch script like:
  • -
- -
delete from metadatavalue where resource_type_id=2 and metadata_field_id=3 and metadata_value_id=2742061;
-
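  • Rather than typing those 279 statements by hand, they can be generated from the dupes file with something like this (a sketch; it just wraps the metadata_value_id from the second CSV column in the delete statement above):

$ awk -F',' '{print "delete from metadatavalue where resource_type_id=2 and metadata_field_id=3 and metadata_value_id=" $2 ";"}' /tmp/ciat-dupes.csv > /tmp/ciat-deletes.sql
$ psql -d dspace -f /tmp/ciat-deletes.sql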

March, 2017

- -
-

2017-03-01

- -
    -
  • Run the 279 CIAT author corrections on CGSpace
  • -
- -

2017-03-02

- -
    -
  • Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace
  • -
  • CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles
  • -
  • They might come in at the top level in one “CGIAR System” community, or with several communities
  • -
  • I need to spend a bit of time looking at the multiple handle support in DSpace and see if new content can be minted in both handles, or just one?
  • -
  • Need to send Peter and Michael some notes about this in a few days
  • -
  • Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI
  • -
  • Filed an issue on DSpace issue tracker for the filter-media bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: DS-3516
  • -
  • Discovered that the ImageMagic filter-media plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK
  • -
  • Interestingly, it seems DSpace 4.x’s thumbnails were sRGB, but forcing regeneration using DSpace 5.x’s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see 10568/51999):
  • -
- -
$ identify ~/Desktop/alc_contrastes_desafios.jpg
-/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
-
- -

- -
    -
  • This results in discolored thumbnails when compared to the original PDF, for example sRGB and CMYK:
  • -
- -

Thumbnail in sRGB colorspace

- -

Thumbnail in CMYK colorspace

- -
    -
  • I filed an issue for the color space thing: DS-3517
  • -
- -

2017-03-03

- - - -
$ convert alc_contrastes_desafios.pdf\[0\] -profile /opt/brew/Cellar/ghostscript/9.20/share/ghostscript/9.20/iccprofiles/default_cmyk.icc -thumbnail 300x300 -flatten -profile /opt/brew/Cellar/ghostscript/9.20/share/ghostscript/9.20/iccprofiles/default_rgb.icc alc_contrastes_desafios.pdf.jpg
-
- -
    -
  • This reads the input file, applies the CMYK profile, applies the RGB profile, then writes the file
  • -
  • Note that you should set the first profile immediately after the input file
  • -
  • Also, it is better to use profiles than setting -colorspace
  • -
  • This is a great resource describing the color stuff: http://www.imagemagick.org/Usage/formats/#profiles
  • -
  • Somehow we need to detect the color system being used by the input file and handle each case differently (with profiles)
  • -
  • This is trivial with identify (even by the Java ImageMagick API):
  • -
- -
$ identify -format '%r\n' alc_contrastes_desafios.pdf\[0\]
-DirectClass CMYK
-$ identify -format '%r\n' Africa\ group\ of\ negotiators.pdf\[0\]
-DirectClass sRGB Alpha
-
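  • Putting the two together, a rough sketch of how the conversion could branch on the detected colorspace (the profile paths are from the Homebrew Ghostscript install above, so adjust for other systems):
  • -
- -
$ if identify -format '%r' alc_contrastes_desafios.pdf\[0\] | grep -q CMYK; then
>   convert alc_contrastes_desafios.pdf\[0\] -profile /opt/brew/Cellar/ghostscript/9.20/share/ghostscript/9.20/iccprofiles/default_cmyk.icc -thumbnail 300x300 -flatten -profile /opt/brew/Cellar/ghostscript/9.20/share/ghostscript/9.20/iccprofiles/default_rgb.icc alc_contrastes_desafios.pdf.jpg
> else
>   convert alc_contrastes_desafios.pdf\[0\] -thumbnail 300x300 -flatten alc_contrastes_desafios.pdf.jpg
> fi
-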
- -

2017-03-04

- -
    -
  • Spent more time looking at the ImageMagick CMYK issue
  • -
  • The default_cmyk.icc and default_rgb.icc files are both part of the Ghostscript GPL distribution, but according to DSpace’s LICENSES_THIRD_PARTY file, DSpace doesn’t allow distribution of dependencies that are licensed solely under the GPL
  • -
  • So this issue is kinda pointless now, as the ICC profiles are absolutely necessary to make a meaningful CMYK→sRGB conversion
  • -
- -

2017-03-05

- -
    -
  • Look into helping developers from landportal.info with a query for items related to LAND on the REST API
  • -
  • They want something like the items that are returned by the general “LAND” query in the search interface, but we cannot do that
  • -
  • We can only return specific results for metadata fields, like:
  • -
- -
$ curl -s -H "accept: application/json" -H "Content-Type: application/json" -X POST "https://dspacetest.cgiar.org/rest/items/find-by-metadata-field" -d '{"key": "cg.subject.ilri","value": "LAND REFORM", "language": null}' | json_pp
-
- - - -
# List any additional prefixes that need to be managed by this handle server
-# (as for example handle prefix coming from old dspace repository merged in
-# that repository)
-# handle.additional.prefixes = prefix1[, prefix2]
-
- -
    -
  • Because of this I noticed that our Handle server’s config.dct was potentially misconfigured!
  • -
  • We had some default values still present:
  • -
- -
"300:0.NA/YOUR_NAMING_AUTHORITY"
-
- -
    -
  • I’ve changed them to the following and restarted the handle server:
  • -
- -
"300:0.NA/10568"
-
- -
    -
  • In looking at all the configs I just noticed that we are not providing a DOI in the Google-specific metadata crosswalk
  • -
  • From dspace/config/crosswalks/google-metadata.properties:
  • -
- -
google.citation_doi = cg.identifier.doi
-
- -
    -
  • This works, and makes DSpace output the following metadata on the item view page:
  • -
- -
<meta content="https://dx.doi.org/10.1186/s13059-017-1153-y" name="citation_doi">
-
- - - -

2017-03-06

- -
    -
  • Someone on the mailing list said that handle.plugin.checknameauthority should be false if we’re using multiple handle prefixes
  • -
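  • A minimal sketch of the relevant dspace.cfg keys for that scenario (using 10947 as a hypothetical additional prefix):
  • -
- -
handle.plugin.checknameauthority = false
handle.additional.prefixes = 10947
-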
- -

2017-03-07

- -
    -
  • I set up a top-level community as a test for the CGIAR Library and imported one item with the 10947 handle prefix
  • -
  • When testing the Handle resolver locally it shows the item to be on the local repository
  • -
  • So this seems to work, with the following caveats: - -
      -
    • New items will have the default handle
    • -
    • Communities and collections will have the default handle
    • -
    • Only items imported manually can have the other handles
    • -
  • -
  • I need to talk to Michael and Peter to share the news, and discuss the structure of their community(s) and try some actual test data
  • -
  • We’ll need to do some data cleaning to make sure they are using the same fields we are, like dc.type and cg.identifier.status
  • -
  • Another thing is that the import process creates new dc.date.accessioned and dc.date.available fields, so we end up with duplicates (is it important to preserve the originals for these?)
  • -
  • Report DS-3520 issue to Atmire
  • -
- -

2017-03-08

- -
    -
  • Merge the author separator changes to 5_x-prod, as everyone has responded positively about it, and it’s the default in Mirage2 after all!
  • -
  • Cherry pick the commons-collections patch from DSpace’s dspace-5_x branch to address DS-3520: https://jira.duraspace.org/browse/DS-3520
  • -
- -

2017-03-09

- -
    -
  • Export list of sponsors so Peter can clean it up:
  • -
- -
dspace=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'description' and qualifier = 'sponsorship') group by text_value order by count desc) to /tmp/sponsorship.csv with csv;
-COPY 285
-
- -

2017-03-12

- -
    -
  • Test the sponsorship fixes and deletes from Peter:
  • -
- -
$ ./fix-metadata-values.py -i Investors-Fix-51.csv -f dc.description.sponsorship -t Action -m 29 -d dspace -u dspace -p fuuuu
-$ ./delete-metadata-values.py -i Investors-Delete-121.csv -f dc.description.sponsorship -m 29 -d dspace -u dspace -p fuuu
-
- -
    -
  • Generate a new list of unique sponsors so we can update the controlled vocabulary:
  • -
- -
dspace=# \copy (select distinct text_value from metadatavalue where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'description' and qualifier = 'sponsorship')) to /tmp/sponsorship.csv with csv;
-
- - - -

Livestock CRP theme

- -

2017-03-15

- - - -

2017-03-16

- -
    -
  • Merge pull request for PABRA subjects: https://github.com/ilri/DSpace/pull/310
  • -
  • Abenet and Peter say we can add them to Discovery, Atmire modules, etc, but I might not have time to do it now
  • -
  • Help Sisay with RTB theme again
  • -
  • Remove ICARDA subject from Discovery sidebar facets: https://github.com/ilri/DSpace/pull/312
  • -
  • Remove ICARDA subject from Browse and item submission form: https://github.com/ilri/DSpace/pull/313
  • -
  • Merge the CCAFS Phase II changes but hold off on doing the flagship metadata updates until Macaroni Bros gets their importer updated
  • -
  • Deploy latest changes and investor fixes/deletions on CGSpace
  • -
  • Run system updates on CGSpace and reboot server
  • -
- -

2017-03-20

- - - -

2017-03-24

- -
    -
  • Still helping Sisay try to figure out how to create a theme for the RTB community
  • -
- -

2017-03-28

- -
    -
  • CCAFS said they are ready for the flagship updates for Phase II to be run (cg.subject.ccafs), so I ran them on CGSpace:
  • -
- -
$ ./fix-metadata-values.py -i ccafs-flagships-feb7.csv -f cg.subject.ccafs -t correct -m 210 -d dspace -u dspace -p fuuu
-
- -
    -
  • We’ve been waiting since February to run these
  • -
  • Also, I generated a list of all CCAFS flagships because there are a dozen or so more than there should be:
  • -
- -
dspace=# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=210 group by text_value order by count desc) to /tmp/ccafs.csv with csv;
-
- - - -

2017-03-29

- -
    -
  • Dump a list of fields in the DC and CG schemas to compare with CG Core:
  • -
- -
dspace=# select case when metadata_schema_id=1 then 'dc' else 'cg' end as schema, element, qualifier, scope_note from metadatafieldregistry where metadata_schema_id in (1, 2);
-
- -
    -
  • Ooh, a better one!
  • -
- -
dspace=# select coalesce(case when metadata_schema_id=1 then 'dc.' else 'cg.' end) || concat_ws('.', element, qualifier) as field, scope_note from metadatafieldregistry where metadata_schema_id in (1, 2);
-
- -

2017-03-30

- -
    -
  • Adjust the Linode CPU usage alerts for the CGSpace server from 150% to 200%, as generally the nightly Solr indexing causes a usage around 150–190%, so this should make the alerts less regular
  • -
  • Adjust the threshold for DSpace Test from 90 to 100%
  • -
-
diff --git a/public/2017-04/index.html b/public/2017-04/index.html deleted file mode 100644 index 9ea1d2804..000000000 --- a/public/2017-04/index.html +++ /dev/null @@ -1,789 +0,0 @@

April, 2017

- -
-

2017-04-02

- -
    -
  • Merge one change to CCAFS flagships that I had forgotten to remove last month (“MANAGING CLIMATE RISK”): https://github.com/ilri/DSpace/pull/317
  • -
  • Quick proof-of-concept hack to add dc.rights to the input form, including some inline instructions/hints:
  • -
- -

dc.rights in the submission form

- -
    -
  • Remove redundant/duplicate text in the DSpace submission license
  • -
  • Testing the CMYK patch on a collection with 650 items:
  • -
- -
$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Thumbnail" -v >& /tmp/filter-media-cmyk.txt
-
- -

- -

2017-04-03

- -
    -
  • Continue testing the CMYK patch on more communities:
  • -
- -
$ [dspace]/bin/dspace filter-media -f -i 10568/1 -p "ImageMagick PDF Thumbnail" -v >> /tmp/filter-media-cmyk.txt 2>&1
-
- -
    -
  • So far there are almost 500:
  • -
- -
$ grep -c profile /tmp/filter-media-cmyk.txt
-484
-
- -
    -
  • Looking at the CG Core document again, I’ll send some feedback to Peter and Abenet: - -
      -
    • We use cg.contributor.crp to indicate the CRP(s) affiliated with the item
    • -
    • DSpace has dc.date.available, but this field isn’t particularly meaningful other than as an automatic timestamp at the time of item accession (and is identical to dc.date.accessioned)
    • -
  • dc.relation exists in CGSpace, but isn’t used; rather, dc.relation.ispartofseries is used ~5,000 times to hold the series name and number within that series
    • -
  • -
  • Also, I’m noticing some weird outliers in cg.coverage.region, need to remember to go correct these later:
  • -
- -
dspace=# select text_value from metadatavalue where resource_type_id=2 and metadata_field_id=227;
-
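  • A grouped variant makes the outliers easier to spot (a sketch of the same query with counts):
  • -
- -
dspace=# select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=227 group by text_value order by count asc;
-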
- -

2017-04-04

- -
    -
  • The filter-media script has been running on more large communities and now there are many more CMYK PDFs that have been fixed:
  • -
- -
$ grep -c profile /tmp/filter-media-cmyk.txt
-1584
-
- -
    -
  • Trying to find a way to get the number of items submitted by a certain user in 2016
  • -
  • It’s not possible in the DSpace search / module interfaces, but it might be derivable from dc.description.provenance, as that field contains the name and email of the submitter/approver, e.g.:
  • -
- -
Submitted by Francesca Giampieri (fgiampieri) on 2016-01-19T13:56:43Z^M
-No. of bitstreams: 1^M
-ILAC_Brief21_PMCA.pdf: 113462 bytes, checksum: 249fef468f401c066a119f5db687add0 (MD5)
-
- -
    -
  • This SQL query returns fields that were submitted or approved by giampieri in 2016 and contain a “checksum” (ie, there was a bitstream in the submission):
  • -
- -
dspace=# select * from metadatavalue where resource_type_id=2 and metadata_field_id=28 and text_value ~ '^(Submitted|Approved).*giampieri.*2016-.*checksum.*';
-
- -
    -
  • Then this one does the same, but for fields that don’t contain checksums (ie, there was no bitstream in the submission):
  • -
- -
dspace=# select * from metadatavalue where resource_type_id=2 and metadata_field_id=28 and text_value ~ '^(Submitted|Approved).*giampieri.*2016-.*' and text_value !~ '^(Submitted|Approved).*giampieri.*2016-.*checksum.*';
-
- -
    -
  • For some reason there seem to be way too many fields, for example there are 498 + 13 here, which is 511 items for just this one user.
  • -
  • It looks like there can be a scenario where the user submitted AND approved it, so some records might be doubled…
  • -
  • In that case it might just be better to see how many the user submitted (both with and without bitstreams):
  • -
- -
dspace=# select * from metadatavalue where resource_type_id=2 and metadata_field_id=28 and text_value ~ '^Submitted.*giampieri.*2016-.*';
-
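  • If we only need the number rather than the full rows, the same query should work with count(*) (a sketch):
  • -
- -
dspace=# select count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=28 and text_value ~ '^Submitted.*giampieri.*2016-.*';
-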
- -

2017-04-05

- -
    -
  • After doing a few more large communities it seems this is the final count of CMYK PDFs:
  • -
- -
$ grep -c profile /tmp/filter-media-cmyk.txt
-2505
-
- -

2017-04-06

- -
    -
  • After reading the notes for DCAT April 2017 I am testing some new settings for PostgreSQL on DSpace Test (see the dspace.cfg sketch after this list): - -
      -
    • db.maxconnections 30→70 (the default PostgreSQL config allows 100 connections, so DSpace’s default of 30 is quite low)
    • -
    • db.maxwait 5000→10000
    • -
    • db.maxidle 8→20 (DSpace default is -1, unlimited, but we had set it to 8 earlier)
    • -
  • -
  • I need to look at the Munin graphs after a few days to see if the load has changed
  • -
  • Run system updates on DSpace Test and reboot the server
  • -
  • Discussing harvesting CIFOR’s DSpace via OAI
  • -
  • Sisay added their OAI as a source to a new collection, but using the Simple Dublin Core method, so many fields are unqualified and duplicated
  • -
  • Looking at the documentation it seems that we probably want to be using DSpace Intermediate Metadata
  • -
- -
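  • For reference, a sketch of those pool settings as they would appear in dspace.cfg (the values I am testing on DSpace Test):
  • -
- -
db.maxconnections = 70
db.maxwait = 10000
db.maxidle = 20
-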

2017-04-10

- -
    -
  • Adjust Linode CPU usage alerts on DSpace servers - -
      -
    • CGSpace from 200 to 250%
    • -
    • DSpace Test from 100 to 150%
    • -
  • -
  • Remove James from Linode access
  • -
  • Look into having CIFOR use a sub prefix of 10568 like 10568.01
  • -
  • Handle.net calls this “derived prefixes” and it seems this would work with DSpace if we wanted to go that route
  • -
  • CIFOR is starting to test aligning their metadata more with CGSpace/CG core
  • -
  • They shared a test item which is using cg.coverage.country, cg.subject.cifor, dc.subject, and dc.date.issued
  • -
  • Looking at their OAI I’m not sure it has updated as I don’t see the new fields: https://data.cifor.org/dspace/oai/request?verb=ListRecords&resumptionToken=oai_dc///col_11463_6/900
  • -
  • Maybe they need to make sure they are running the OAI cache refresh cron job, or maybe OAI doesn’t export these?
  • -
  • I added cg.subject.cifor to the metadata registry and I’m waiting for the harvester to re-harvest to see if it picks up more data now
  • -
  • Another possibility is that we could use a crosswalk… but I’ve never done one.
  • -
- -

2017-04-11

- -
    -
  • Looking at the item from CIFOR it hasn’t been updated yet, maybe they aren’t running the cron job
  • -
  • I emailed Usman from CIFOR to ask if he’s running the cron job
  • -
- -

2017-04-12

- - - -

stale metadata in OAI

- - - -
$ /home/dspacetest.cgiar.org/bin/dspace oai import -c
-...
-63900 items imported so far...
-64000 items imported so far...
-Total: 64056 items
-Purging cached OAI responses.
-OAI 2.0 manager action ended. It took 829 seconds.
-
- -
    -
  • After reading some threads on the DSpace mailing list, I see that clean-cache is actually only for caching responses, ie to client requests in the OAI web application
  • -
  • These are stored in [dspace]/var/oai/requests/
  • -
  • The import command should theoretically catch situations like this where an item’s metadata was updated, but in this case we changed the metadata schema and it doesn’t seem to catch it (could be a bug!)
  • -
  • Attempting a full rebuild of OAI on CGSpace:
  • -
- -
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m"
-$ time schedtool -D -e ionice -c2 -n7 nice -n19 /home/cgspace.cgiar.org/bin/dspace oai import -c
-...
-58700 items imported so far...
-Total: 58789 items
-Purging cached OAI responses.
-OAI 2.0 manager action ended. It took 1032 seconds.
-
-real    17m20.156s
-user    4m35.293s
-sys     1m29.310s
-
- - - -

2017-04-13

- -
    -
  • Checking the CIFOR item on DSpace Test, it still doesn’t have the new metadata
  • -
  • The collection status shows this message from the harvester:
  • -
- -
-

Last Harvest Result: OAI server did not contain any updates on 2017-04-13 02:19:47.964

-
- -
    -
  • I don’t know why there were no updates detected, so I will reset and reimport the collection
  • -
  • Usman has set up a custom crosswalk called dimcg that now shows CG and CIFOR metadata namespaces, but we can’t use it because DSpace can only harvest DIM by default (from the harvesting user interface)
  • -
  • Also worth noting that the REST interface exposes all fields in the item, including CG and CIFOR fields: https://data.cifor.org/dspace/rest/items/944?expand=metadata
  • -
  • After re-importing the CIFOR collection it looks very good!
  • -
  • It seems like they have done a full metadata migration with dc.date.issued and cg.coverage.country etc
  • -
  • Submit pull request to upstream DSpace for the PDF thumbnail bug (DS-3516): https://github.com/DSpace/DSpace/pull/1709
  • -
- -

2017-04-14

- - - -

2017-04-17

- -
    -
  • CIFOR has now implemented a new “cgiar” context in their OAI that exposes CG fields, so I am re-harvesting that to see how it looks in the Discovery sidebars and searches
  • -
  • See: https://data.cifor.org/dspace/oai/cgiar?verb=GetRecord&metadataPrefix=dim&identifier=oai:data.cifor.org:11463/947
  • -
  • One thing we need to remember if we start using OAI is to enable the autostart of the harvester process (see harvester.autoStart in dspace/config/modules/oai.cfg)
  • -
  • Error when running DSpace cleanup task on DSpace Test and CGSpace (on the same item), I need to look this up:
  • -
- -
Error: ERROR: update or delete on table "bitstream" violates foreign key constraint "bundle_primary_bitstream_id_fkey" on table "bundle"
-  Detail: Key (bitstream_id)=(435) is still referenced from table "bundle".
-
- -

2017-04-18

- - - -
$ git clone https://github.com/ilri/ckm-cgspace-rest-api.git
-$ cd ckm-cgspace-rest-api/app
-$ gem install bundler
-$ bundle
-$ cd ..
-$ rails -s
-
- -
    -
  • I used Ansible to create a PostgreSQL user that only has SELECT privileges on the tables it needs:
  • -
- -
-$ ansible linode02 -u aorth -b --become-user=postgres -K -m postgresql_user -a 'db=database name=username password=password priv=CONNECT/item:SELECT/metadatavalue:SELECT/metadatafieldregistry:SELECT/metadataschemaregistry:SELECT/collection:SELECT/handle:SELECT/bundle2bitstream:SELECT/bitstream:SELECT/bundle:SELECT/item2bundle:SELECT state=present'
-
- - - -
$ bundle binstubs puma --path ./sbin
-
- -

2017-04-19

- -
    -
  • Usman sent another link to their OAI interface, where the country names are now capitalized: https://data.cifor.org/dspace/oai/cgiar?verb=GetRecord&metadataPrefix=dim&identifier=oai:data.cifor.org:11463/947
  • -
  • Looking at the same item in XMLUI, the countries are not capitalized: https://data.cifor.org/dspace/xmlui/handle/11463/947?show=full
  • -
  • So it seems he did it in the crosswalk!
  • -
  • Keep working on Ansible stuff for deploying the CKM REST API
  • -
  • We can use systemd’s Environment stuff to pass the database parameters to Rails
  • -
  • Abenet noticed that the “Workflow Statistics” option is missing now, but we have screenshots from a presentation in 2016 when it was there
  • -
  • I filed a ticket with Atmire
  • -
  • Looking at 933 CIAT records from Sisay, he’s having problems creating a SAF bundle to import to DSpace Test
  • -
  • I started by looking at his CSV in OpenRefine, and I see there a bunch of fields with whitespace issues that I cleaned up:
  • -
- -
value.replace(" ||","||").replace("|| ","||").replace(" || ","||")
-
- -
    -
  • Also, all the filenames have spaces and URL encoded characters in them, so I decoded them from URL encoding:
  • -
- -
unescape(value,"url")
-
- -
    -
  • Then create the filename column using the following transform from URL:
  • -
- -
value.split('/')[-1].replace(/#.*$/,"")
-
- -
    -
  • The replace part is because some URLs have an anchor like #page=14 which we obviously don’t want on the filename
  • -
  • Also, we need to only use the PDF on the item corresponding with page 1, so we don’t end up with literally hundreds of duplicate PDFs
  • -
  • Alternatively, I could export each page to a standalone PDF…
  • -
- -

2017-04-20

- -
    -
  • Atmire responded about the Workflow Statistics, saying that it had been disabled because many environments needed customization to be useful
  • -
  • I re-enabled it with a hidden config key workflow.stats.enabled = true on DSpace Test and will evaluate adding it on CGSpace
  • -
  • Looking at the CIAT data again, a bunch of items have metadata values ending in ||, which might cause blank fields to be added at import time
  • -
  • Cleaning them up with OpenRefine:
  • -
- -
value.replace(/\|\|$/,"")
-
- -
    -
  • Working with the CIAT data in OpenRefine to remove the filename column from all but the first item which requires a particular PDF, as there are many items pointing to the same PDF, which would cause hundreds of duplicates to be added if we included them in the SAF bundle
  • -
  • I did some massaging in OpenRefine, flagging duplicates with stars and flags, then filtering and removing the filenames of those items
  • -
- -

Flagging and filtering duplicates in OpenRefine

- -
    -
  • Also there are loads of whitespace errors in almost every field, so I trimmed leading/trailing whitespace
  • -
  • Unbelievable, there are also metadata values like:
  • -
- -
COLLETOTRICHUM LINDEMUTHIANUM||                  FUSARIUM||GERMPLASM
-
- -
    -
  • Add a description to the file names using:
  • -
- -
value + "__description:" + cells["dc.type"].value
-
- -
    -
  • Test import of 933 records:
  • -
- -
$ [dspace]/bin/dspace import -a -e aorth@mjanja.ch -c 10568/87193 -s /home/aorth/src/CIAT-Books/SimpleArchiveFormat/ -m /tmp/ciat
-$ wc -l /tmp/ciat
-933 /tmp/ciat
-
- -
    -
  • Run system updates on CGSpace and reboot server
  • -
  • This includes switching nginx to using upstream with keepalive instead of direct proxy_pass (see the sketch after this list)
  • -
  • Re-deploy CGSpace to latest 5_x-prod, including the PABRA and RTB XMLUI themes, as well as the PDF processing and CMYK changes
  • -
  • More work on Ansible infrastructure stuff for Tsega’s CKM DSpace REST API
  • -
  • I’m going to start re-processing all the PDF thumbnails on CGSpace, one community at a time:
  • -
- -
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m"
-$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace filter-media -f -v -i 10568/71249 -p "ImageMagick PDF Thumbnail" -v >& /tmp/filter-media-cmyk.txt
-
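  • As mentioned above, a rough sketch of the nginx upstream with keepalive pattern (the upstream name and port are assumptions for illustration):
  • -
- -
upstream tomcat_http {
    server 127.0.0.1:8081;
    keepalive 60;
}

server {
    ...
    location / {
        # keepalive to the upstream requires HTTP/1.1 and an empty Connection header
        proxy_pass http://tomcat_http;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}
-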
- -

2017-04-22

- -
    -
  • Someone on the dspace-tech mailing list responded with a suggestion about the foreign key violation in the cleanup task
  • -
  • The solution is to remove the ID (ie set to NULL) from the primary_bitstream_id column in the bundle table
  • -
  • After doing that and running the cleanup task again I find more bitstreams that are affected and end up with a long list of IDs that need to be fixed:
  • -
- -
dspace=# update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (435, 1136, 1132, 1220, 1236, 3002, 3255, 5322);
-
- -

2017-04-24

- -
    -
  • Two users mentioned some items they recently approved not showing up in the search / XMLUI
  • -
  • I looked at the logs from yesterday and it seems the Discovery indexing has been crashing:
  • -
- -
2017-04-24 00:00:15,578 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Processing (55 of 58853): 70590
-2017-04-24 00:00:15,586 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Processing (56 of 58853): 74507
-2017-04-24 00:00:15,614 ERROR com.atmire.dspace.discovery.AtmireSolrService @ this IndexWriter is closed
-org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: this IndexWriter is closed
-        at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:552)
-        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
-        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
-        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
-        at org.apache.solr.client.solrj.SolrServer.deleteByQuery(SolrServer.java:285)
-        at org.apache.solr.client.solrj.SolrServer.deleteByQuery(SolrServer.java:271)
-        at org.dspace.discovery.SolrServiceImpl.unIndexContent(SolrServiceImpl.java:331)
-        at org.dspace.discovery.SolrServiceImpl.unIndexContent(SolrServiceImpl.java:315)
-        at com.atmire.dspace.discovery.AtmireSolrService.indexContent(AtmireSolrService.java:803)
-        at com.atmire.dspace.discovery.AtmireSolrService.updateIndex(AtmireSolrService.java:876)
-        at org.dspace.discovery.IndexClient.main(IndexClient.java:127)
-        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
-        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
-        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
-        at java.lang.reflect.Method.invoke(Method.java:498)
-        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
-        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
-
- -
    -
  • Looking at the past few days of logs, it looks like the indexing process started crashing on 2017-04-20:
  • -
- -
# grep -c 'IndexWriter is closed' [dspace]/log/dspace.log.2017-04-*
-[dspace]/log/dspace.log.2017-04-01:0
-[dspace]/log/dspace.log.2017-04-02:0
-[dspace]/log/dspace.log.2017-04-03:0
-[dspace]/log/dspace.log.2017-04-04:0
-[dspace]/log/dspace.log.2017-04-05:0
-[dspace]/log/dspace.log.2017-04-06:0
-[dspace]/log/dspace.log.2017-04-07:0
-[dspace]/log/dspace.log.2017-04-08:0
-[dspace]/log/dspace.log.2017-04-09:0
-[dspace]/log/dspace.log.2017-04-10:0
-[dspace]/log/dspace.log.2017-04-11:0
-[dspace]/log/dspace.log.2017-04-12:0
-[dspace]/log/dspace.log.2017-04-13:0
-[dspace]/log/dspace.log.2017-04-14:0
-[dspace]/log/dspace.log.2017-04-15:0
-[dspace]/log/dspace.log.2017-04-16:0
-[dspace]/log/dspace.log.2017-04-17:0
-[dspace]/log/dspace.log.2017-04-18:0
-[dspace]/log/dspace.log.2017-04-19:0
-[dspace]/log/dspace.log.2017-04-20:2293
-[dspace]/log/dspace.log.2017-04-21:5992
-[dspace]/log/dspace.log.2017-04-22:13278
-[dspace]/log/dspace.log.2017-04-23:22720
-[dspace]/log/dspace.log.2017-04-24:21422
-
- -
    -
  • I restarted Tomcat and re-ran the discovery process manually:
  • -
- -
[dspace]/bin/dspace index-discovery
-
- -
    -
  • Now everything is ok
  • -
  • Finally finished manually running the cleanup task over and over and null’ing the conflicting IDs:
  • -
- -
dspace=# update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (435, 1132, 1136, 1220, 1236, 3002, 3255, 5322, 5098, 5982, 5897, 6245, 6184, 4927, 6070, 4925, 6888, 7368, 7136, 7294, 7698, 7864, 10799, 10839, 11765, 13241, 13634, 13642, 14127, 14146, 15582, 16116, 16254, 17136, 17486, 17824, 18098, 22091, 22149, 22206, 22449, 22548, 22559, 22454, 22253, 22553, 22897, 22941, 30262, 33657, 39796, 46943, 56561, 58237, 58739, 58734, 62020, 62535, 64149, 64672, 66988, 66919, 76005, 79780, 78545, 81078, 83620, 84492, 92513, 93915);
-
- -
    -
  • Now running the cleanup script on DSpace Test and already seeing 11GB freed from the assetstore—it’s likely we haven’t had a cleanup task complete successfully in years…
  • -
- -

2017-04-25

- -
    -
  • Finally finished running the PDF thumbnail re-processing on CGSpace, the final count of CMYK PDFs is about 2751
  • -
  • Preparing to run the cleanup task on CGSpace, I want to see how many files are in the assetstore:
  • -
- -
# find [dspace]/assetstore/ -type f | wc -l
-113104
-
- -
    -
  • Troubleshooting the Atmire Solr update process that runs at 3:00 AM every morning; after finishing at 100% it throws this error:
  • -
- -
[=================================================> ]99% time remaining: 0 seconds. timestamp: 2017-04-25 09:07:12
-[=================================================> ]99% time remaining: 0 seconds. timestamp: 2017-04-25 09:07:12
-[=================================================> ]99% time remaining: 0 seconds. timestamp: 2017-04-25 09:07:12
-[=================================================> ]99% time remaining: 0 seconds. timestamp: 2017-04-25 09:07:13
-[==================================================>]100% time remaining: 0 seconds. timestamp: 2017-04-25 09:07:13
-java.lang.RuntimeException: java.lang.ClassNotFoundException: org.dspace.statistics.content.SpecifiedDSODatasetGenerator
-	at com.atmire.statistics.display.StatisticsGraph.parseDatasetGenerators(SourceFile:254)
-	at org.dspace.statistics.content.StatisticsDisplay.<init>(SourceFile:203)
-	at com.atmire.statistics.display.StatisticsGraph.<init>(SourceFile:116)
-	at com.atmire.statistics.display.StatisticsGraphFactory.getStatisticsDisplay(SourceFile:25)
-	at com.atmire.statistics.display.StatisticsDisplayFactory.parseStatisticsDisplay(SourceFile:67)
-	at com.atmire.statistics.display.StatisticsDisplayFactory.getStatisticsDisplays(SourceFile:49)
-	at com.atmire.statistics.statlet.XmlParser.getStatisticsDisplays(SourceFile:178)
-	at com.atmire.statistics.statlet.XmlParser.getStatisticsDisplays(SourceFile:111)
-	at com.atmire.utils.ReportSender$ReportRunnable.run(SourceFile:151)
-	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
-	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
-	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
-	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
-	at java.lang.Thread.run(Thread.java:745)
-Caused by: java.lang.ClassNotFoundException: org.dspace.statistics.content.SpecifiedDSODatasetGenerator
-	at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1858)
-	at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1701)
-	at java.lang.Class.forName0(Native Method)
-	at java.lang.Class.forName(Class.java:264)
-	at com.atmire.statistics.statlet.XmlParser.parsedatasetGenerator(SourceFile:299)
-	at com.atmire.statistics.display.StatisticsGraph.parseDatasetGenerators(SourceFile:250)
-	... 13 more
-java.lang.RuntimeException: java.lang.ClassNotFoundException: org.dspace.statistics.content.DSpaceObjectDatasetGenerator
-	at com.atmire.statistics.display.StatisticsGraph.parseDatasetGenerators(SourceFile:254)
-	at org.dspace.statistics.content.StatisticsDisplay.<init>(SourceFile:203)
-	at com.atmire.statistics.display.StatisticsGraph.<init>(SourceFile:116)
-	at com.atmire.statistics.display.StatisticsGraphFactory.getStatisticsDisplay(SourceFile:25)
-	at com.atmire.statistics.display.StatisticsDisplayFactory.parseStatisticsDisplay(SourceFile:67)
-	at com.atmire.statistics.display.StatisticsDisplayFactory.getStatisticsDisplays(SourceFile:49)
-	at com.atmire.statistics.statlet.XmlParser.getStatisticsDisplays(SourceFile:178)
-	at com.atmire.statistics.statlet.XmlParser.getStatisticsDisplays(SourceFile:111)
-	at com.atmire.utils.ReportSender$ReportRunnable.run(SourceFile:151)
-	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
-	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
-	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
-	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
-	at java.lang.Thread.run(Thread.java:745)
-Caused by: java.lang.ClassNotFoundException: org.dspace.statistics.content.DSpaceObjectDatasetGenerator
-	at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1858)
-	at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1701)
-	at java.lang.Class.forName0(Native Method)
-	at java.lang.Class.forName(Class.java:264)
-	at com.atmire.statistics.statlet.XmlParser.parsedatasetGenerator(SourceFile:299)
-	at com.atmire.statistics.display.StatisticsGraph.parseDatasetGenerators(SourceFile:250)
-
- -
    -
  • Run system updates on DSpace Test and reboot the server (new Java 8u131)
  • -
  • Run the SQL cleanups on the bundle table on CGSpace and run the [dspace]/bin/dspace cleanup task
  • -
  • I will be interested to see the file count in the assetstore as well as the database size after the next backup (last backup size is 111M)
  • -
  • Final file count after the cleanup task finished: 77843
  • -
  • So that is about 35,000 files, and about 7GB
  • -
  • Add logging to the cleanup cron task
  • -
- -

2017-04-26

- -
    -
  • The size of the CGSpace database dump went from 111MB to 96MB, not sure about actual database size though
  • -
  • Update RVM’s Ruby from 2.3.0 to 2.4.0 on DSpace Test:
  • -
- -
$ gpg --keyserver hkp://keys.gnupg.net --recv-keys 409B6B1796C275462A1703113804BB82D39DC0E3
-$ \curl -sSL https://raw.githubusercontent.com/wayneeseguin/rvm/master/binscripts/rvm-installer | bash -s stable --ruby
-... reload shell to get new Ruby
-$ gem install sass -v 3.3.14
-$ gem install compass -v 1.0.3
-
- -
    -
  • Help Tsega re-deploy the ckm-cgspace-rest-api on DSpace Test
  • -
-
diff --git a/public/2017-05/index.html b/public/2017-05/index.html deleted file mode 100644 index 9133041a5..000000000 --- a/public/2017-05/index.html +++ /dev/null @@ -1,543 +0,0 @@

May, 2017

- -
- - -

2017-05-01

- - - -

2017-05-02

- -
    -
  • Atmire got back about the Workflow Statistics issue, and apparently it’s a bug in the CUA module so they will send us a pull request
  • -
- -

2017-05-04

- -
    -
  • Sync DSpace Test with database and assetstore from CGSpace
  • -
  • Re-deploy DSpace Test with Atmire’s CUA patch for workflow statistics, run system updates, and restart the server
  • -
  • Now I can see the workflow statistics and am able to select users, but everything returns 0 items
  • -
  • Megan says there are still some mapped items that are not appearing since last week, so I forced a full index-discovery -b
  • -
  • Need to remember to check if the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSpace Test) tomorrow: https://cgspace.cgiar.org/handle/10568/80731
  • -
- -

2017-05-05

- -
    -
  • Discovered that CGSpace has ~700 items that are missing the cg.identifier.status field
  • -
  • Perhaps I need to try using the “required metadata” curation task to find items missing these fields:
  • -
- -
$ [dspace]/bin/dspace curate -t requiredmetadata -i 10568/1 -r - > /tmp/curation.out
-
- -
    -
  • It seems the curation task dies when it finds an item which has missing metadata
  • -
- -

2017-05-06

- - - -

2017-05-07

- -
    -
  • Testing one replacement for CCAFS Flagships (cg.subject.ccafs), first changed in the submission forms, and then in the database:
  • -
- -
$ ./fix-metadata-values.py -i ccafs-flagships-may7.csv -f cg.subject.ccafs -t correct -m 210 -d dspace -u dspace -p fuuu
-
- -
    -
  • Also, CCAFS wants to re-order their flagships to prioritize the Phase II ones
  • -
  • Waiting for feedback from CCAFS, then I can merge #320
  • -
- -

2017-05-08

- -
    -
  • Start working on CGIAR Library migration
  • -
  • We decided to use AIP export to preserve the hierarchies and handles of communities and collections
  • -
  • When ingesting some collections I was getting java.lang.OutOfMemoryError: GC overhead limit exceeded, which can be solved by disabling the GC timeout with -XX:-UseGCOverheadLimit
  • -
  • Other times I was getting an error about heap space, so I kept bumping the RAM allocation by 512MB each time it crashed (up to 4096m!)
  • -
  • This leads to tens of thousands of abandoned files in the assetstore, which need to be cleaned up using dspace cleanup -v, or else you’ll run out of disk space
  • -
  • In the end I realized it’s better to use submission mode (-s) to ingest the community object as a single AIP without its children, followed by each of the collections:
  • -
- -
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx2048m -XX:-UseGCOverheadLimit"
-$ [dspace]/bin/dspace packager -s -o ignoreHandle=false -t AIP -e some@user.com -p 10568/87775 /home/aorth/10947-1/10947-1.zip
-$ for collection in /home/aorth/10947-1/COLLECTION@10947-*; do [dspace]/bin/dspace packager -s -o ignoreHandle=false -t AIP -e some@user.com -p 10947/1 $collection; done
-$ for item in /home/aorth/10947-1/ITEM@10947-*; do [dspace]/bin/dspace packager -r -f -u -t AIP -e some@user.com $item; done
-
- - - -

2017-05-09

- -
    -
  • The CGIAR Library metadata has some blank metadata values, which leads to ||| in the Discovery facets
  • -
  • Clean these up in the database using:
  • -
- -
dspace=# delete from metadatavalue where resource_type_id=2 and text_value='';
-
- -
    -
  • I ended up running into issues during data cleaning and decided to wipe out the entire community and re-sync DSpace Test assetstore and database from CGSpace rather than waiting for the cleanup task to clean up
  • -
  • Hours into the re-ingestion I ran into more errors, and had to erase everything and start over again!
  • -
  • Now, no matter what I do I keep getting foreign key errors…
  • -
- -
Caused by: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "handle_pkey"
-  Detail: Key (handle_id)=(80928) already exists.
-
- -
    -
  • I think those errors actually come from me running the update-sequences.sql script while Tomcat/DSpace are running
  • -
  • Apparently you need to stop Tomcat!
  • -
- -

2017-05-10

- -
    -
  • Atmire says they are willing to extend the ORCID implementation, and I’ve asked them to provide a quote
  • -
  • I clarified that the scope of the implementation should be that ORCIDs are stored in the database and exposed via REST / API like other fields
  • -
  • Finally finished importing all the CGIAR Library content, final method was:
  • -
- -
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx3072m -XX:-UseGCOverheadLimit"
-$ [dspace]/bin/dspace packager -r -a -t AIP -o skipIfParentMissing=true -e some@user.com -p 10568/80923 /home/aorth/10947-2517/10947-2517.zip
-$ [dspace]/bin/dspace packager -r -a -t AIP -o skipIfParentMissing=true -e some@user.com -p 10568/80923 /home/aorth/10947-2515/10947-2515.zip
-$ [dspace]/bin/dspace packager -r -a -t AIP -o skipIfParentMissing=true -e some@user.com -p 10568/80923 /home/aorth/10947-2516/10947-2516.zip
-$ [dspace]/bin/dspace packager -s -t AIP -o ignoreHandle=false -e some@user.com -p 10568/80923 /home/aorth/10947-1/10947-1.zip
-$ for collection in /home/aorth/10947-1/COLLECTION@10947-*; do [dspace]/bin/dspace packager -s -o ignoreHandle=false -t AIP -e some@user.com -p 10947/1 $collection; done
-$ for item in /home/aorth/10947-1/ITEM@10947-*; do [dspace]/bin/dspace packager -r -f -u -t AIP -e some@user.com $item; done
-
- -
    -
  • Basically, import the smaller communities using recursive AIP import (with skipIfParentMissing)
  • -
  • Then, for the larger collection, create the community, collections, and items separately, ingesting the items one by one
  • -
  • The -XX:-UseGCOverheadLimit JVM option helps with some issues in large imports
  • -
  • After this I ran the update-sequences.sql script (with Tomcat shut down), and cleaned up the 200+ blank metadata records:
  • -
- -
dspace=# delete from metadatavalue where resource_type_id=2 and text_value='';
-
- -

2017-05-13

- -
    -
  • After quite a bit of troubleshooting with importing cleaned up data as CSV, it seems that there are actually NUL characters in the dc.description.abstract field (at least) on the lines where CSV importing was failing
  • -
  • I tried to find a way to remove the characters in vim or OpenRefine, but decided it was quicker to just remove the column temporarily and import it (a possible command-line approach is sketched after this list)
  • -
  • The import was successful and detected 2022 changes, which should likely be the rest that were failing to import before
  • -
- -
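  • For future reference, NUL bytes could probably be stripped from the CSV before import with something like this (an untested sketch; the filenames are placeholders):
  • -
- -
$ tr -d '\000' < cgiar-library.csv > cgiar-library-no-nulls.csv
-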

2017-05-15

- -
    -
  • To delete the blank lines that cause issues during import we need to use a regex in vim: :g/^$/d
  • -
  • After that I started looking in the dc.subject field to try to pull countries and regions out, but there are too many values in there
  • -
  • Bump the Academicons dependency of the Mirage 2 themes from 1.6.0 to 1.8.0 because the upstream deleted the old tag and now the build is failing: #321
  • -
  • Merge changes to CCAFS project identifiers and flagships: #320
  • -
  • Run updates for CCAFS flagships on CGSpace:
  • -
- -
$ ./fix-metadata-values.py -i /tmp/ccafs-flagships-may7.csv -f cg.subject.ccafs -t correct -m 210 -d dspace -u dspace -p 'fuuu'
-
- -
    -
  • These include:

    - -
      -
    • GENDER AND SOCIAL DIFFERENTIATION→GENDER AND SOCIAL INCLUSION
    • -
    • MANAGING CLIMATE RISK→CLIMATE SERVICES AND SAFETY NETS
    • -
  • - -
  • Re-deploy CGSpace and DSpace Test and run system updates

  • - -
  • Reboot DSpace Test

  • - -
  • Fix cron jobs for log management on DSpace Test, as they weren’t catching dspace.log.* files correctly; we had over six months of them taking up many gigs of disk space

  • -
- -

2017-05-16

- -
    -
  • Discuss updates to WLE themes for their Phase II
  • -
  • Make an issue to track the changes to cg.subject.wle: #322
  • -
- -

2017-05-17

- -
    -
  • Looking into the error I get when trying to create a new collection on DSpace Test:
  • -
- -
ERROR: duplicate key value violates unique constraint "handle_pkey" Detail: Key (handle_id)=(84834) already exists.
-
- -
    -
  • I tried updating the sequences a few times, with Tomcat running and stopped, but it hasn’t helped
  • -
  • It appears item with handle_id 84834 is one of the imported CGIAR Library items:
  • -
- -
dspace=# select * from handle where handle_id=84834;
- handle_id |   handle   | resource_type_id | resource_id
------------+------------+------------------+-------------
-     84834 | 10947/1332 |                2 |       87113
-
- -
    -
  • Looks like the max handle_id is actually much higher:
  • -
- -
dspace=# select * from handle where handle_id=(select max(handle_id) from handle);
- handle_id |  handle  | resource_type_id | resource_id
------------+----------+------------------+-------------
-     86873 | 10947/99 |                2 |       89153
-(1 row)
-
- -
    -
  • I’ve posted on the dspace-tech mailing list to see if I can just manually set the handle_seq to that value
  • -
  • Actually, it seems I can manually set the handle sequence using:
  • -
- -
dspace=# select setval('handle_seq',86873);
-
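  • A more general form that avoids hard-coding the value, combining it with the max(handle_id) query above (a sketch):
  • -
- -
dspace=# select setval('handle_seq', (select max(handle_id) from handle));
-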
- -
    -
  • After that I can create collections just fine, though I’m not sure if it has other side effects
  • -
- -

2017-05-21

- -
    -
  • Start creating a basic theme for the CGIAR System Organization’s community on CGSpace
  • -
  • Using colors from the CGIAR Branding guidelines (2014)
  • -
  • Make a GitHub issue to track this work: #324
  • -
- -

2017-05-22

- -
    -
  • Do some cleanups of community and collection names in CGIAR System Management Office community on DSpace Test, as well as move some items as Peter requested
  • -
  • Peter wanted a list of authors in here, so I generated a list of collections using the “View Source” on each community and this hacky awk:
  • -
- -
$ grep 10947/ /tmp/collections | grep -v cocoon | awk -F/ '{print $3"/"$4}' | awk -F\" '{print $1}' | vim -
-
- -
    -
  • Then I joined them together and ran this old SQL query from the dspace-tech mailing list which gives you authors for items in those collections:
  • -
- -
dspace=# select distinct text_value
-from metadatavalue
-where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author')
-AND resource_type_id = 2
-AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10947/2', '10947/3', '10947/1
-0', '10947/4', '10947/5', '10947/6', '10947/7', '10947/8', '10947/9', '10947/11', '10947/25', '10947/12', '10947/26', '10947/27', '10947/28', '10947/29', '109
-47/30', '10947/13', '10947/14', '10947/15', '10947/16', '10947/31', '10947/32', '10947/33', '10947/34', '10947/35', '10947/36', '10947/37', '10947/17', '10947
-/18', '10947/38', '10947/19', '10947/39', '10947/40', '10947/41', '10947/42', '10947/43', '10947/2512', '10947/44', '10947/20', '10947/21', '10947/45', '10947
-/46', '10947/47', '10947/48', '10947/49', '10947/22', '10947/23', '10947/24', '10947/50', '10947/51', '10947/2518', '10947/2776', '10947/2790', '10947/2521',
-'10947/2522', '10947/2782', '10947/2525', '10947/2836', '10947/2524', '10947/2878', '10947/2520', '10947/2523', '10947/2786', '10947/2631', '10947/2589', '109
-47/2519', '10947/2708', '10947/2526', '10947/2871', '10947/2527', '10947/4467', '10947/3457', '10947/2528', '10947/2529', '10947/2533', '10947/2530', '10947/2
-531', '10947/2532', '10947/2538', '10947/2534', '10947/2540', '10947/2900', '10947/2539', '10947/2784', '10947/2536', '10947/2805', '10947/2541', '10947/2535'
-, '10947/2537', '10568/93761')));
-
- -
    -
  • To get a CSV (with counts) from that:
  • -
- -
dspace=# \copy (select distinct text_value, count(*)
-from metadatavalue
-where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author')
-AND resource_type_id = 2
-AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10947/2', '10947/3', '10947/10', '10947/4', '10947/5', '10947/6', '10947/7', '10947/8', '10947/9', '10947/11', '10947/25', '10947/12', '10947/26', '10947/27', '10947/28', '10947/29', '10947/30', '10947/13', '10947/14', '10947/15', '10947/16', '10947/31', '10947/32', '10947/33', '10947/34', '10947/35', '10947/36', '10947/37', '10947/17', '10947/18', '10947/38', '10947/19', '10947/39', '10947/40', '10947/41', '10947/42', '10947/43', '10947/2512', '10947/44', '10947/20', '10947/21', '10947/45', '10947/46', '10947/47', '10947/48', '10947/49', '10947/22', '10947/23', '10947/24', '10947/50', '10947/51', '10947/2518', '10947/2776', '10947/2790', '10947/2521', '10947/2522', '10947/2782', '10947/2525', '10947/2836', '10947/2524', '10947/2878', '10947/2520', '10947/2523', '10947/2786', '10947/2631', '10947/2589', '10947/2519', '10947/2708', '10947/2526', '10947/2871', '10947/2527', '10947/4467', '10947/3457', '10947/2528', '10947/2529', '10947/2533', '10947/2530', '10947/2531', '10947/2532', '10947/2538', '10947/2534', '10947/2540', '10947/2900', '10947/2539', '10947/2784', '10947/2536', '10947/2805', '10947/2541', '10947/2535', '10947/2537', '10568/93761'))) group by text_value order by count desc) to /tmp/cgiar-librar-authors.csv with csv;
-
- -

2017-05-23

- -
    -
  • Add Affiliation to filters on Listing and Reports module (#325)
  • -
  • Start looking at WLE’s Phase II metadata updates but it seems they are not tagging their items properly, as their website importer infers which theme to use based on the name of the CGSpace collection!
  • -
  • For now I’ve suggested that they just change the collection names and that we fix their metadata manually afterwards
  • -
  • Also, they have a lot of messed up values in their cg.subject.wle field so I will clean up some of those first:
  • -
- -
dspace=# \copy (select distinct text_value from metadatavalue where resource_type_id=2 and metadata_field_id=119) to /tmp/wle.csv with csv;
-COPY 111
-
- -
    -
  • Respond to Atmire message about ORCIDs, saying that right now we’d prefer to just have them available via REST API like any other metadata field, and that I’m available for a Skype
  • -
- -

2017-05-26

- -
    -
  • Increase max file size in nginx so that CIP can upload some larger PDFs (the relevant directive is sketched after this list)
  • -
  • Agree to talk with Atmire after the June DSpace developers meeting where they will be discussing exposing ORCIDs via REST/OAI
  • -
- -
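  • The nginx directive in question is client_max_body_size; a sketch of the change (the exact value we settled on is not recorded here, so this is illustrative):
  • -
- -
client_max_body_size 100m;
-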

2017-05-28

- -
    -
  • File an issue on GitHub to explore/track migration to proper country/region codes (ISO 23 and UN M.49): #326
  • -
  • Ask Peter how the Landportal.info people should acknowledge us as the source of data on their website
  • -
  • Communicate with MARLO people about progress on exposing ORCIDs via the REST API, as it is set to be discussed in the June, 2017 DCAT meeting
  • -
  • Find all of Amos Omore’s author name variations so I can link them to his authority entry that has an ORCID:
  • -
- -
dspace=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and text_value like 'Omore, A%';
-
- -
    -
  • Set the authority for all variations to one containing an ORCID:
  • -
- -
dspace=# update metadatavalue set authority='4428ee88-90ef-4107-b837-3c0ec988520b', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Omore, A%';
-UPDATE 187
-
- -
    -
  • Next I need to do Edgar Twine:
  • -
- -
dspace=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and text_value like 'Twine, E%';
-
- -
    -
  • But it doesn’t look like any of his existing entries are linked to an authority which has an ORCID, so I edited the metadata via “Edit this Item” and looked up his ORCID and linked it there
  • -
  • Now I should be able to set his name variations to the new authority:
  • -
- -
dspace=# update metadatavalue set authority='f70d0a01-d562-45b8-bca3-9cf7f249bc8b', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Twine, E%';
-
- -
    -
  • Run the corrections on CGSpace and then update the Discovery and authority indexes (commands sketched after this list)
  • -
  • I notice that there are a handful of java.lang.OutOfMemoryError: Java heap space errors in the Catalina logs on CGSpace, I should go look into that…
  • -
- -
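  • The index updates would be something like the following (index-authority is assumed to be the right launcher command for the Solr authority core):
  • -
- -
$ [dspace]/bin/dspace index-discovery
$ [dspace]/bin/dspace index-authority
-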

2017-05-29

- -
    -
  • Discuss WLE themes and subjects with Mia and Macaroni Bros
  • -
  • We decided we need to create metadata fields for Phase I and II themes
  • -
  • I’ve updated the existing GitHub issue for Phase II (#322) and created a new one to track the changes for Phase I themes (#327)
  • -
  • After Macaroni Bros update the WLE website importer we will rename the WLE collections to reflect Phase II
  • -
  • Also, we need to have Mia and Udana look through the existing metadata in cg.subject.wle as it is quite a mess
  • -
-
diff --git a/public/2017-06/index.html b/public/2017-06/index.html deleted file mode 100644 index 36868e82a..000000000 --- a/public/2017-06/index.html +++ /dev/null @@ -1,361 +0,0 @@

June, 2017

- -
- - -

2017-06-01

- -
    -
  • After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes
  • -
  • The cg.identifier.wletheme field will be used for both Phase I and Phase II Research Themes
  • -
  • Then we’ll create a new sub-community for Phase II and create collections for the research themes there
  • -
  • The current “Research Themes” community will be renamed to “WLE Phase I Research Themes”
  • -
  • Tagged all items in the current Phase I collections with their appropriate themes
  • -
  • Create pull request to add Phase II research themes to the submission form: #328
  • -
  • Add cg.subject.system to CGSpace metadata registry, for subject from the upcoming CGIAR Library migration
  • -
- -

2017-06-04

- -
    -
  • After adding cg.identifier.wletheme to 1106 WLE items I can see the field on XMLUI but not in REST!
  • -
  • Strangely it happens on DSpace Test AND on CGSpace!
  • -
  • I tried to re-index Discovery but it didn’t fix it
  • -
  • Run all system updates on DSpace Test and reboot the server
  • -
  • After rebooting the server (and therefore restarting Tomcat) the new metadata field is available
  • -
  • I’ve sent a message to the dspace-tech mailing list to ask if this is a bug and whether I should file a Jira ticket
  • -
- -

2017-06-05

- -
    -
  • Rename WLE’s “Research Themes” sub-community to “WLE Phase I Research Themes” on DSpace Test so Macaroni Bros can continue their testing
  • -
  • Macaroni Bros tested it and said it’s fine, so I renamed it on CGSpace as well
  • -
  • Working on how to automate the extraction of the CIAT Book chapters, doing some magic in OpenRefine to extract page from–to from cg.identifier.url and dc.format.extent, respectively: - -
      -
    • cg.identifier.url: value.split("page=", "")[1]
    • -
    • dc.format.extent: value.replace("p. ", "").split("-")[1].toNumber() - value.replace("p. ", "").split("-")[0].toNumber()
    • -
  • -
  • Finally, after some filtering to see which small outliers there were (based on dc.format.extent using “p. 1-14” vs “29 p.”), create a new column with last page number: - -
      -
    • cells["dc.page.from"].value.toNumber() + cells["dc.format.pages"].value.toNumber()
    • -
  • -
  • Then create a new, unique file name to be used in the output, based on a SHA1 of the dc.title and with a description: - -
      -
    • dc.page.to: value.split(" ")[0].replace(",","").toLowercase() + "-" + sha1(value).get(1,9) + ".pdf__description:" + cells["dc.type"].value
    • -
  • -
  • Start processing 769 records after filtering the following (there are another 159 records that have some other format, or for example they have their own PDF which I will process later), using a modified generate-thumbnails.py script to read certain fields and then pass to GhostScript: - -
      -
    • cg.identifier.url: value.contains("page=")
    • -
    • dc.format.extent: or(value.contains("p. "),value.contains(" p."))
    • -
    • Command like: $ gs -dNOPAUSE -dBATCH -dFirstPage=14 -dLastPage=27 -sDEVICE=pdfwrite -sOutputFile=beans.pdf -f 12605-1.pdf
    • -
  • -
  • 17 of the items have issues with incorrect page number ranges, and upon closer inspection they do not appear in the referenced PDF
  • -
  • I’ve flagged them and proceeded without them (752 total) on DSpace Test:
  • -
- -
$ JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8" [dspace]/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/93843 --source /home/aorth/src/CIAT-Books/SimpleArchiveFormat/ --mapfile=/tmp/ciat-books.map &> /tmp/ciat-books.log
-
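  • For illustration, the extraction step amounts to something like this shell sketch (the CSV name and its filename,firstpage,lastpage layout are hypothetical; the real run went through the modified Python script):

$ while IFS=, read -r pdf first last; do
    # extract the chapter's page range into a new PDF with Ghostscript
    gs -dNOPAUSE -dBATCH -dFirstPage="$first" -dLastPage="$last" -sDEVICE=pdfwrite -sOutputFile="chapter-${pdf}" -f "$pdf"
  done < pages.csv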
- -
    -
  • I went and did some basic sanity checks on the remaining items in the CIAT Book Chapters and decided they are mostly fine (except one duplicate and the flagged ones), so I imported them to DSpace Test too (162 items)
  • -
  • Total items in CIAT Book Chapters is 914, with the others being flagged for some reason, and we should send that back to CIAT
  • -
  • Restart Tomcat on CGSpace so that the cg.identifier.wletheme field is available on REST API for Macaroni Bros
  • -
- -

2017-06-07

- -
    -
  • Testing Atmire’s patch for the CUA Workflow Statistics again
  • -
  • Still doesn’t seem to give results I’d expect, like there are no results for Maria Garruccio, or for the ILRI community!
  • -
  • Then I’ll file an update to the issue on Atmire’s tracker
  • -
  • Created a new branch with just the relevant changes, so I can send it to them
  • -
  • One thing I noticed is that there is a failed database migration related to CUA:
  • -
- -
+----------------+----------------------------+---------------------+---------+
-| Version        | Description                | Installed on        | State   |
-+----------------+----------------------------+---------------------+---------+
-| 1.1            | Initial DSpace 1.1 databas |                     | PreInit |
-| 1.2            | Upgrade to DSpace 1.2 sche |                     | PreInit |
-| 1.3            | Upgrade to DSpace 1.3 sche |                     | PreInit |
-| 1.3.9          | Drop constraint for DSpace |                     | PreInit |
-| 1.4            | Upgrade to DSpace 1.4 sche |                     | PreInit |
-| 1.5            | Upgrade to DSpace 1.5 sche |                     | PreInit |
-| 1.5.9          | Drop constraint for DSpace |                     | PreInit |
-| 1.6            | Upgrade to DSpace 1.6 sche |                     | PreInit |
-| 1.7            | Upgrade to DSpace 1.7 sche |                     | PreInit |
-| 1.8            | Upgrade to DSpace 1.8 sche |                     | PreInit |
-| 3.0            | Upgrade to DSpace 3.x sche |                     | PreInit |
-| 4.0            | Initializing from DSpace 4 | 2015-11-20 12:42:52 | Success |
-| 5.0.2014.08.08 | DS-1945 Helpdesk Request a | 2015-11-20 12:42:53 | Success |
-| 5.0.2014.09.25 | DS 1582 Metadata For All O | 2015-11-20 12:42:55 | Success |
-| 5.0.2014.09.26 | DS-1582 Metadata For All O | 2015-11-20 12:42:55 | Success |
-| 5.0.2015.01.27 | MigrateAtmireExtraMetadata | 2015-11-20 12:43:29 | Success |
-| 5.0.2017.04.28 | CUA eperson metadata migra | 2017-06-07 11:07:28 | OutOrde |
-| 5.5.2015.12.03 | Atmire CUA 4 migration     | 2016-11-27 06:39:05 | OutOrde |
-| 5.5.2015.12.03 | Atmire MQM migration       | 2016-11-27 06:39:06 | OutOrde |
-| 5.6.2016.08.08 | CUA emailreport migration  | 2017-01-29 11:18:56 | OutOrde |
-+----------------+----------------------------+---------------------+---------+
-
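  • For reference, a listing like this is what DSpace's database utility prints (assuming standard DSpace 5.x paths); there is also a repair subcommand for cleaning up failed migration entries, though I have not run it here:

$ [dspace]/bin/dspace database info
$ [dspace]/bin/dspace database repair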
- - - -

2017-06-18

- -
    -
  • Redeploy CGSpace with latest changes from 5_x-prod, run system updates, and reboot the server
  • -
  • Continue working on ansible infrastructure changes for CGIAR Library
  • -
- -

2017-06-20

- -
    -
  • Import Abenet and Peter’s changes to the CGIAR Library CRP community
  • -
  • Due to them using Windows and renaming some columns there were formatting, encoding, and duplicate metadata value issues
  • -
  • I had to remove some fields from the CSV and rename some back (e.g. to dc.subject[en_US]) just so DSpace would detect the changes properly
  • -
  • Now it looks much better: https://dspacetest.cgiar.org/handle/10947/2517
  • -
  • Removing the HTML tags and HTML/XML entities using the following GREL: - -
      -
    • replace(value,/<\/?\w+((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)\/?>/,'')
    • -
    • value.unescape("html").unescape("xml")
    • -
  • -
  • Finally import 914 CIAT Book Chapters to CGSpace in two batches:
  • -
- -
$ JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8" [dspace]/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/35701 --source /home/aorth/CIAT-Books/SimpleArchiveFormat/ --mapfile=/tmp/ciat-books.map &> /tmp/ciat-books.log
-$ JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8" [dspace]/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/35701 --source /home/aorth/CIAT-Books/SimpleArchiveFormat/ --mapfile=/tmp/ciat-books2.map &> /tmp/ciat-books2.log
-
- -

2017-06-25

- -
    -
  • WLE has said that one of their Phase II research themes is being renamed from Regenerating Degraded Landscapes to Restoring Degraded Landscapes
  • -
  • Pull request with the changes to input-forms.xml: #329
  • -
  • As of now it doesn’t look like there are any items using this research theme so we don’t need to do any updates:
  • -
- -
dspace=# select text_value from metadatavalue where resource_type_id=2 and metadata_field_id=237 and text_value like 'Regenerating Degraded Landscapes%';
- text_value
-------------
-(0 rows)
-
- -
    -
  • Marianne from WLE asked if they can have both Phase I and II research themes together in the item submission form
  • -
  • Perhaps we can add them together in the same question for cg.identifier.wletheme
  • -
- -

2017-06-30

- -
    -
  • CGSpace went down briefly, I see lots of these errors in the dspace logs:
  • -
- -
Java stacktrace: java.util.NoSuchElementException: Timeout waiting for idle object
-
- -
    -
  • After looking at the Tomcat logs, Munin graphs, and PostgreSQL connection stats, it seems there is just a high load
  • -
  • Might be a good time to adjust DSpace’s database connection settings, like I first mentioned in April, 2017 after reading the 2017-04 DCAT comments
  • -
  • I’ve adjusted the following in CGSpace’s config: - -
      -
    • db.maxconnections 30→70 (the default PostgreSQL config allows 100 connections, so DSpace’s default of 30 is quite low)
    • -
    • db.maxwait 5000→10000
    • -
    • db.maxidle 8→20 (DSpace default is -1, unlimited, but we had set it to 8 earlier)
    • -
  • -
  • We will need to adjust this again (as well as the pg_hba.conf settings) when we deploy Tsega's REST API
  • -
  • Whip up a test for Marianne of WLE to be able to show both their Phase I and II research themes in the CGSpace item submission form:
  • -
- -

Test A for displaying the Phase I and II research themes
Test B for displaying the Phase I and II research themes

diff --git a/public/2017-07/index.html b/public/2017-07/index.html
deleted file mode 100644
index 90a8d3939..000000000
--- a/public/2017-07/index.html
+++ /dev/null
@@ -1,396 +0,0 @@

July, 2017

- -
-

2017-07-01

- -
    -
  • Run system updates and reboot DSpace Test
  • -
- -

2017-07-04

- -
    -
  • Merge changes for WLE Phase II theme rename (#329)
  • -
  • Looking at extracting the metadata registries from ICARDA’s MEL DSpace database so we can compare fields with CGSpace
  • -
  • We can use PostgreSQL’s extended output format (-x) plus sed to format the output into quasi XML:
  • -
- -

- -
$ psql dspacenew -x -c 'select element, qualifier, scope_note from metadatafieldregistry where metadata_schema_id=5 order by element, qualifier;' | sed -r 's:^-\[ RECORD (.*) \]-+$:</dc-type>\n<dc-type>\n<schema>cg</schema>:;s:([^ ]*) +\| (.*):  <\1>\2</\1>:;s:^$:</dc-type>:;1s:</dc-type>\n::'
-
- -
    -
  • The sed script is from a post on the PostgreSQL mailing list
  • -
  • Abenet says the ILRI board wants to be able to have “lead author” for every item, so I’ve whipped up a WIP test in the 5_x-lead-author branch
  • -
  • It works but is still very rough and we haven’t thought out the whole lifecycle yet
  • -
- -

Testing lead author in submission form

- -
    -
  • I assume that “lead author” would actually be the first question on the item submission form
  • -
  • We also need to check to see which ORCID authority core this uses, because it seems to be using an entirely new one rather than the one for dc.contributor.author (which makes sense of course, but fuck, all the author problems aren’t bad enough?!)
  • -
  • Also would need to edit XMLUI item displays to incorporate this into authors list
  • -
  • And fuck, then anyone consuming our data via REST / OAI will not notice that we have an author outside of dc.contributor.authors… ugh
  • -
  • What if we modify the item submission form to use type-bind fields to show/hide certain fields depending on the type?
  • -
- -

2017-07-05

- -
    -
  • Adjust WLE Research Theme to include both Phase I and II on the submission form according to editor feedback (#330)
  • -
  • Generate list of fields in the current CGSpace cg scheme so we can record them properly in the metadata registry:
  • -
- -
$ psql dspace -x -c 'select element, qualifier, scope_note from metadatafieldregistry where metadata_schema_id=2 order by element, qualifier;' | sed -r 's:^-\[ RECORD (.*) \]-+$:</dc-type>\n<dc-type>\n<schema>cg</schema>:;s:([^ ]*) +\| (.*):  <\1>\2</\1>:;s:^$:</dc-type>:;1s:</dc-type>\n::' > cg-types.xml
-
- -
    -
  • CGSpace was unavailable briefly, and I saw this error in the DSpace log file:
  • -
- -
2017-07-05 13:05:36,452 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
-org.postgresql.util.PSQLException: FATAL: remaining connection slots are reserved for non-replication superuser connections
-
- -
    -
  • Looking at the pg_stat_activity table I saw there were indeed 98 active connections to PostgreSQL, and at this time the limit is 100, so that makes sense
  • -
  • Tsega restarted Tomcat and it’s working now
  • -
  • Abenet said she was generating a report with Atmire’s CUA module, so it could be due to that?
  • -
  • Looking in the logs I see this random error again that I should report to DSpace:
  • -
- -
2017-07-05 13:50:07,196 ERROR org.dspace.statistics.SolrLogger @ COUNTRY ERROR: EU
-
- -
    -
  • Seems to come from dspace-api/src/main/java/org/dspace/statistics/SolrLogger.java
  • -
- -

2017-07-06

- -
    -
  • Sisay tried to help by making a pull request for the RTB flagships but there are formatting errors, unrelated changes, and the flagship names are not in the style I requested
  • -
  • Abenet talked to CIP and they said they are actually ok with using collection names rather than adding a new metadata field
  • -
- -

2017-07-13

- -
    -
  • Remove UKaid from the controlled vocabulary for dc.description.sponsorship, as Department for International Development, United Kingdom is the correct form and it is already present (#334)
  • -
- -

2017-07-14

- -
    -
  • Sisay sent me a patch to add “Photo Report” to dc.type so I’ve added it to the 5_x-prod branch
  • -
- -

2017-07-17

- -
    -
  • Linode shut down our seventeen (17) VMs due to nonpayment of the July 1st invoice
  • -
  • It took me a few hours to find the ICT/Finance contacts to pay the bill and boot all the servers back up
  • -
  • Since the server was down anyways, I decided to run all system updates and re-deploy CGSpace so that the latest changes to input-forms.xml and the sponsors controlled vocabulary would go live
  • -
- -

2017-07-20

- -
    -
  • Skype chat with Addis team about the status of the CGIAR Library migration
  • -
  • Need to add the CGIAR System Organization subjects to Discovery Facets (test first)
  • -
  • Tentative list of dates for the migration: - -
      -
    • August 4: aim to finish data cleanup and then give Peter a list of authors
    • -
    • August 18: ready to show System Office
    • -
    • September 4: all feedback and decisions (including workflows) from System Office
    • -
    • September 10–11: go live?
    • -
  • -
  • Talk to Tsega and Danny about exporting/ingesting the blog posts from Drupal into DSpace?
  • -
  • Follow-up meeting on August 8–9?
  • -
  • Sent Abenet the 2415 records from CGIAR Library's Historical Archive (10947/1) after cleaning up the author authorities and HTML entities in dc.contributor.author and dc.description.abstract using OpenRefine:
      -
    • Authors: value.replace(/::\w{8}-\w{4}-\w{4}-\w{4}-\w{12}::600/,"")
    • -
    • Abstracts: replace(value,/<\/?\w+((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)\/?>/,'')
    • -
  • -
- -

2017-07-24

- -
    -
  • Move two top-level communities to be sub-communities of ILRI Projects
  • -
- -
$ for community in 10568/2347 10568/25209; do /home/cgspace.cgiar.org/bin/dspace community-filiator --set --parent=10568/27629 --child="$community"; done
-
- -
    -
  • Discuss CGIAR Library data cleanup with Sisay and Abenet
  • -
- -

2017-07-27

- -
    -
  • Help Sisay with some transforms to add descriptions to the filename column of some CIAT Presentations he’s working on in OpenRefine
  • -
  • Marianne emailed a few days ago to ask why “Integrating Ecosystem Solutions” was not in the list of WLE Phase I Research Themes on the input form
  • -
  • I told her that I only added the themes that I saw in the WLE Phase I Research Themes community
  • -
  • Then Mia from WLE also emailed to ask where some WLE focal regions went, and I said I didn’t understand what she was talking about, as all we did in our previous work was rename the old “Research Themes” subcommunity to “WLE Phase I Research Themes” and add a new subcommunity for “WLE Phase II Research Themes”.
  • -
  • Discuss some modifications to the CCAFS project tags in CGSpace submission form and in the database
  • -
- -

2017-07-28

- -
    -
  • Discuss updates to the Phase II CCAFS project tags with Andrea from Macaroni Bros
  • -
  • I will do the renaming and untagging of items in CGSpace database, and he will update his webservice with the latest project tags and I will get the XML from here for our input-forms.xml: https://ccafs.cgiar.org/export/ccafsproject
  • -
- -

2017-07-29

- -
    -
  • Move some WLE items into appropriate Phase I Research Themes communities and delete some empty collections in WLE Regions community
  • -
- -

2017-07-30

- -
    -
  • Start working on CCAFS project tag cleanup (a quick frequency query for the current tags is sketched below)
  • -
  • More questions about inconsistencies and spelling mistakes in their tags, so I’ve sent some questions for followup
  • -
- -
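  • A helpful starting point for the cleanup is a frequency list of the current tags straight from the database (fields 134 and 235 are the CCAFS project tag fields used in the corrections below):

$ psql dspace -c "select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id in (134, 235) group by text_value order by count desc;"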

2017-07-31

- -
    -
  • Looks like the final list of metadata corrections for CCAFS project tags will be:
  • -
- -
delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and text_value='PII-FP4_CRMWestAfrica';
-update metadatavalue set text_value='FP3_VietnamLED' where resource_type_id=2 and metadata_field_id=134 and text_value='FP3_VeitnamLED';
-update metadatavalue set text_value='PII-FP1_PIRCCA' where resource_type_id=2 and metadata_field_id=235 and text_value='PII-SEA_PIRCCA';
-delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and text_value='PII-WA_IntegratedInterventions';
-
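  • One cautious way to preview how many rows each statement would touch, without committing anything, is to run it inside a transaction that rolls back (a sketch using the first delete):

$ psql dspace << 'EOF'
BEGIN;
delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and text_value='PII-FP4_CRMWestAfrica';
ROLLBACK;
EOF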
- -
    -
  • Now just waiting to run them on CGSpace, and then apply the modified input forms after Macaroni Bros give me an updated list
  • -
  • Temporarily increase the nginx upload limit to 200MB for Sisay to upload the CIAT presentations
  • -
  • Looking at the CGSpace activity page, there are 52 Baidu bots concurrently crawling our website (I copied the activity page to a text file and grepped it)!
  • -
- -
$ grep 180.76. /tmp/status | awk '{print $5}' | sort | uniq | wc -l
-52
-
- -
    -
  • From looking at the dspace.log I see they are all using the same session, which means our Crawler Session Manager Valve is working
  • -
- - - - - -
diff --git a/public/2017-08/index.html b/public/2017-08/index.html
deleted file mode 100644
index f3a82b20b..000000000
--- a/public/2017-08/index.html
+++ /dev/null
@@ -1,682 +0,0 @@

August, 2017

- -
-

2017-08-01

- -
    -
  • Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours
  • -
  • I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)
  • -
  • The good thing is that, according to dspace.log.2017-08-01, they are all using the same Tomcat session
  • -
  • This means our Tomcat Crawler Session Valve is working (a quick way to double-check this from the logs is sketched at the end of this entry)
  • -
  • But many of the bots are browsing dynamic URLs like: - -
      -
    • /handle/10568/3353/discover
    • -
    • /handle/10568/16510/browse
    • -
  • -
  • The robots.txt only blocks the top-level /discover and /browse URLs… we will need to find a way to forbid them from accessing these!
  • -
  • Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): https://jira.duraspace.org/browse/DS-2962
  • -
  • It turns out that we’re already adding the X-Robots-Tag "none" HTTP header, but this only forbids the search engine from indexing the page, not crawling it!
  • -
  • Also, the bot has to successfully browse the page first so it can receive the HTTP header…
  • -
  • We might actually have to block these requests with HTTP 403 depending on the user agent
  • -
  • Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415
  • -
  • This was due to newline characters in the dc.description.abstract column, which caused OpenRefine to choke when exporting the CSV
  • -
  • I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using g/^$/d
  • -
  • Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet
  • -
- -
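  • A quick way to double-check the valve from the logs is to count distinct sessions for one of the bot IP ranges, for example the 180.76. range Baidu was using at the end of July; with the valve working this should be a very small number:

# grep -a '180.76.' /home/cgspace.cgiar.org/log/dspace.log.2017-08-01 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l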

- -

2017-08-02

- -
    -
  • Magdalena from CCAFS asked if there was a way to get the top ten items published in 2016 (note: not the top items in 2016!)
  • -
  • I think Atmire’s Content and Usage Analysis module should be able to do this but I will have to look at the configuration and maybe email Atmire if I can’t figure it out
  • -
  • I had a look at the module configuration and couldn’t figure out a way to do this, so I opened a ticket on the Atmire tracker
  • -
  • Atmire responded about the missing workflow statistics issue a few weeks ago but I didn’t see it for some reason
  • -
  • They said they added a publication and saw the workflow stat for the user, so I should try again and let them know
  • -
- -

2017-08-05

- -
    -
  • Usman from CIFOR emailed to ask about the status of our OAI tests for harvesting their DSpace repository
  • -
  • I told him that the OAI appears to not be harvesting properly after the first sync, and that the control panel shows an “Internal error” for that collection:
  • -
- -

CIFOR OAI harvesting

- -
    -
  • I don’t see anything related in our logs, so I asked him to check for our server’s IP in their logs
  • -
  • Also, in the mean time I stopped the harvesting process, reset the status, and restarted the process via the Admin control panel (note: I didn’t reset the collection, just the harvester status!)
  • -
- -

2017-08-07

- -
    -
  • Apply Abenet’s corrections for the CGIAR Library’s Consortium subcommunity (697 records)
  • -
  • I had to fix a few small things, like moving the dc.title column away from the beginning of the row, deleting blank lines in the abstract in vim using :g/^$/d, and adding the dc.subject[en_US] column back, as she had deleted it and DSpace didn’t detect the changes made there (we needed to blank the values instead)
  • -
- -

2017-08-08

- -
    -
  • Apply Abenet’s corrections for the CGIAR Library’s historic archive subcommunity (2415 records)
  • -
  • I had to add the dc.subject[en_US] column back with blank values so that DSpace could detect the changes
  • -
  • I applied the changes in 500 item batches
  • -
- -
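  • For reference, batch CSV edits like these can also be applied from the command line with DSpace’s metadata import tool (the file name here is only an example):

$ [dspace]/bin/dspace metadata-import -f /tmp/cgiar-library-archive-500.csv -e aorth@mjanja.ch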

2017-08-09

- -
    -
  • Run system updates on DSpace Test and reboot server
  • -
  • Help ICARDA upgrade their MELSpace to DSpace 5.7 using the docker-dspace container - -
      -
    • We had to import the PostgreSQL dump to the PostgreSQL container using: pg_restore -U postgres -d dspace blah.dump
    • -
    • Otherwise, when using -O it messes up the permissions on the schema and DSpace can’t read it
    • -
  • -
- -

2017-08-10

- -
    -
  • Apply last updates to the CGIAR Library’s Fund community (812 items)
  • -
  • Had to do some quality checks and column renames before importing, as either Sisay or Abenet renamed a few columns and the metadata importer wanted to remove/add new metadata for title, abstract, etc.
  • -
  • Also I applied the HTML entities unescape transform on the abstract column in Open Refine
  • -
  • I need to get an author list from the database for only the CGIAR Library community to send to Peter
  • -
  • It turns out that I had already used this SQL query in May, 2017 to get the authors from CGIAR Library:
  • -
- -
dspace#= \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/93761', '10947/1', '10947/10', '10947/11', '10947/12', '10947/13', '10947/14', '10947/15', '10947/16', '10947/17', '10947/18', '10947/19', '10947/2', '10947/20', '10947/21', '10947/22', '10947/23', '10947/24', '10947/25', '10947/2512', '10947/2515', '10947/2516', '10947/2517', '10947/2518', '10947/2519', '10947/2520', '10947/2521', '10947/2522', '10947/2523', '10947/2524', '10947/2525', '10947/2526', '10947/2527', '10947/2528', '10947/2529', '10947/2530', '10947/2531', '10947/2532', '10947/2533', '10947/2534', '10947/2535', '10947/2536', '10947/2537', '10947/2538', '10947/2539', '10947/2540', '10947/2541', '10947/2589', '10947/26', '10947/2631', '10947/27', '10947/2708', '10947/2776', '10947/2782', '10947/2784', '10947/2786', '10947/2790', '10947/28', '10947/2805', '10947/2836', '10947/2871', '10947/2878', '10947/29', '10947/2900', '10947/2919', '10947/3', '10947/30', '10947/31', '10947/32', '10947/33', '10947/34', '10947/3457', '10947/35', '10947/36', '10947/37', '10947/38', '10947/39', '10947/4', '10947/40', '10947/4052', '10947/4054', '10947/4056', '10947/4068', '10947/41', '10947/42', '10947/43', '10947/4368', '10947/44', '10947/4467', '10947/45', '10947/4508', '10947/4509', '10947/4510', '10947/4573', '10947/46', '10947/4635', '10947/4636', '10947/4637', '10947/4638', '10947/4639', '10947/4651', '10947/4657', '10947/47', '10947/48', '10947/49', '10947/5', '10947/50', '10947/51', '10947/5308', '10947/5322', '10947/5324', '10947/5326', '10947/6', '10947/7', '10947/8', '10947/9'))) group by text_value order by count desc) to /tmp/cgiar-library-authors.csv with csv;
-
- -
    -
  • Meeting with Peter and CGSpace team - -
      -
    • Alan to follow up with ICARDA about depositing in CGSpace, we want ICARDA and Drylands legacy content but not duplicates
    • -
    • Alan to follow up on dc.rights, where are we?
    • -
    • Alan to follow up with Atmire about a dedicated field for ORCIDs, based on the discussion in the June, 2017 DCAT meeting
    • -
    • Alan to ask about how to query external services like AGROVOC in the DSpace submission form
    • -
  • -
  • Follow up with Atmire on the ticket about ORCID metadata in DSpace
  • -
  • Follow up with Lili and Andrea about the pending CCAFS metadata and flagship updates
  • -
- -

2017-08-11

- -
    -
  • CGSpace had load issues and was throwing errors related to PostgreSQL
  • -
  • I told Tsega to reduce the max connections from 70 to 40 because actually each web application gets that limit, and so for xmlui, oai, jspui, rest, etc. it could be 70 x 4 = 280 connections depending on the load, while the PostgreSQL config itself only allows 100!
  • -
  • I learned this on a recent discussion on the DSpace wiki
  • -
  • I need to either look into setting up a database pool through JNDI or increase the PostgreSQL max connections (a quick summary query of the current connection states is sketched at the end of this entry)
  • -
  • Also, I need to find out where the load is coming from (rest?) and possibly block bots from accessing dynamic pages like Browse and Discover instead of just sending an X-Robots-Tag HTTP header
  • -
  • I noticed that Google has bitstreams from the rest interface in the search index. I need to ask on the dspace-tech mailing list to see what other people are doing about this, and maybe start issuing an X-Robots-Tag: none there!
  • -
- -
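  • For a quick overview of where the connections stand, pg_stat_activity can be summarised by user and state; it will not say which webapp owns each connection (they all connect locally as the same user), but it does show active versus idle counts at a glance:

$ psql -c 'SELECT usename, state, count(*) FROM pg_stat_activity GROUP BY usename, state ORDER BY 3 DESC;'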

2017-08-12

- -
    -
  • I sent a message to the mailing list about the duplicate content issue with /rest and /bitstream URLs
  • -
  • Looking at the logs for the REST API on /rest, it looks like there is someone hammering doing testing or something on it…
  • -
- -
# awk '{print $1}' /var/log/nginx/rest.log.1 | sort -n | uniq -c | sort -h | tail -n 5
-    140 66.249.66.91
-    404 66.249.66.90
-   1479 50.116.102.77
-   9794 45.5.184.196
-  85736 70.32.83.92
-
- -
    -
  • The top offender is 70.32.83.92 which is actually the same IP as ccafs.cgiar.org, so I will email the Macaroni Bros to see if they can test on DSpace Test instead
  • -
  • I’ve enabled logging of /oai requests on nginx as well so we can potentially determine bad actors here (also to see if anyone is actually using OAI!)
  • -
- -
    # log oai requests
-    location /oai {
-        access_log /var/log/nginx/oai.log;
-        proxy_pass http://tomcat_http;
-    }
-
- -

2017-08-13

- -
    -
  • Macaroni Bros say that CCAFS wants them to check once every hour for changes
  • -
  • I told them to check every four or six hours
  • -
- -

2017-08-14

- -
    -
  • Run author corrections on CGIAR Library community from Peter
  • -
- -
$ ./fix-metadata-values.py -i /tmp/authors-fix-523.csv -f dc.contributor.author -t correct -m 3 -d dspace -u dspace -p fuuuu
-
- -
    -
  • There were only three deletions so I just did them manually:
  • -
- -
dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value='C';
-DELETE 1
-dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value='WSSD';
-
- -
    -
  • Generate a new list of authors from the CGIAR Library community for Peter to look through now that the initial corrections have been done
  • -
  • Thinking about resource limits for PostgreSQL again after last week’s CGSpace crash and related to a recently discussion I had in the comments of the April, 2017 DCAT meeting notes
  • -
  • In that thread Chris Wilper suggests a new default of 35 max connections for db.maxconnections (from the current default of 30), knowing that each DSpace web application gets to use up to this many on its own
  • -
  • It would be good to approximate what the theoretical maximum number of connections on a busy server would be, perhaps by looking to see which apps use SQL:
  • -
- -
$ grep -rsI SQLException dspace-jspui | wc -l          
-473
-$ grep -rsI SQLException dspace-oai | wc -l  
-63
-$ grep -rsI SQLException dspace-rest | wc -l
-139
-$ grep -rsI SQLException dspace-solr | wc -l                                                                               
-0
-$ grep -rsI SQLException dspace-xmlui | wc -l
-866
-
- -
    -
  • Of those five applications we’re running, only solr appears not to use the database directly
  • -
  • And JSPUI is only used internally (so it doesn’t really count), leaving us with OAI, REST, and XMLUI
  • -
  • Assuming each takes a theoretical maximum of 35 connections during a heavy load (35 * 3 = 105), that would put the connections well above PostgreSQL’s default max of 100 connections (remember a handful of connections are reserved for the PostgreSQL super user, see superuser_reserved_connections)
  • -
  • So we should adjust PostgreSQL’s max connections to be DSpace’s db.maxconnections * 3 + 3
  • -
  • This would allow each application to use up to db.maxconnections and not to go over the system’s PostgreSQL limit
  • -
  • Perhaps since CGSpace is a busy site with lots of resources we could actually use something like 40 for db.maxconnections
  • -
  • Also worth looking into is to set up a database pool using JNDI, as apparently DSpace’s db.poolname hasn’t been used since around DSpace 1.7 (according to Chris Wilper’s comments in the thread)
  • -
  • Need to go check the PostgreSQL connection stats in Munin on CGSpace from the past week to get an idea if 40 is appropriate
  • -
  • Looks like connections hover around 50:
  • -
- -

PostgreSQL connections 2017-08

- -
    -
  • Unfortunately I don’t have the breakdown of which DSpace apps are making those connections (I’ll assume XMLUI)
  • -
  • So I guess a limit of 30 (DSpace default) is too low, but 70 causes problems when the load increases and the system’s PostgreSQL max_connections is too low
  • -
  • For now I think maybe setting DSpace’s db.maxconnections to 40 and adjusting the system’s max_connections might be a good starting point: 40 * 3 + 3 = 123
  • -
  • Apply 223 more author corrections from Peter on CGIAR Library
  • -
  • Help Magdalena from CCAFS with some CUA statistics questions
  • -
- -

2017-08-15

- -
    -
  • Increase the nginx upload limit on CGSpace (linode18) so Sisay can upload 23 CIAT reports
  • -
  • Do some last minute cleanups and de-duplications of the CGIAR Library data, as I need to send it to Peter this week
  • -
  • Metadata fields like dc.contributor.author, dc.publisher, dc.type, and a few others had somehow been duplicated along the line
  • -
  • Also, a few dozen dc.description.abstract fields still had various HTML tags and entities in them
  • -
  • Also, a bunch of dc.subject fields that were not AGROVOC had not been moved properly to cg.subject.system
  • -
- -

2017-08-16

- -
    -
  • I wanted to merge the various field variations like cg.subject.system and cg.subject.system[en_US] in OpenRefine but I realized it would be easier in PostgreSQL:
  • -
- -
dspace=# select distinct text_value, text_lang from metadatavalue where resource_type_id=2 and metadata_field_id=254;
-
- -
    -
  • And actually, we can do it for other generic fields for items in those collections, for example dc.description.abstract:
  • -
- -
dspace=# update metadatavalue set text_lang='en_US' where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'description' and qualifier = 'abstract') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/93761', '10947/1', '10947/10', '10947/11', '10947/12', '10947/13', '10947/14', '10947/15', '10947/16', '10947/17', '10947/18', '10947/19', '10947/2', '10947/20', '10947/21', '10947/22', '10947/23', '10947/24', '10947/25', '10947/2512', '10947/2515', '10947/2516', '10947/2517', '10947/2518', '10947/2519', '10947/2520', '10947/2521', '10947/2522', '10947/2523', '10947/2524', '10947/2525', '10947/2526', '10947/2527', '10947/2528', '10947/2529', '10947/2530', '10947/2531', '10947/2532', '10947/2533', '10947/2534', '10947/2535', '10947/2536', '10947/2537', '10947/2538', '10947/2539', '10947/2540', '10947/2541', '10947/2589', '10947/26', '10947/2631', '10947/27', '10947/2708', '10947/2776', '10947/2782', '10947/2784', '10947/2786', '10947/2790', '10947/28', '10947/2805', '10947/2836', '10947/2871', '10947/2878', '10947/29', '10947/2900', '10947/2919', '10947/3', '10947/30', '10947/31', '10947/32', '10947/33', '10947/34', '10947/3457', '10947/35', '10947/36', '10947/37', '10947/38', '10947/39', '10947/4', '10947/40', '10947/4052', '10947/4054', '10947/4056', '10947/4068', '10947/41', '10947/42', '10947/43', '10947/4368', '10947/44', '10947/4467', '10947/45', '10947/4508', '10947/4509', '10947/4510', '10947/4573', '10947/46', '10947/4635', '10947/4636', '10947/4637', '10947/4638', '10947/4639', '10947/4651', '10947/4657', '10947/47', '10947/48', '10947/49', '10947/5', '10947/50', '10947/51', '10947/5308', '10947/5322', '10947/5324', '10947/5326', '10947/6', '10947/7', '10947/8', '10947/9')))
-
- -
    -
  • And on others like dc.language.iso, dc.relation.ispartofseries, dc.type, dc.title, etc…
  • -
  • Also, to move fields from dc.identifier.url to cg.identifier.url[en_US] (because we don’t use the Dublin Core one for some reason):
  • -
- -
dspace=# update metadatavalue set metadata_field_id = 219, text_lang = 'en_US' where resource_type_id = 2 AND metadata_field_id = 237;
-UPDATE 15
-
- -
    -
  • Set the text_lang of all dc.identifier.uri (Handle) fields to be NULL, just like default DSpace does:
  • -
- -
dspace=# update metadatavalue set text_lang=NULL where resource_type_id = 2 and metadata_field_id = 25 and text_value like 'http://hdl.handle.net/10947/%';
-UPDATE 4248
-
- -
    -
  • Also update the text_lang of dc.contributor.author fields for metadata in these collections:
  • -
- -
dspace=# update metadatavalue set text_lang=NULL where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/93761', '10947/1', '10947/10', '10947/11', '10947/12', '10947/13', '10947/14', '10947/15', '10947/16', '10947/17', '10947/18', '10947/19', '10947/2', '10947/20', '10947/21', '10947/22', '10947/23', '10947/24', '10947/25', '10947/2512', '10947/2515', '10947/2516', '10947/2517', '10947/2518', '10947/2519', '10947/2520', '10947/2521', '10947/2522', '10947/2523', '10947/2524', '10947/2525', '10947/2526', '10947/2527', '10947/2528', '10947/2529', '10947/2530', '10947/2531', '10947/2532', '10947/2533', '10947/2534', '10947/2535', '10947/2536', '10947/2537', '10947/2538', '10947/2539', '10947/2540', '10947/2541', '10947/2589', '10947/26', '10947/2631', '10947/27', '10947/2708', '10947/2776', '10947/2782', '10947/2784', '10947/2786', '10947/2790', '10947/28', '10947/2805', '10947/2836', '10947/2871', '10947/2878', '10947/29', '10947/2900', '10947/2919', '10947/3', '10947/30', '10947/31', '10947/32', '10947/33', '10947/34', '10947/3457', '10947/35', '10947/36', '10947/37', '10947/38', '10947/39', '10947/4', '10947/40', '10947/4052', '10947/4054', '10947/4056', '10947/4068', '10947/41', '10947/42', '10947/43', '10947/4368', '10947/44', '10947/4467', '10947/45', '10947/4508', '10947/4509', '10947/4510', '10947/4573', '10947/46', '10947/4635', '10947/4636', '10947/4637', '10947/4638', '10947/4639', '10947/4651', '10947/4657', '10947/47', '10947/48', '10947/49', '10947/5', '10947/50', '10947/51', '10947/5308', '10947/5322', '10947/5324', '10947/5326', '10947/6', '10947/7', '10947/8', '10947/9')));
-UPDATE 4899
-
- -
    -
  • Wow, I just wrote this baller regex facet to find duplicate authors:
  • -
- -
isNotNull(value.match(/(CGIAR .+?)\|\|\1/))
-
- -
    -
  • This would be true if the authors were like CGIAR System Management Office||CGIAR System Management Office, which some of the CGIAR Library’s were
  • -
  • Unfortunately when you fix these in OpenRefine and then submit the metadata to DSpace it doesn’t detect any changes, so you have to edit them all manually via DSpace’s “Edit Item”
  • -
  • Ooh! And an even more interesting regex would match any duplicated author:
  • -
- -
isNotNull(value.match(/(.+?)\|\|\1/))
-
- -
    -
  • Which means it can also be used to find items with duplicate dc.subject fields…
  • -
  • Finally sent Peter the final dump of the CGIAR System Organization community so he can have a last look at it
  • -
  • Post a message to the dspace-tech mailing list to ask about querying the AGROVOC API from the submission form
  • -
  • Abenet was asking if there was some way to hide certain internal items from the “ILRI Research Outputs” RSS feed (which is the top-level ILRI community feed), because Shirley was complaining
  • -
  • I think we could use harvest.includerestricted.rss = false but the items might need to be 100% restricted, not just the metadata
  • -
  • Adjust Ansible postgres role to use max_connections from a template variable and deploy a new limit of 123 on CGSpace
  • -
- -

2017-08-17

- -
    -
  • Run Peter’s edits to the CGIAR System Organization community on DSpace Test
  • -
  • Uptime Robot said CGSpace went down for 1 minute, not sure why
  • -
  • Looking in dspace.log.2017-08-17 I see some weird errors that might be related?
  • -
- -
2017-08-17 07:55:31,396 ERROR net.sf.ehcache.store.DiskStore @ cocoon-ehcacheCache: Could not read disk store element for key PK_G-aspect-cocoon://DRI/12/handle/10568/65885?pipelinehash=823411183535858997_T-Navigation-3368194896954203241. Error was invalid stream header: 00000000
-java.io.StreamCorruptedException: invalid stream header: 00000000
-
- -
    -
  • Weird that these errors seem to have started on August 11th, the same day we had capacity issues with PostgreSQL:
  • -
- -
# grep -c "ERROR net.sf.ehcache.store.DiskStore" dspace.log.2017-08-*
-dspace.log.2017-08-01:0
-dspace.log.2017-08-02:0
-dspace.log.2017-08-03:0
-dspace.log.2017-08-04:0
-dspace.log.2017-08-05:0
-dspace.log.2017-08-06:0
-dspace.log.2017-08-07:0
-dspace.log.2017-08-08:0
-dspace.log.2017-08-09:0
-dspace.log.2017-08-10:0
-dspace.log.2017-08-11:8806
-dspace.log.2017-08-12:5496
-dspace.log.2017-08-13:2925
-dspace.log.2017-08-14:2135
-dspace.log.2017-08-15:1506
-dspace.log.2017-08-16:1935
-dspace.log.2017-08-17:584
-
- -
    -
  • There are none in 2017-07 either…
  • -
  • A few posts on the dspace-tech mailing list say this is related to the Cocoon cache somehow
  • -
  • I will clear the XMLUI cache for now and see if the errors continue (though perhaps shutting down Tomcat and removing the cache is more effective somehow?)
  • -
  • We tested the option for limiting restricted items from the RSS feeds on DSpace Test
  • -
  • I created four items, and only the two with public metadata showed up in the community’s RSS feed: - -
      -
    • Public metadata, public bitstream ✓
    • -
    • Public metadata, restricted bitstream ✓
    • -
    • Restricted metadata, restricted bitstream ✗
    • -
    • Private item ✗
    • -
  • -
  • Peter responded and said that he doesn’t want to limit items to be restricted just so we can change the RSS feeds
  • -
- -

2017-08-18

- -
    -
  • Someone on the dspace-tech mailing list responded with some tips about using the authority framework to do external queries from the submission form
  • -
  • He linked to some examples from DSpace-CRIS that use this functionality: VIAFAuthority
  • -
  • I wired it up to the dc.subject field of the submission interface using the “lookup” type and it works!
  • -
  • I think we can use this example to get a working AGROVOC query
  • -
  • More information about authority framework: https://wiki.duraspace.org/display/DSPACE/Authority+Control+of+Metadata+Values
  • -
  • Wow, I’m playing with the AGROVOC SPARQL endpoint using the sparql-query tool:
  • -
- -
$ ./sparql-query http://202.45.139.84:10035/catalogs/fao/repositories/agrovoc
-sparql$ PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
-SELECT 
-    ?label 
-WHERE {  
-   {  ?concept  skos:altLabel ?label . } UNION {  ?concept  skos:prefLabel ?label . }
-   FILTER regex(str(?label), "^fish", "i") .
-} LIMIT 10
-
-┌───────────────────────┐                                                      
-│ ?label                │                                                      
-├───────────────────────┤                                                      
-│ fisheries legislation │                                                      
-│ fishery legislation   │                                                      
-│ fishery law           │                                                      
-│ fish production       │                                                      
-│ fish farming          │                                                      
-│ fishing industry      │                                                      
-│ fisheries data        │                                                      
-│ fishing power         │                                                      
-│ fishing times         │                                                      
-│ fish passes           │                                                      
-└───────────────────────┘
-
- - - -

2017-08-19

- - - -

2017-08-20

- -
    -
  • Since I cleared the XMLUI cache on 2017-08-17 there haven’t been any more ERROR net.sf.ehcache.store.DiskStore errors
  • -
  • Look at the CGIAR Library to see if I can find the items that have been submitted since May:
  • -
- -
dspace=# select * from metadatavalue where metadata_field_id=11 and date(text_value) > '2017-05-01T00:00:00Z';
- metadata_value_id | item_id | metadata_field_id |      text_value      | text_lang | place | authority | confidence 
--------------------+---------+-------------------+----------------------+-----------+-------+-----------+------------
-            123117 |    5872 |                11 | 2017-06-28T13:05:18Z |           |     1 |           |         -1
-            123042 |    5869 |                11 | 2017-05-15T03:29:23Z |           |     1 |           |         -1
-            123056 |    5870 |                11 | 2017-05-22T11:27:15Z |           |     1 |           |         -1
-            123072 |    5871 |                11 | 2017-06-06T07:46:01Z |           |     1 |           |         -1
-            123171 |    5874 |                11 | 2017-08-04T07:51:20Z |           |     1 |           |         -1
-(5 rows)
-
- -
    -
  • According to dc.date.accessioned (metadata field id 11) there have only been five items submitted since May
  • -
  • These are their handles:
  • -
- -
dspace=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id in (select item_id from metadatavalue where metadata_field_id=11 and date(text_value) > '2017-05-01T00:00:00Z');
-   handle   
-------------
- 10947/4658
- 10947/4659
- 10947/4660
- 10947/4661
- 10947/4664
-(5 rows)
-
- -

2017-08-23

- -
    -
  • Start testing the nginx configs for the CGIAR Library migration as well as start making a checklist
  • -
- -

2017-08-28

- -
    -
  • Bram had written to me two weeks ago to set up a chat about ORCID stuff but the email apparently bounced and I only found out when he emailed me on another account
  • -
  • I told him I can chat in a few weeks when I’m back
  • -
- -

2017-08-31

- -
    -
  • I notice that in many WLE collections Marianne Gadeberg is in the edit or approval steps, but she is also in the groups for those steps.
  • -
  • I think we need to have a process to go back and check / fix some of these scenarios—to remove her user from the step and instead add her to the group—because we have way too many authorizations and in late 2016 we had performance issues with Solr because of this
  • -
  • I asked Sisay about this and hinted that he should go back and fix these things, but let’s see what he says
  • -
  • Saw CGSpace go down briefly today and noticed SQL connection pool errors in the dspace log file:
  • -
- -
ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error
-org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
-
- -
    -
  • Looking at the logs I see we have been having hundreds or thousands of these errors a few times per week in 2017-07 and almost every day in 2017-08 (a simple grep to count them per day is sketched below)
  • -
  • It seems that I changed the db.maxconnections setting from 70 to 40 around 2017-08-14, but Macaroni Bros also reduced their hourly hammering of the REST API then
  • -
  • Nevertheless, it seems like a connection limit is not enough and that I should increase it (as well as the system’s PostgreSQL max_connections)
  • -
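  • For reference, a per-day count of these errors can be pulled straight from the month’s logs with something like:

# grep -c 'Cannot get a connection, pool error Timeout waiting for idle object' dspace.log.2017-08-*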
diff --git a/public/2017-09/index.html b/public/2017-09/index.html
deleted file mode 100644
index 2055bd926..000000000
--- a/public/2017-09/index.html
+++ /dev/null
@@ -1,854 +0,0 @@

September, 2017

- -
-

2017-09-06

- -
    -
  • Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours
  • -
- -

2017-09-07

- -
    -
  • Ask Sisay to clean up the WLE approvers a bit, as Marianne’s user account is both in the approvers step as well as the group
  • -
- -

- -

2017-09-10

- -
    -
  • Delete 58 blank metadata values from the CGSpace database:
  • -
- -
dspace=# delete from metadatavalue where resource_type_id=2 and text_value='';
-DELETE 58
-
- -
    -
  • I also ran it on DSpace Test because we’ll be migrating the CGIAR Library soon and it would be good to catch these before we migrate
  • -
  • Run system updates and restart DSpace Test
  • -
  • We only have 7.7GB of free space on DSpace Test so I need to copy some data off of it before doing the CGIAR Library migration (requires lots of exporting and creating temp files)
  • -
  • I still have the original data from the CGIAR Library so I’ve zipped it up and sent it off to linode18 for now
  • -
  • sha256sum of original-cgiar-library-6.6GB.tar.gz is: bcfabb52f51cbdf164b61b7e9b3a0e498479e4c1ed1d547d32d11f44c0d5eb8a
  • -
  • Start doing a test run of the CGIAR Library migration locally
  • -
  • Notes and todo checklist here for now: https://gist.github.com/alanorth/3579b74e116ab13418d187ed379abd9c
  • -
  • Create pull request for Phase I and II changes to CCAFS Project Tags: #336
  • -
  • We’ve been discussing with Macaroni Bros and CCAFS for the past month or so and the list of tags was recently finalized
  • -
  • There will need to be some metadata updates — though if I recall correctly it is only about seven records — for that as well, I had made some notes about it in 2017-07, but I’ve asked for more clarification from Lili just in case
  • -
  • Looking at the DSpace logs to see if we’ve had a change in the “Cannot get a connection” errors since last month when we adjusted the db.maxconnections parameter on CGSpace:
  • -
- -
# grep -c "Cannot get a connection, pool error Timeout waiting for idle object" dspace.log.2017-09-*
-dspace.log.2017-09-01:0
-dspace.log.2017-09-02:0
-dspace.log.2017-09-03:9
-dspace.log.2017-09-04:17
-dspace.log.2017-09-05:752
-dspace.log.2017-09-06:0
-dspace.log.2017-09-07:0
-dspace.log.2017-09-08:10
-dspace.log.2017-09-09:0
-dspace.log.2017-09-10:0
-
- -
    -
  • Also, since last month (2017-08) Macaroni Bros no longer runs their REST API scraper every hour, so I’m sure that helped
  • -
  • There are still some errors, though, so maybe I should bump the connection limit up a bit
  • -
  • I remember seeing that Munin shows that the average number of connections is 50 (which is probably mostly from the XMLUI) and we’re currently allowing 40 connections per app, so maybe it would be good to bump that value up to 50 or 60 along with the system’s PostgreSQL max_connections (formula should be: webapps * 60 + 3, or 3 * 60 + 3 = 183 in our case)
  • -
  • I updated both CGSpace and DSpace Test to use these new settings (60 connections per web app and 183 for system PostgreSQL limit); a quick way to verify the deployed values is sketched below
  • -
  • I’m expecting to see 0 connection errors for the next few months
  • -
- -
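  • A quick way to verify what is actually deployed on each server (assuming the usual DSpace layout under /home/cgspace.cgiar.org):

$ grep db.maxconnections /home/cgspace.cgiar.org/config/dspace.cfg
$ psql -c 'SHOW max_connections;'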

2017-09-11

- - - -

2017-09-12

- -
    -
  • I was testing the METS XSD caching during AIP ingest but it doesn’t seem to help actually
  • -
  • The import process takes the same amount of time with and without the caching
  • -
  • Also, I captured TCP packets destined for port 80 and both imports only captured ONE packet (an update check from some component in Java):
  • -
- -
$ sudo tcpdump -i en0 -w without-cached-xsd.dump dst port 80 and 'tcp[32:4] = 0x47455420'
-
- -
    -
  • Great TCP dump guide here: https://danielmiessler.com/study/tcpdump
  • -
  • The last part of that command filters for HTTP GET requests, of which there should have been many to fetch all the XSD files for validation
  • -
  • I sent a message to the mailing list to see if anyone knows more about this
  • -
  • In looking at the tcpdump results I notice that there is an update check to the ehcache server on every iteration of the ingest loop, for example:
  • -
- -
09:39:36.008956 IP 192.168.8.124.50515 > 157.189.192.67.http: Flags [P.], seq 1736833672:1736834103, ack 147469926, win 4120, options [nop,nop,TS val 1175113331 ecr 550028064], length 431: HTTP: GET /kit/reflector?kitID=ehcache.default&pageID=update.properties&id=2130706433&os-name=Mac+OS+X&jvm-name=Java+HotSpot%28TM%29+64-Bit+Server+VM&jvm-version=1.8.0_144&platform=x86_64&tc-version=UNKNOWN&tc-product=Ehcache+Core+1.7.2&source=Ehcache+Core&uptime-secs=0&patch=UNKNOWN HTTP/1.1
-
- -
    -
  • Turns out this is a known issue and Ehcache has refused to make it opt-in: https://jira.terracotta.org/jira/browse/EHC-461
  • -
  • But we can disable it by adding an updateCheck="false" attribute to the main <ehcache > tag in dspace-services/src/main/resources/caching/ehcache-config.xml
  • -
  • After re-compiling and re-deploying DSpace I no longer see those update checks during item submission
  • -
  • I had a Skype call with Bram Luyten from Atmire to discuss various issues related to ORCID in DSpace - -
      -
    • First, ORCID is deprecating their version 1 API (which DSpace uses) and in version 2 API they have removed the ability to search for users by name
    • -
    • The logic is that searching by name actually isn’t very useful because ORCID is essentially a global phonebook and there are tons of legitimately duplicate and ambiguous names
    • -
    • Atmire’s proposed integration would work by having users lookup and add authors to the authority core directly using their ORCID ID itself (this would happen during the item submission process or perhaps as a standalone / batch process, for example to populate the authority core with a list of known ORCIDs)
    • -
    • Once the association between name and ORCID is made in the authority then it can be autocompleted in the lookup field
    • -
    • Ideally there could also be a user interface for cleanup and merging of authorities
    • -
    • He will prepare a quote for us with keeping in mind that this could be useful to contribute back to the community for a 5.x release
    • -
    • As far as exposing ORCIDs as flat metadata along side all other metadata, he says this should be possible and will work on a quote for us
    • -
  • -
- -

2017-09-13

- -
    -
  • Last night Linode sent an alert about CGSpace (linode18) that it has exceeded the outbound traffic rate threshold of 10Mb/s for the last two hours
  • -
  • I wonder what was going on, and looking into the nginx logs I think maybe it’s OAI…
  • -
  • Here is yesterday’s top ten IP addresses making requests to /oai:
  • -
- -
# awk '{print $1}' /var/log/nginx/oai.log | sort -n | uniq -c | sort -h | tail -n 10
-      1 213.136.89.78
-      1 66.249.66.90
-      1 66.249.66.92
-      3 68.180.229.31
-      4 35.187.22.255
-  13745 54.70.175.86
-  15814 34.211.17.113
-  15825 35.161.215.53
-  16704 54.70.51.7
-
- -
    -
  • Compared to the previous day’s logs it looks VERY high:
  • -
- -
# awk '{print $1}' /var/log/nginx/oai.log.1 | sort -n | uniq -c | sort -h | tail -n 10
-      1 207.46.13.39
-      1 66.249.66.93
-      2 66.249.66.91
-      4 216.244.66.194
-     14 66.249.66.90
-
- -
    -
  • The user agents for those top IPs are: - -
      -
    • 54.70.175.86: API scraper
    • -
    • 34.211.17.113: API scraper
    • -
    • 35.161.215.53: API scraper
    • -
    • 54.70.51.7: API scraper
    • -
  • -
  • And this user agent has never been seen before today (or at least recently!):
  • -
- -
# grep -c "API scraper" /var/log/nginx/oai.log
-62088
-# zgrep -c "API scraper" /var/log/nginx/oai.log.*.gz
-/var/log/nginx/oai.log.10.gz:0
-/var/log/nginx/oai.log.11.gz:0
-/var/log/nginx/oai.log.12.gz:0
-/var/log/nginx/oai.log.13.gz:0
-/var/log/nginx/oai.log.14.gz:0
-/var/log/nginx/oai.log.15.gz:0
-/var/log/nginx/oai.log.16.gz:0
-/var/log/nginx/oai.log.17.gz:0
-/var/log/nginx/oai.log.18.gz:0
-/var/log/nginx/oai.log.19.gz:0
-/var/log/nginx/oai.log.20.gz:0
-/var/log/nginx/oai.log.21.gz:0
-/var/log/nginx/oai.log.22.gz:0
-/var/log/nginx/oai.log.23.gz:0
-/var/log/nginx/oai.log.24.gz:0
-/var/log/nginx/oai.log.25.gz:0
-/var/log/nginx/oai.log.26.gz:0
-/var/log/nginx/oai.log.27.gz:0
-/var/log/nginx/oai.log.28.gz:0
-/var/log/nginx/oai.log.29.gz:0
-/var/log/nginx/oai.log.2.gz:0
-/var/log/nginx/oai.log.30.gz:0
-/var/log/nginx/oai.log.3.gz:0
-/var/log/nginx/oai.log.4.gz:0
-/var/log/nginx/oai.log.5.gz:0
-/var/log/nginx/oai.log.6.gz:0
-/var/log/nginx/oai.log.7.gz:0
-/var/log/nginx/oai.log.8.gz:0
-/var/log/nginx/oai.log.9.gz:0
-
- -
    -
  • Some of these heavy users are also using XMLUI, and their user agent isn’t matched by the Tomcat Session Crawler valve, so each request uses a different session
  • -
  • Yesterday alone the IP addresses using the API scraper user agent were responsible for 16,000 sessions in XMLUI:
  • -
- -
# grep -a -E "(54.70.51.7|35.161.215.53|34.211.17.113|54.70.175.86)" /home/cgspace.cgiar.org/log/dspace.log.2017-09-12 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
-15924
-
- -
    -
  • If this continues I will definitely need to figure out who is responsible for this scraper and add their user agent to the session crawler valve regex
  • -
  • A search for “API scraper” user agent on Google returns a robots.txt with a comment that this is the Yewno bot: http://www.escholarship.org/robots.txt
  • -
  • Also, in looking at the DSpace logs I noticed a warning from OAI that I should look into:
  • -
- -
WARN  org.dspace.xoai.services.impl.xoai.DSpaceRepositoryConfiguration @ { OAI 2.0 :: DSpace } Not able to retrieve the dspace.oai.url property from oai.cfg. Falling back to request address
-
- -
    -
  • Looking at the spreadsheet with deletions and corrections that CCAFS sent last week
  • -
  • It appears they want to delete a lot of metadata, which I’m not sure they realize the implications of:
  • -
- -
dspace=# select text_value, count(text_value) from metadatavalue where resource_type_id=2 and metadata_field_id in (134, 235) and text_value in ('EA_PAR','FP1_CSAEvidence','FP2_CRMWestAfrica','FP3_Gender','FP4_Baseline','FP4_CCPAG','FP4_CCPG','FP4_CIATLAM IMPACT','FP4_ClimateData','FP4_ClimateModels','FP4_GenderPolicy','FP4_GenderToolbox','FP4_Livestock','FP4_PolicyEngagement','FP_GII','SA_Biodiversity','SA_CSV','SA_GHGMeasurement','SEA_mitigationSAMPLES','SEA_UpscalingInnovation','WA_Partnership','WA_SciencePolicyExchange') group by text_value;                                                                                                                                                                                                                  
-        text_value        | count                              
---------------------------+-------                             
- FP4_ClimateModels        |     6                              
- FP1_CSAEvidence          |     7                              
- SEA_UpscalingInnovation  |     7                              
- FP4_Baseline             |    69                              
- WA_Partnership           |     1                              
- WA_SciencePolicyExchange |     6                              
- SA_GHGMeasurement        |     2                              
- SA_CSV                   |     7                              
- EA_PAR                   |    18                              
- FP4_Livestock            |     7                              
- FP4_GenderPolicy         |     4                              
- FP2_CRMWestAfrica        |    12                              
- FP4_ClimateData          |    24                              
- FP4_CCPAG                |     2                              
- SEA_mitigationSAMPLES    |     2                              
- SA_Biodiversity          |     1                              
- FP4_PolicyEngagement     |    20                              
- FP3_Gender               |     9                              
- FP4_GenderToolbox        |     3                              
-(19 rows)
-
- -
    -
  • I sent CCAFS people an email to ask if they really want to remove these 200+ tags
  • -
  • She responded yes, so I’ll at least need to do these deletes in PostgreSQL:
  • -
- -
dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id in (134, 235) and text_value in ('EA_PAR','FP1_CSAEvidence','FP2_CRMWestAfrica','FP3_Gender','FP4_Baseline','FP4_CCPAG','FP4_CCPG','FP4_CIATLAM IMPACT','FP4_ClimateData','FP4_ClimateModels','FP4_GenderPolicy','FP4_GenderToolbox','FP4_Livestock','FP4_PolicyEngagement','FP_GII','SA_Biodiversity','SA_CSV','SA_GHGMeasurement','SEA_mitigationSAMPLES','SEA_UpscalingInnovation','WA_Partnership','WA_SciencePolicyExchange','FP_GII');
-DELETE 207
-
- -
    -
  • When we discussed this in late July there were some other renames they had requested, but I don’t see them in the current spreadsheet so I will have to follow up on that
  • -
  • I talked to Macaroni Bros and they said to just go ahead with the other corrections as well, as their spreadsheet evolved organically rather than systematically!
  • -
  • The final list of corrections and deletes should therefore be:
  • -
- -
delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and text_value='PII-FP4_CRMWestAfrica';
-update metadatavalue set text_value='FP3_VietnamLED' where resource_type_id=2 and metadata_field_id=134 and text_value='FP3_VeitnamLED';
-update metadatavalue set text_value='PII-FP1_PIRCCA' where resource_type_id=2 and metadata_field_id=235 and text_value='PII-SEA_PIRCCA';
-delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and text_value='PII-WA_IntegratedInterventions';
-delete from metadatavalue where resource_type_id=2 and metadata_field_id in (134, 235) and text_value in ('EA_PAR','FP1_CSAEvidence','FP2_CRMWestAfrica','FP3_Gender','FP4_Baseline','FP4_CCPAG','FP4_CCPG','FP4_CIATLAM IMPACT','FP4_ClimateData','FP4_ClimateModels','FP4_GenderPolicy','FP4_GenderToolbox','FP4_Livestock','FP4_PolicyEngagement','FP_GII','SA_Biodiversity','SA_CSV','SA_GHGMeasurement','SEA_mitigationSAMPLES','SEA_UpscalingInnovation','WA_Partnership','WA_SciencePolicyExchange','FP_GII');
-
- -
    -
  • Create and merge pull request to shut up the Ehcache update check (#337)
  • -
  • It looks like there was a previous attempt to disable these update checks that was merged in DSpace 4.0 (though it only affects XMLUI): https://jira.duraspace.org/browse/DS-1492
  • -
  • I commented there suggesting that we disable it globally
  • -
  • I merged the changes to the CCAFS project tags (#336) but still need to finalize the metadata deletions/renames
  • -
  • I merged the CGIAR Library theme changes (#338) to the 5_x-prod branch in preparation for next week’s migration
  • -
  • I emailed the Handle administrators (hdladmin@cnri.reston.va.us) to ask them what the process is for getting their prefix resolved by our resolver
  • -
  • They responded and said that they need email confirmation from the contact of record of the other prefix, so I should have the CGIAR System Organization people email them before I send the new sitebndl.zip
  • -
  • Testing to see how we end up with all these new authorities after we keep cleaning and merging them in the database
  • -
  • Here are all my distinct authority combinations in the database before:
  • -
- -
dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';
- text_value |              authority               | confidence 
-------------+--------------------------------------+------------
- Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |         -1
- Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e |        600
- Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |        600
- Orth, A.   | 1a1943a0-3f87-402f-9afe-e52fb46a513e |        600
- Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e |         -1
- Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |          0
- Orth, Alan | 0d575fa3-8ac4-4763-a90a-1248d4791793 |         -1
- Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 |        600
-(8 rows)
-
- -
    -
  • And then after adding a new item and selecting an existing “Orth, Alan” with an ORCID in the author lookup:
  • -
- -
dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';
- text_value |              authority               | confidence 
-------------+--------------------------------------+------------
- Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |         -1
- Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e |        600
- Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |        600
- Orth, A.   | 1a1943a0-3f87-402f-9afe-e52fb46a513e |        600
- Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e |         -1
- Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |          0
- Orth, Alan | cb3aa5ae-906f-4902-97b1-2667cf148dde |        600
- Orth, Alan | 0d575fa3-8ac4-4763-a90a-1248d4791793 |         -1
- Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 |        600
-(9 rows)
-
- -
    -
  • It created a new authority… let’s try to add another item and select the same existing author and see what happens in the database:
  • -
- -
dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';
- text_value |              authority               | confidence 
-------------+--------------------------------------+------------
- Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |         -1
- Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e |        600
- Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |        600
- Orth, A.   | 1a1943a0-3f87-402f-9afe-e52fb46a513e |        600
- Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e |         -1
- Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |          0
- Orth, Alan | cb3aa5ae-906f-4902-97b1-2667cf148dde |        600
- Orth, Alan | 0d575fa3-8ac4-4763-a90a-1248d4791793 |         -1
- Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 |        600
-(9 rows)
-
- -
    -
  • No new one… so now let me try to add another item and select the italicized result from the ORCID lookup and see what happens in the database:
  • -
- -
dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';
- text_value |              authority               | confidence 
-------------+--------------------------------------+------------
- Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |         -1
- Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e |        600
- Orth, Alan | d85a8a5b-9b82-4aaf-8033-d7e0c7d9cb8f |        600
- Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |        600
- Orth, A.   | 1a1943a0-3f87-402f-9afe-e52fb46a513e |        600
- Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e |         -1
- Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |          0
- Orth, Alan | cb3aa5ae-906f-4902-97b1-2667cf148dde |        600
- Orth, Alan | 0d575fa3-8ac4-4763-a90a-1248d4791793 |         -1
- Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 |        600
-(10 rows)
-
- -
    -
  • Shit, it created another authority! Let’s try it again!
  • -
- -
dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';                                                                                             
- text_value |              authority               | confidence
-------------+--------------------------------------+------------
- Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |         -1
- Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e |        600
- Orth, Alan | d85a8a5b-9b82-4aaf-8033-d7e0c7d9cb8f |        600
- Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |        600
- Orth, Alan | 9aed566a-a248-4878-9577-0caedada43db |        600
- Orth, A.   | 1a1943a0-3f87-402f-9afe-e52fb46a513e |        600
- Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e |         -1
- Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |          0
- Orth, Alan | cb3aa5ae-906f-4902-97b1-2667cf148dde |        600
- Orth, Alan | 0d575fa3-8ac4-4763-a90a-1248d4791793 |         -1
- Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 |        600
-(11 rows)
-
- -
    -
  • It added another authority… surely this is not the desired behavior, or maybe we are not using this as intended?
  • -
- -

2017-09-14

- -
    -
  • Communicate with Handle.net admins to try to get some guidance about the 10947 prefix
  • -
  • Michael Marus is the contact for their prefix, but he has left CGIAR; as I actually have access to the CGIAR Library server I think I can just generate a new sitebndl.zip file from their server and send it to Handle.net
  • -
  • Also, Handle.net says their prefix is up for annual renewal next month so we might want to just pay for it and take it over
  • -
  • CGSpace was very slow and Uptime Robot even said it was down at one time
  • -
  • I didn’t see any abnormally high usage in the REST or OAI logs, but looking at Munin I see the average JVM usage was at 4.9GB and the heap is only 5GB (5120M), so I think it’s just normal growing pains
  • -
  • Every few months I generally try to increase the JVM heap to be 512M higher than the average usage reported by Munin, so now I adjusted it to 5632M (a sketch of the change is below)
  • -
- -
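  • As a rough sketch of what that adjustment looks like (assuming Tomcat picks up JAVA_OPTS from /etc/default/tomcat7, as the Ubuntu package does; other flags are omitted here):

# set the minimum and maximum heap to the new size (5632M)
JAVA_OPTS="-Djava.awt.headless=true -Xms5632m -Xmx5632m"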

2017-09-15

- -
    -
  • Apply CCAFS project tag corrections on CGSpace:
  • -
- -
dspace=# \i /tmp/ccafs-projects.sql 
-DELETE 5
-UPDATE 4
-UPDATE 1
-DELETE 1
-DELETE 207
-
- -

2017-09-17

- -
    -
  • Create pull request for CGSpace to be able to resolve multiple handles (#339)
  • -
  • We still need to do the changes to config.dct and regenerate the sitebndl.zip to send to the Handle.net admins
  • -
  • According to this dspace-tech mailing list entry from 2011, we need to add the extra handle prefixes to config.dct like this:
  • -
- -
"server_admins" = (
-"300:0.NA/10568"
-"300:0.NA/10947"
-)
-
-"replication_admins" = (
-"300:0.NA/10568"
-"300:0.NA/10947"
-)
-
-"backup_admins" = (
-"300:0.NA/10568"
-"300:0.NA/10947"
-)
-
- -
    -
  • More work on the CGIAR Library migration test run locally, as I was having problems importing the last fourteen items from the CGIAR System Management Office community
  • -
  • The problem was that we remapped the items to new collections after the initial import, so the items were using the 10947 prefix but the community and collection were using 10568
  • -
  • I ended up having to read the AIP Backup and Restore documentation closely a few times and then explicitly preserve handles and ignore parents:
  • -
- -
$ for item in 10568-93759/ITEM@10947-46*; do ~/dspace/bin/dspace packager -r -t AIP -o ignoreHandle=false -o ignoreParent=true -e aorth@mjanja.ch -p 10568/87738 $item; done
-
- -
    -
  • Also, this was in replace mode (-r) rather than submit mode (-s), because submit mode always generated a new handle even if I told it not to!
  • -
  • I decided to start the import process in the evening rather than waiting for the morning, and right as the first community was finished importing I started seeing Timeout waiting for idle object errors
  • -
  • I had to cancel the import, clean up a bunch of database entries, increase the PostgreSQL max_connections as a precaution, restart PostgreSQL and Tomcat, and then finally completed the import
  • -
- -

2017-09-18

- -
    -
  • I think we should force regeneration of all thumbnails in the CGIAR Library community, as their DSpace was version 1.7 and CGSpace is running DSpace 5.5, so the new thumbnails should look much better
  • -
  • One item for comparison:
  • -
- -

With original DSpace 1.7 thumbnail

- -

After DSpace 5.5

- -
    -
  • Moved the CGIAR Library Migration notes to a page — cgiar-library-migration — as there seems to be a bug with post slugs defined in frontmatter when you have a permalink scheme defined in config.toml (happens currently in Hugo 0.27.1 at least)
  • -
- -

2017-09-19

- -
    -
  • Nightly Solr indexing is working again, and it appears to be pretty quick actually:
  • -
- -
2017-09-19 00:00:14,953 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Processing (0 of 65808): 17607
-...
-2017-09-19 00:04:18,017 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Processing (65807 of 65808): 83753
-
- - - -

2017-09-20

- -
    -
  • Abenet and I noticed that hdl.handle.net is blocked by ETC at ILRI Addis so I asked Biruk Debebe to route it over the satellite
  • -
  • Force thumbnail regeneration for the CGIAR System Organization’s Historic Archive community (2000 items):
  • -
- -
$ schedtool -D -e ionice -c2 -n7 nice -n19 dspace filter-media -f -i 10947/1 -p "ImageMagick PDF Thumbnail"
-
- -
    -
  • I’m still waiting (over 1 day later) to hear back from the CGIAR System Organization about updating the DNS for library.cgiar.org
  • -
- -

2017-09-21

- -
    -
  • Switch to OpenJDK 8 from Oracle JDK on DSpace Test
  • -
  • I want to test this for a while to see if we can start using it instead
  • -
  • I need to look at the JVM graphs in Munin, test the Atmire modules, build the source, etc to get some impressions
  • -
- -

2017-09-22

- - - -

2017-09-24

- -
    -
  • Start investigating other platforms for CGSpace due to linear instance pricing on Linode
  • -
  • We need to figure out how much memory is used by applications, caches, etc, and how much disk space the asset store needs
  • -
  • First, here’s the last week of memory usage on CGSpace and DSpace Test:
  • -
- -

CGSpace memory week / DSpace Test memory week

- -
    -
  • 8GB of RAM seems to be good for DSpace Test for now, with Tomcat’s JVM heap taking 3GB, caches and buffers taking 3–4GB, and then ~1GB unused
  • -
  • 24GB of RAM is way too much for CGSpace, with Tomcat’s JVM heap taking 5.5GB and caches and buffers happily using 14GB or so
  • -
  • As far as disk space, the CGSpace assetstore currently uses 51GB and Solr cores use 86GB (mostly in the statistics core)
  • -
  • DSpace Test currently doesn’t even have enough space to store a full copy of CGSpace, as its Linode instance only has 96GB of disk space
  • -
  • I’ve heard Google Cloud is nice (cheap and performant) but it’s definitely more complicated than Linode, and instances aren’t enough cheaper to make it worth it
  • -
  • Here are some theoretical instances on Google Cloud: - -
      -
    • DSpace Test, n1-standard-2 with 2 vCPUs, 7.5GB RAM, 300GB persistent SSD: $99/month
    • -
    • CGSpace, n1-standard-4 with 4 vCPUs, 15GB RAM, 300GB persistent SSD: $148/month
    • -
  • -
  • Looking at Linode’s instance pricing, for DSpace Test it seems we could use the same 8GB instance for $40/month, and then add block storage of ~300GB for $30 (block storage is currently in beta and priced at $0.10/GiB)
  • -
  • For CGSpace we could use the cheaper 12GB instance for $80 and then add block storage of 500GB for $50
  • -
  • I’ve sent Peter a message about moving DSpace Test to the New Jersey data center so we can test the block storage beta
  • -
  • Create pull request for adding ISI Journal to search filters (#341)
  • -
  • Peter asked if we could map all the items of type Journal Article in ILRI Archive to ILRI articles in journals and newsletters
  • -
  • It is easy to do via CSV using OpenRefine but I noticed that on CGSpace ~1,000 of the expected 2,500 are already mapped, while on DSpace Test they were not
  • -
  • I’ve asked Peter if he knows what’s going on (or who mapped them)
  • -
  • Turns out he had already mapped some, but requested that I finish the rest
  • -
  • With this GREL in OpenRefine I can find items that are mapped, ie they have 10568/3|| or 10568/3$ in their collection field:
  • -
- -
isNotNull(value.match(/.+?10568\/3(\|\|.+|$)/))
-
- -
    -
  • Peter also made a lot of changes to the data in the Archives collections while I was attempting to import the changes, so we were essentially competing for PostgreSQL and Solr connections
  • -
  • I ended up having to kill the import and wait until he was done
  • -
  • I exported a clean CSV and applied the changes from that one, which was a hundred or two less than I thought there should be (at least compared to the current state of DSpace Test, which is a few months old)
  • -
- -

2017-09-25

- -
    -
  • Email Rosemary Kande from ICT to ask about the administrative / finance procedure for moving DSpace Test from EU to US region on Linode
  • -
  • Communicate (finally) with Tania and Tunji from the CGIAR System Organization office to tell them to request CGNET make the DNS updates for library.cgiar.org
  • -
  • Peter wants me to clean up the text values for Delia Grace’s metadata, as the authorities are all messed up again since we cleaned them up in 2016-12:
  • -
- -
dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';                                  
-  text_value  |              authority               | confidence              
---------------+--------------------------------------+------------             
- Grace, Delia |                                      |        600              
- Grace, Delia | bfa61d7c-7583-4175-991c-2e7315000f0c |        600              
- Grace, Delia | bfa61d7c-7583-4175-991c-2e7315000f0c |         -1              
- Grace, D.    | 6a8ddca3-33c1-45f9-aa00-6fa9fc91e3fc |         -1
-
- -
    -
  • Strangely, none of her authority entries have ORCIDs anymore…
  • -
  • I’ll just fix the text values and forget about it for now:
  • -
- -
dspace=# update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
-UPDATE 610
-
- -
    -
  • After this we have to reindex the Discovery and Authority cores (as tomcat7 user):
  • -
- -
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m -XX:+TieredCompilation -XX:TieredStopAtLevel=1"
-$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
-
-real    83m56.895s
-user    13m16.320s
-sys     2m17.917s
-$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-authority -b
-Retrieving all data
-Initialize org.dspace.authority.indexer.DSpaceAuthorityIndexer
-Exception: null
-java.lang.NullPointerException
-        at org.dspace.authority.AuthorityValueGenerator.generateRaw(AuthorityValueGenerator.java:82)
-        at org.dspace.authority.AuthorityValueGenerator.generate(AuthorityValueGenerator.java:39)
-        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.prepareNextValue(DSpaceAuthorityIndexer.java:201)
-        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:132)
-        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
-        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
-        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:159)
-        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
-        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
-        at org.dspace.authority.indexer.AuthorityIndexClient.main(AuthorityIndexClient.java:61)
-        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
-        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
-        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
-        at java.lang.reflect.Method.invoke(Method.java:498)
-        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
-        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
-
-real    6m6.447s
-user    1m34.010s
-sys     0m12.113s
-
- -
    -
  • The index-authority script always seems to fail, I think it’s the same old bug
  • -
  • Something interesting for my notes about JNDI database pool—since I couldn’t determine if it was working or not when I tried it locally the other day—is this error message that I just saw in the DSpace logs today:
  • -
- -
ERROR org.dspace.storage.rdbms.DatabaseManager @ Error retrieving JNDI context: jdbc/dspaceLocal
-...
-INFO  org.dspace.storage.rdbms.DatabaseManager @ Unable to locate JNDI dataSource: jdbc/dspaceLocal
-INFO  org.dspace.storage.rdbms.DatabaseManager @ Falling back to creating own Database pool
-
- -
    -
  • So it’s good to know that something gets printed when it fails, because I didn’t see any mention of JNDI before when I was testing! (A quick check of the relevant config is sketched below.)
  • -
- -
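  • For reference, and this is just my reading of the DSpace 5 configuration rather than something from the logs: the lookup name comes from the db.jndi property in dspace.cfg, so a quick check of what DSpace is trying to find (which, given the log message above, should show jdbc/dspaceLocal) would be:

$ grep '^db.jndi' [dspace]/config/dspace.cfg
db.jndi = jdbc/dspaceLocal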

2017-09-26

- -
    -
  • Adam Hunt from WLE finally registered so I added him to the editor and approver groups
  • -
  • Then I noticed that Sisay never removed Marianne’s user accounts from the approver steps in the workflow because she is already in the WLE groups, which are in those steps
  • -
  • For what it’s worth, I had asked him to remove them on 2017-09-14
  • -
  • I also went and added the WLE approvers and editors groups to the appropriate steps of all the Phase I and Phase II research theme collections
  • -
  • A lot of CIAT’s items have manually generated thumbnails which have an incorrect aspect ratio and an ugly black border
  • -
  • I communicated with Elizabeth from CIAT to tell her she should use DSpace’s automatically generated thumbnails
  • -
  • Start discussing with ICT about the Linode server update for DSpace Test
  • -
  • Rosemary said I need to work with Robert Okal to destroy/create the server, and then let her and Lilian Masigah from finance know the updated Linode asset names for their records
  • -
- -

2017-09-28

- -
    -
  • Tunji from the System Organization finally sent the DNS request for library.cgiar.org to CGNET
  • -
  • Now the redirects work
  • -
  • I quickly registered a Let’s Encrypt certificate for the domain:
  • -
- -
# systemctl stop nginx
-# /opt/certbot-auto certonly --standalone --email aorth@mjanja.ch -d library.cgiar.org
-# systemctl start nginx
-
- -
    -
  • I modified the nginx configuration of the ansible playbooks to use this new certificate and now the certificate is enabled and OCSP stapling is working:
  • -
- -
$ openssl s_client -connect cgspace.cgiar.org:443 -servername library.cgiar.org  -tls1_2 -tlsextdebug -status
-...
-OCSP Response Data:
-...
-Cert Status: good
-
diff --git a/public/2017-10/index.html b/public/2017-10/index.html
deleted file mode 100644
index 26b7ae684..000000000
--- a/public/2017-10/index.html
+++ /dev/null
@@ -1,619 +0,0 @@

October, 2017

- -
-

2017-10-01

- - - -
http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
-
- -
    -
  • There appears to be a pattern but I’ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine
  • -
  • Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections
  • -
- -

- -

2017-10-02

- -
    -
  • Peter Ballantyne said he was having problems logging into CGSpace with “both” of his accounts (CGIAR LDAP and personal, apparently)
  • -
  • I looked in the logs and saw some LDAP lookup failures due to timeout but also strangely a “no DN found” error:
  • -
- -
2017-10-01 20:24:57,928 WARN  org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=CA0AA5FEAEA8805645489404CDCE9594:ip_addr=41.204.190.40:ldap_attribute_lookup:type=failed_search javax.naming.CommunicationException\colon; svcgroot2.cgiarad.org\colon;3269 [Root exception is java.net.ConnectException\colon; Connection timed out (Connection timed out)]
-2017-10-01 20:22:37,982 INFO  org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=CA0AA5FEAEA8805645489404CDCE9594:ip_addr=41.204.190.40:failed_login:no DN found for user pballantyne
-
- -
    -
  • I thought maybe his account had expired (seeing as it was the first of the month) but he says he was finally able to log in today
  • -
  • The logs for yesterday show fourteen errors related to LDAP auth failures:
  • -
- -
$ grep -c "ldap_authentication:type=failed_auth" dspace.log.2017-10-01
-14
-
- -
    -
  • For what it’s worth, there are no errors on any other recent days, so it must have been some network issue on Linode or CGNET’s LDAP server
  • -
  • Linode emailed to say that linode578611 (DSpace Test) needs to migrate to a new host for a security update so I initiated the migration immediately rather than waiting for the scheduled time in two weeks
  • -
- -

2017-10-04

- -
    -
  • Twice in the last twenty-four hours Linode has alerted about high CPU usage on CGSpace (linode2533629)
  • -
  • Communicate with Sam from the CGIAR System Organization about some broken links coming from their CGIAR Library domain to CGSpace
  • -
  • The first is a link to a browse page that should be handled better in nginx:
  • -
- -
http://library.cgiar.org/browse?value=Intellectual%20Assets%20Reports&type=subject → https://cgspace.cgiar.org/browse?value=Intellectual%20Assets%20Reports&type=subject
-
- -
    -
  • We’ll need to check for browse links and handle them properly, including swapping the subject parameter for systemsubject (which doesn’t exist in Discovery yet, but we’ll need to add it) as we have moved their poorly curated subjects from dc.subject to cg.subject.system
  • -
  • The second link was a direct link to a bitstream which broke due to the sequence being updated, so I told him he should link to the handle of the item instead
  • -
  • Help Sisay proof sixty-two IITA records on DSpace Test
  • -
  • Lots of inconsistencies and errors in subjects, dc.format.extent, regions, countries
  • -
  • Merge the Discovery search changes for ISI Journal (#341)
  • -
- -

2017-10-05

- -
    -
  • Twice in the past twenty-four hours Linode has warned that CGSpace’s outbound traffic rate was exceeding the notification threshold
  • -
  • I had a look at yesterday’s OAI and REST logs in /var/log/nginx but didn’t see anything unusual:
  • -
- -
# awk '{print $1}' /var/log/nginx/rest.log.1 | sort -n | uniq -c | sort -h | tail -n 10
-    141 157.55.39.240
-    145 40.77.167.85
-    162 66.249.66.92
-    181 66.249.66.95
-    211 66.249.66.91
-    312 66.249.66.94
-    384 66.249.66.90
-   1495 50.116.102.77
-   3904 70.32.83.92
-   9904 45.5.184.196
-# awk '{print $1}' /var/log/nginx/oai.log.1 | sort -n | uniq -c | sort -h | tail -n 10
-      5 66.249.66.71
-      6 66.249.66.67
-      6 68.180.229.31
-      8 41.84.227.85
-      8 66.249.66.92
-     17 66.249.66.65
-     24 66.249.66.91
-     38 66.249.66.95
-     69 66.249.66.90
-    148 66.249.66.94
-
- -
    -
  • Working on the nginx redirects for CGIAR Library
  • -
  • We should start using 301 redirects and also allow for /sitemap to work on the library.cgiar.org domain so the CGIAR System Organization people can update their Google Search Console and allow Google to find their content in a structured way
  • -
  • Remove eleven occurrences of ACP in IITA’s cg.coverage.region using the Atmire batch edit module from Discovery
  • -
  • Need to investigate how we can verify the library.cgiar.org domain using the HTML or DNS methods
  • -
  • Run corrections on 143 ILRI Archive items that had two dc.identifier.uri values (Handle) that Peter had pointed out earlier this week
  • -
  • I used OpenRefine to isolate them and then fixed and re-imported them into CGSpace
  • -
  • I manually checked a dozen of them and it appeared that the correct handle was always the second one, so I just deleted the first one
  • -
- -

2017-10-06

- - - -

Original flat thumbnails / Tweaked with border and box shadow

- -
    -
  • I’ll post it to the Yammer group to see what people think
  • -
  • I figured out a way to do the HTML verification for Google Search Console for library.cgiar.org
  • -
  • We can drop the HTML file in their XMLUI theme folder and it will get copied to the webapps directory during build/install
  • -
  • Then we add an nginx alias for that URL in the library.cgiar.org vhost
  • -
  • This method is kind of a hack but at least we can put all the pieces into git to be reproducible
  • -
  • I will tell Tunji to send me the verification file
  • -
- -

2017-10-10

- -
    -
  • Deploy logic to allow verification of the library.cgiar.org domain in the Google Search Console (#343)
  • -
  • After verifying both the HTTP and HTTPS domains and submitting a sitemap it will be interesting to see how the stats in the console as well as the search results change (currently 28,500 results):
  • -
- -

Google Search Console / Google Search Console 2 / Google Search results

- -
    -
  • I tried to submit a “Change of Address” request in the Google Search Console but I need to be an owner on CGSpace’s console (currently I’m just a user) in order to do that
  • -
  • Manually clean up some communities and collections that Peter had requested a few weeks ago
  • -
  • Delete Community 10568/102 (ILRI Research and Development Issues)
  • -
  • Move five collections to 10568/27629 (ILRI Projects) using move-collections.sh with the following configuration:
  • -
- -
10568/1637 10568/174 10568/27629
-10568/1642 10568/174 10568/27629
-10568/1614 10568/174 10568/27629
-10568/75561 10568/150 10568/27629
-10568/183 10568/230 10568/27629
-
- -
    -
  • Delete community 10568/174 (Sustainable livestock futures)
  • -
  • Delete collections in 10568/27629 that have zero items (33 of them!)
  • -
- -

2017-10-11

- -
    -
  • Peter added me as an owner on the CGSpace property on Google Search Console and I tried to submit a “Change of Address” request for the CGIAR Library but got an error:
  • -
- -

Change of Address error

- -
    -
  • We are sending top-level CGIAR Library traffic to their specific community hierarchy in CGSpace so this type of change of address won’t work—we’ll just need to wait for Google to slowly index everything and take note of the HTTP 301 redirects
  • -
  • Also the Google Search Console doesn’t work very well with Google Analytics being blocked, so I had to turn off my ad blocker to get the “Change of Address” tool to work!
  • -
- -

2017-10-12

- -
    -
  • Finally finish (I think) working on the myriad nginx redirects for all the CGIAR Library browse stuff—it ended up getting pretty complicated!
  • -
  • I still need to commit the DSpace changes (add browse index, XMLUI strings, Discovery index, etc), but I should be able to deploy that on CGSpace soon
  • -
- -

2017-10-14

- -
    -
  • Run system updates on DSpace Test and reboot server
  • -
  • Merge changes adding a search/browse index for CGIAR System subject to 5_x-prod (#344)
  • -
  • I checked the top browse links in Google’s search results for site:library.cgiar.org inurl:browse and they are all redirected appropriately by the nginx rewrites I worked on last week
  • -
- -

2017-10-22

- -
    -
  • Run system updates on DSpace Test and reboot server
  • -
  • Re-deploy CGSpace from latest 5_x-prod (adds ISI Journal to search filters and adds Discovery index for CGIAR Library systemsubject)
  • -
  • Deploy nginx redirect fixes to catch CGIAR Library browse links (redirect to their community and translate subject→systemsubject)
  • -
  • Run migration of CGSpace server (linode18) for Linode security alert, which took 42 minutes of downtime
  • -
- -

2017-10-26

- -
    -
  • In the last 24 hours we’ve gotten a few alerts from Linode that there was high CPU and outgoing traffic on CGSpace
  • -
  • Uptime Robot even noticed CGSpace go “down” for a few minutes
  • -
  • In other news, I was trying to look at a question about stats raised by Magdalena and then CGSpace went down due to the database connection pool being exhausted
  • -
  • Looking at the PostgreSQL activity I see there are 93 connections, but after a minute or two they went down and CGSpace came back up
  • -
  • Annnd I reloaded the Atmire Usage Stats module and the connections shot back up and CGSpace went down again
  • -
  • Still not sure where the load is coming from right now, but it’s clear why there were so many alerts yesterday on the 25th!
  • -
- -
# grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2017-10-25 | sort -n | uniq | wc -l
-18022
-
- -
    -
  • Compared to other days there were two or three times the number of requests yesterday!
  • -
- -
# grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2017-10-23 | sort -n | uniq | wc -l
-3141
-# grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2017-10-26 | sort -n | uniq | wc -l
-7851
-
- -
    -
  • I still have no idea what was causing the load to go up today
  • -
  • I finally investigated Magdalena’s issue with the item download stats and now I can’t reproduce it: I get the same number of downloads reported in the stats widget on the item page, the “Most Popular Items” page, and in Usage Stats
  • -
  • I think it might have been an issue with the statistics not being fresh
  • -
  • I added the admin group for the systems organization to the admin role of the top-level community of CGSpace because I guess Sisay had forgotten
  • -
  • Magdalena asked if there was a way to reuse data in item submissions where items have a lot of similar data
  • -
  • I told her about the possibility to use per-collection item templates, and asked if her items in question were all from a single collection
  • -
  • We’ve never used it but it could be worth looking at
  • -
- -

2017-10-27

- -
    -
  • Linode alerted about high CPU usage again (twice) on CGSpace in the last 24 hours, around 2AM and 2PM
  • -
- -

2017-10-28

- -
    -
  • Linode alerted about high CPU usage again on CGSpace around 2AM this morning
  • -
- -

2017-10-29

- -
    -
  • Linode alerted about high CPU usage again on CGSpace around 2AM and 4AM
  • -
  • I’m still not sure why this started causing alerts so repeatedly over the past week
  • -
  • I don’t see any telltale signs in the REST or OAI logs, so I’m trying to do some rudimentary analysis in the DSpace logs:
  • -
- -
# grep '2017-10-29 02:' dspace.log.2017-10-29 | grep -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
-2049
-
- -
    -
  • So there were 2049 unique sessions during the hour of 2AM
  • -
  • Looking at my notes, the number of unique sessions was about the same during the same hour on other days when there were no alerts
  • -
  • I think I’ll need to enable access logging in nginx to figure out what’s going on
  • -
  • After enabling access logging for XMLUI requests on / I see a new bot I’ve never seen before:
  • -
- -
137.108.70.6 - - [29/Oct/2017:07:39:49 +0000] "GET /discover?filtertype_0=type&filter_relational_operator_0=equals&filter_0=Internal+Document&filtertype=author&filter_relational_operator=equals&filter=CGIAR+Secretariat HTTP/1.1" 200 7776 "-" "Mozilla/5.0 (compatible; CORE/0.6; +http://core.ac.uk; http://core.ac.uk/intro/contact)"
-
- -
    -
  • CORE seems to be some bot that is “Aggregating the world’s open access research papers”
  • -
  • The contact address listed in their bot’s user agent is incorrect; the correct page is simply: https://core.ac.uk/contact
  • -
  • I will check the logs in a few days to see if they are harvesting us regularly, then add their bot’s user agent to the Tomcat Crawler Session Valve
  • -
  • After browsing the CORE site it seems that the CGIAR Library is somehow a member of CORE, so they have probably only been harvesting CGSpace since we did the migration, as library.cgiar.org directs to us now
  • -
  • For now I will just contact them to have them update their contact info in the bot’s user agent, but eventually I think I’ll tell them to swap out the CGIAR Library entry for CGSpace
  • -
- -

2017-10-30

- -
    -
  • Like clock work, Linode alerted about high CPU usage on CGSpace again this morning (this time at 8:13 AM)
  • -
  • Uptime Robot noticed that CGSpace went down around 10:15 AM, and I saw that there were 93 PostgreSQL connections:
  • -
- -
dspace=# SELECT * FROM pg_stat_activity;
-...
-(93 rows)
-
- -
    -
  • Surprise surprise, the CORE bot is likely responsible for the recent load issues, making hundreds of thousands of requests yesterday and today:
  • -
- -
# grep -c "CORE/0.6" /var/log/nginx/access.log 
-26475
-# grep -c "CORE/0.6" /var/log/nginx/access.log.1
-135083
-
- -
    -
  • IP addresses for this bot currently seem to be:
  • -
- -
# grep "CORE/0.6" /var/log/nginx/access.log | awk '{print $1}' | sort -n | uniq
-137.108.70.6
-137.108.70.7
-
- -
    -
  • I will add their user agent to the Tomcat Session Crawler Valve but it won’t help much because they are only using two sessions:
  • -
- -
# grep 137.108.70 dspace.log.2017-10-30 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq
-session_id=5771742CABA3D0780860B8DA81E0551B
-session_id=6C30F10B4351A4ED83EC6ED50AFD6B6A
-
- -
    -
  • … and most of their requests are for dynamic discover pages:
  • -
- -
# grep -c 137.108.70 /var/log/nginx/access.log
-26622
-# grep 137.108.70 /var/log/nginx/access.log | grep -c "GET /discover"
-24055
-
- -
    -
  • Just because I’m curious who the top IPs are:
  • -
- -
# awk '{print $1}' /var/log/nginx/access.log | sort -n | uniq -c | sort -h | tail
-    496 62.210.247.93
-    571 46.4.94.226
-    651 40.77.167.39
-    763 157.55.39.231
-    782 207.46.13.90
-    998 66.249.66.90
-   1948 104.196.152.243
-   4247 190.19.92.5
-  31602 137.108.70.6
-  31636 137.108.70.7
-
- -
    -
  • At least we know the top two are CORE, but who are the others?
  • -
  • 190.19.92.5 is apparently in Argentina, and 104.196.152.243 is from Google Compute Engine
  • -
  • Actually, these two scrapers might be more responsible for the heavy load than the CORE bot, because they don’t reuse their session variable, creating thousands of new sessions!
  • -
- -
# grep 190.19.92.5 dspace.log.2017-10-30 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
-1419
-# grep 104.196.152.243 dspace.log.2017-10-30 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
-2811
-
- -
    -
  • From looking at the requests, it appears these are from CIAT and CCAFS
  • -
  • I wonder if I could somehow instruct them to use a user agent so that we could apply a crawler session manager valve to them
  • -
  • Actually, according to the Tomcat docs, we could use an IP with crawlerIps: https://tomcat.apache.org/tomcat-7.0-doc/config/valve.html#Crawler_Session_Manager_Valve
  • -
  • Ah, wait, it looks like crawlerIps only came in 2017-06, so it probably isn’t in Ubuntu 16.04’s 7.0.68 build!
  • -
  • That would explain the errors I was getting when trying to set it:
  • -
- -
WARNING: [SetPropertiesRule]{Server/Service/Engine/Host/Valve} Setting property 'crawlerIps' to '190\.19\.92\.5|104\.196\.152\.243' did not find a matching property.
-
- -
    -
  • As for now, it actually seems the CORE bot coming from 137.108.70.6 and 137.108.70.7 is only using a few sessions per day, which is good:
  • -
- -
# grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=137.108.70.(6|7)' dspace.log.2017-10-30 | sort -n | uniq -c | sort -h
-    410 session_id=74F0C3A133DBF1132E7EC30A7E7E0D60:ip_addr=137.108.70.7
-    574 session_id=5771742CABA3D0780860B8DA81E0551B:ip_addr=137.108.70.7
-   1012 session_id=6C30F10B4351A4ED83EC6ED50AFD6B6A:ip_addr=137.108.70.6
-
- -
    -
  • I will check again tomorrow
  • -
- -

2017-10-31

- -
    -
  • Very nice, Linode alerted that CGSpace had high CPU usage at 2AM again
  • -
  • Ask on the dspace-tech mailing list if it’s possible to use an existing item as a template for a new item
  • -
  • To follow up on the CORE bot traffic, there were almost 300,000 requests yesterday:
  • -
- -
# grep "CORE/0.6" /var/log/nginx/access.log.1 | awk '{print $1}' | sort -n | uniq -c | sort -h
- 139109 137.108.70.6
- 139253 137.108.70.7
-
- -
    -
  • I’ve emailed the CORE people to ask if they can update the repository information from CGIAR Library to CGSpace
  • -
  • Also, I asked if they could perhaps use the sitemap.xml, OAI-PMH, or REST APIs to index us more efficiently, because they mostly seem to be crawling the nearly endless Discovery facets (see the example below)
  • -
  • I added GoAccess to the list of packages to install in the DSpace role of the Ansible infrastructure scripts
  • -
  • It makes it very easy to analyze nginx logs from the command line, to see where traffic is coming from:
  • -
- -
# goaccess /var/log/nginx/access.log --log-format=COMBINED
-
- -
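  • For example, the sitemap and the REST API are much cheaper ways to enumerate our items than crawling the Discovery facets; something like this, where the limit and offset values are only illustrative:

$ curl -s 'https://cgspace.cgiar.org/sitemap'
$ curl -s -H 'Accept: application/json' 'https://cgspace.cgiar.org/rest/items?limit=100&offset=0'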
    -
  • According to Uptime Robot CGSpace went down and up a few times
  • -
  • I had a look at goaccess and I saw that CORE was actively indexing
  • -
  • Also, PostgreSQL connections were at 91 (with the max being 60 per web app, hmmm)
  • -
  • I’m really starting to get annoyed with these guys, and thinking about blocking their IP address for a few days to see if CGSpace becomes more stable
  • -
  • Actually, come to think of it, they aren’t even obeying robots.txt, because we actually disallow /discover and /search-filter URLs but they are hitting those massively (a quick check of the rules is sketched after the counts below):
  • -
- -
# grep "CORE/0.6" /var/log/nginx/access.log | grep -o -E "GET /(discover|search-filter)" | sort -n | uniq -c | sort -rn 
- 158058 GET /discover
-  14260 GET /search-filter
-
- -
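  • For the record, the disallow rules are easy to confirm from the command line:

$ curl -s https://cgspace.cgiar.org/robots.txt | grep -E 'Disallow: /(discover|search-filter)'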
    -
  • I tested a URL with the pattern /discover in Google’s webmaster tools and it was indeed identified as blocked
  • -
  • I will send feedback to the CORE bot team
  • -
diff --git a/public/2017-11/index.html b/public/2017-11/index.html
deleted file mode 100644
index 59526d45d..000000000
--- a/public/2017-11/index.html
+++ /dev/null
@@ -1,1263 +0,0 @@

November, 2017

- -
-

2017-11-01

- -
    -
  • The CORE developers responded to say they are looking into their bot not respecting our robots.txt
  • -
- -

2017-11-02

- -
    -
  • Today there have been no hits by CORE and no alerts from Linode (coincidence?)
  • -
- -
# grep -c "CORE" /var/log/nginx/access.log
-0
-
- -
    -
  • Generate list of authors on CGSpace for Peter to go through and correct:
  • -
- -
dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv;
-COPY 54701
-
- -

- -
    -
  • Abenet asked if it would be possible to generate a report of items in Listings and Reports that had “International Fund for Agricultural Development” as the only investor
  • -
  • I opened a ticket with Atmire to ask if this was possible: https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=540
  • -
  • Work on making the thumbnails in the item view clickable
  • -
  • Basically, once you read the METS XML for an item it becomes easy to trace the structure to find the bitstream link
  • -
- -
//mets:fileSec/mets:fileGrp[@USE='CONTENT']/mets:file/mets:FLocat[@LOCTYPE='URL']/@xlink:href
-
- -
    -
  • METS XML is available for all items with this pattern: /metadata/handle/10568/95947/mets.xml (a command-line check is sketched at the end of this list)
  • -
  • I whipped up a quick hack to print a clickable link with this URL on the thumbnail but it needs to check a few corner cases, like when there is a thumbnail but no content bitstream!
  • -
  • Help proof fifty-three CIAT records for Sisay: https://dspacetest.cgiar.org/handle/10568/95895
  • -
  • A handful of issues with cg.place using a format like “Lima, PE” instead of “Lima, Peru”
  • -
  • Also, some dates with completely invalid formats like “2010- 06” and “2011-3-28”
  • -
  • I also collapsed some consecutive whitespace on a handful of fields
  • -
- -
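  • As a quick command-line check of that XPath (using xmllint with local-name() because the METS document is namespaced; same example handle as above):

$ curl -s 'https://cgspace.cgiar.org/metadata/handle/10568/95947/mets.xml' | xmllint --xpath "string(//*[local-name()='fileGrp'][@USE='CONTENT']//*[local-name()='FLocat']/@*[local-name()='href'])" -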

2017-11-03

- -
    -
  • Atmire got back to us to say that they estimate it will take two days of labor to implement the change to Listings and Reports
  • -
  • I said I’d ask Abenet if she wants that feature
  • -
- -

2017-11-04

- -
    -
  • I finished looking through Sisay’s CIAT records for the “Alianzas de Aprendizaje” data
  • -
  • I corrected about half of the authors to standardize them
  • -
  • Linode emailed this morning to say that the CPU usage was high again, this time at 6:14AM
  • -
  • It’s the first time in a few days that this has happened
  • -
  • I had a look to see what was going on, but it isn’t the CORE bot:
  • -
- -
# awk '{print $1}' /var/log/nginx/access.log | sort -n | uniq -c | sort -h | tail
-    306 68.180.229.31
-    323 61.148.244.116
-    414 66.249.66.91
-    507 40.77.167.16
-    618 157.55.39.161
-    652 207.46.13.103
-    666 157.55.39.254
-   1173 104.196.152.243
-   1737 66.249.66.90
-  23101 138.201.52.218
-
- -
    -
  • 138.201.52.218 is from some Hetzner server, and I see it made over 40,000 requests yesterday too, but none before that:
  • -
- -
# zgrep -c 138.201.52.218 /var/log/nginx/access.log*
-/var/log/nginx/access.log:24403
-/var/log/nginx/access.log.1:45958
-/var/log/nginx/access.log.2.gz:0
-/var/log/nginx/access.log.3.gz:0
-/var/log/nginx/access.log.4.gz:0
-/var/log/nginx/access.log.5.gz:0
-/var/log/nginx/access.log.6.gz:0
-
- -
    -
  • It’s clearly a bot as it’s making tens of thousands of requests, but it’s using a “normal” user agent:
  • -
- -
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36
-
- -
    -
  • For now I don’t know who this user is!
  • -
- -

2017-11-05

- -
    -
  • Peter asked if I could fix the appearance of “International Livestock Research Institute” in the author lookup during item submission
  • -
  • It looks to be just an issue with the user interface expecting authors to have both a first and last name:
  • -
- -

Author lookup / Add author

- -
    -
  • But in the database the authors are correct (none with weird , / characters):
  • -
- -
dspace=# select distinct text_value, authority, confidence from metadatavalue value where resource_type_id=2 and metadata_field_id=3 and text_value like 'International Livestock Research Institute%';
-                 text_value                 |              authority               | confidence 
---------------------------------------------+--------------------------------------+------------
- International Livestock Research Institute | 8f3865dc-d056-4aec-90b7-77f49ab4735c |          0
- International Livestock Research Institute | f4db1627-47cd-4699-b394-bab7eba6dadc |          0
- International Livestock Research Institute |                                      |         -1
- International Livestock Research Institute | 8f3865dc-d056-4aec-90b7-77f49ab4735c |        600
- International Livestock Research Institute | f4db1627-47cd-4699-b394-bab7eba6dadc |         -1
- International Livestock Research Institute |                                      |        600
- International Livestock Research Institute | 8f3865dc-d056-4aec-90b7-77f49ab4735c |         -1
- International Livestock Research Institute | 8f3865dc-d056-4aec-90b7-77f49ab4735c |        500
-(8 rows)
-
- -
    -
  • So I’m not sure if this is just a graphical glitch or if editors have to edit this metadata field prior to approval
  • -
  • Looking at monitoring Tomcat’s JVM heap with Prometheus, it looks like we need to use JMX + jmx_exporter
  • -
  • This guide shows how to enable JMX in Tomcat by modifying CATALINA_OPTS (the flags are sketched below)
  • -
  • I was able to successfully connect to my local Tomcat with jconsole!
  • -
- -
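  • The flags are standard JVM options, roughly like this (the port and the disabled authentication/SSL are only suitable for a local test, not production):

CATALINA_OPTS="$CATALINA_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9010 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"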

2017-11-07

- -
    -
  • CGSpace went down and up a few times this morning, first around 3AM, then around 7
  • -
  • Tsega had to restart Tomcat 7 to fix it temporarily
  • -
  • I will start by looking at bot usage (access.log.1 includes usage until 6AM today):
  • -
- -
# cat /var/log/nginx/access.log.1 | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
-    619 65.49.68.184
-    840 65.49.68.199
-    924 66.249.66.91
-   1131 68.180.229.254
-   1583 66.249.66.90
-   1953 207.46.13.103
-   1999 207.46.13.80
-   2021 157.55.39.161
-   2034 207.46.13.36
-   4681 104.196.152.243
-
- -
    -
  • 104.196.152.243 has been a top scraper for a few weeks now:
  • -
- -
# zgrep -c 104.196.152.243 /var/log/nginx/access.log*
-/var/log/nginx/access.log:336
-/var/log/nginx/access.log.1:4681
-/var/log/nginx/access.log.2.gz:3531
-/var/log/nginx/access.log.3.gz:3532
-/var/log/nginx/access.log.4.gz:5786
-/var/log/nginx/access.log.5.gz:8542
-/var/log/nginx/access.log.6.gz:6988
-/var/log/nginx/access.log.7.gz:7517
-/var/log/nginx/access.log.8.gz:7211
-/var/log/nginx/access.log.9.gz:2763
-
- -
    -
  • This user is responsible for hundreds and sometimes thousands of Tomcat sessions:
  • -
- -
$ grep 104.196.152.243 dspace.log.2017-11-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
-954
-$ grep 104.196.152.243 dspace.log.2017-11-03 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
-6199
-$ grep 104.196.152.243 dspace.log.2017-11-01 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
-7051
-
- -
    -
  • The worst thing is that this user never specifies a user agent string so we can’t lump it in with the other bots using the Tomcat Session Crawler Manager Valve
  • -
  • They don’t request dynamic URLs like “/discover” but they seem to be fetching handles from XMLUI instead of REST (and some with //handle, note the regex below):
  • -
- -
# grep -c 104.196.152.243 /var/log/nginx/access.log.1
-4681
-# grep 104.196.152.243 /var/log/nginx/access.log.1 | grep -c -P 'GET //?handle'
-4618
-
- -
    -
  • I just realized that ciat.cgiar.org points to 104.196.152.243, so I should contact Leroy from CIAT to see if we can change their scraping behavior
  • -
  • The next IP (207.46.13.36) seems to be Microsoft’s bingbot, but all its requests specify the “bingbot” user agent and there are no requests for dynamic URLs that are forbidden, like “/discover”:
  • -
- -
$ grep -c 207.46.13.36 /var/log/nginx/access.log.1 
-2034
-# grep 207.46.13.36 /var/log/nginx/access.log.1 | grep -c "GET /discover"
-0
-
- -
    -
  • The next IP (157.55.39.161) also seems to be bingbot, and none of its requests are for URLs forbidden by robots.txt either:
  • -
- -
# grep 157.55.39.161 /var/log/nginx/access.log.1 | grep -c "GET /discover"
-0
-
- -
    -
  • The next few seem to be bingbot as well, and they declare a proper user agent and do not request dynamic URLs like “/discover”:
  • -
- -
# grep -c -E '207.46.13.[0-9]{2,3}' /var/log/nginx/access.log.1 
-5997
-# grep -E '207.46.13.[0-9]{2,3}' /var/log/nginx/access.log.1 | grep -c "bingbot"
-5988
-# grep -E '207.46.13.[0-9]{2,3}' /var/log/nginx/access.log.1 | grep -c "GET /discover"
-0
-
- -
    -
  • The next few seem to be Googlebot, and they declare a proper user agent and do not request dynamic URLs like “/discover”:
  • -
- -
# grep -c -E '66.249.66.[0-9]{2,3}' /var/log/nginx/access.log.1 
-3048
-# grep -E '66.249.66.[0-9]{2,3}' /var/log/nginx/access.log.1 | grep -c Google
-3048
-# grep -E '66.249.66.[0-9]{2,3}' /var/log/nginx/access.log.1 | grep -c "GET /discover"
-0
-
- -
    -
  • The next seems to be Yahoo, which declares a proper user agent and does not request dynamic URLs like “/discover”:
  • -
- -
# grep -c 68.180.229.254 /var/log/nginx/access.log.1 
-1131
-# grep  68.180.229.254 /var/log/nginx/access.log.1 | grep -c "GET /discover"
-0
-
- -
    -
  • The last of the top ten IPs seems to be some bot with a weird user agent, but they are not behaving too well:
  • -
- -
# grep -c -E '65.49.68.[0-9]{3}' /var/log/nginx/access.log.1 
-2950
-# grep -E '65.49.68.[0-9]{3}' /var/log/nginx/access.log.1 | grep -c "GET /discover"
-330
-
- -
    -
  • Their user agents vary, ie: - -
      -
    • Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
    • -
    • Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11
    • -
    • Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)
    • -
  • -
  • I’ll just keep an eye on that one for now, as it only made a few hundred requests to dynamic discovery URLs
  • -
  • While it’s not in the top ten, Baidu is one bot that seems to not give a fuck:
  • -
- -
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep "7/Nov/2017" | grep -c Baiduspider
-8912
-# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep "7/Nov/2017" | grep Baiduspider | grep -c -E "GET /(browse|discover|search-filter)"
-2521
-
- -
    -
  • According to their documentation their bot respects robots.txt, but I don’t see this being the case
  • -
  • I think I will end up blocking Baidu as well…
  • -
  • Next I need to look and see what was happening specifically at 3AM and 7AM when the server crashed
  • -
  • I should look in nginx access.log, rest.log, oai.log, and DSpace’s dspace.log.2017-11-07
  • -
  • Here are the top IPs making requests to XMLUI from 2 to 8 AM:
  • -
- -
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '07/Nov/2017:0[2-8]' | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
-    279 66.249.66.91
-    373 65.49.68.199
-    446 68.180.229.254
-    470 104.196.152.243
-    470 197.210.168.174
-    598 207.46.13.103
-    603 157.55.39.161
-    637 207.46.13.80
-    703 207.46.13.36
-    724 66.249.66.90
-
- -
    -
  • Of those, most are Google, Bing, Yahoo, etc, except 63.143.42.244 and 63.143.42.242 which are Uptime Robot
  • -
  • Here are the top IPs making requests to REST from 2 to 8 AM:
  • -
- -
# cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 | grep -E '07/Nov/2017:0[2-8]' | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
-      8 207.241.229.237
-     10 66.249.66.90
-     16 104.196.152.243
-     25 41.60.238.61
-     26 157.55.39.161
-     27 207.46.13.103
-     27 207.46.13.80
-     31 207.46.13.36
-   1498 50.116.102.77
-
- -
    -
  • The OAI requests during that same time period are nothing to worry about:
  • -
- -
# cat /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E '07/Nov/2017:0[2-8]' | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
-      1 66.249.66.92
-      4 66.249.66.90
-      6 68.180.229.254
-
- -
    -
  • The top IPs from dspace.log during the 2–8 AM period:
  • -
- -
$ grep -E '2017-11-07 0[2-8]' dspace.log.2017-11-07 | grep -o -E 'ip_addr=[0-9.]+' | sort -n | uniq -c | sort -h | tail
-    143 ip_addr=213.55.99.121
-    181 ip_addr=66.249.66.91
-    223 ip_addr=157.55.39.161
-    248 ip_addr=207.46.13.80
-    251 ip_addr=207.46.13.103
-    291 ip_addr=207.46.13.36
-    297 ip_addr=197.210.168.174
-    312 ip_addr=65.49.68.199
-    462 ip_addr=104.196.152.243
-    488 ip_addr=66.249.66.90
-
- -
    -
  • These aren’t actually very interesting, as the top few are Google, CIAT, Bingbot, and a few other unknown scrapers
  • -
  • The number of requests isn’t even that high to be honest
  • -
  • As I was looking at these logs I noticed another heavy user (124.17.34.59) that was not active during this time period, but made many requests today alone:
  • -
- -
# zgrep -c 124.17.34.59 /var/log/nginx/access.log*
-/var/log/nginx/access.log:22581
-/var/log/nginx/access.log.1:0
-/var/log/nginx/access.log.2.gz:14
-/var/log/nginx/access.log.3.gz:0
-/var/log/nginx/access.log.4.gz:0
-/var/log/nginx/access.log.5.gz:3
-/var/log/nginx/access.log.6.gz:0
-/var/log/nginx/access.log.7.gz:0
-/var/log/nginx/access.log.8.gz:0
-/var/log/nginx/access.log.9.gz:1
-
- -
    -
  • The whois data shows the IP is from China, but the user agent doesn’t really give any clues:
  • -
- -
# grep 124.17.34.59 /var/log/nginx/access.log | awk -F'" ' '{print $3}' | sort | uniq -c | sort -h
-    210 "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
-  22610 "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.2; Win64; x64; Trident/7.0; LCTE)"
-
- -
    -
  • A Google search for “LCTE bot” doesn’t return anything interesting, but this Stack Overflow discussion references the lack of information
  • -
  • So basically, after a few hours of looking at the log files, I am no closer to understanding what is going on!
  • -
  • I do know that we want to block Baidu, though, as it does not respect robots.txt
  • -
  • And as we speak Linode alerted that the outbound traffic rate has been very high for the past two hours (roughly 12:00 to 14:00)
  • -
  • At least for now it seems to be that new Chinese IP (124.17.34.59):
  • -
- -
# grep -E "07/Nov/2017:1[234]:" /var/log/nginx/access.log | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
-    198 207.46.13.103
-    203 207.46.13.80
-    205 207.46.13.36
-    218 157.55.39.161
-    249 45.5.184.221
-    258 45.5.187.130
-    386 66.249.66.90
-    410 197.210.168.174
-   1896 104.196.152.243
-  11005 124.17.34.59
-
- -
    -
  • It seems 124.17.34.59 is really downloading all our PDFs, compared to the next most active IPs during this time!
  • -
- -
# grep -E "07/Nov/2017:1[234]:" /var/log/nginx/access.log | grep 124.17.34.59 | grep -c pdf
-5948
-# grep -E "07/Nov/2017:1[234]:" /var/log/nginx/access.log | grep 104.196.152.243 | grep -c pdf
-0
-
- -
    -
  • About CIAT, I think I need to encourage them to specify a user agent string for their requests, because they are not reusing their Tomcat sessions and are creating thousands of sessions per day
  • -
  • All CIAT requests vs unique ones:
  • -
- -
$ grep -Io -E 'session_id=[A-Z0-9]{32}:ip_addr=104.196.152.243' dspace.log.2017-11-07 | wc -l
-3506
-$ grep -Io -E 'session_id=[A-Z0-9]{32}:ip_addr=104.196.152.243' dspace.log.2017-11-07 | sort | uniq | wc -l
-3506
-
- -
    -
  • I emailed CIAT about the session issue, user agent issue, and told them they should not scrape the HTML contents of communities, instead using the REST API
  • -
  • About Baidu, I found a link to their robots.txt tester tool
  • -
  • It seems like our robots.txt file is valid, and they claim to recognize that URLs like /discover should be forbidden (不允许, aka “not allowed”):
  • -
- -

Baidu robots.txt tester

- -
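  • For reference, the relevant rules in our robots.txt are presumably along these lines (just a sketch of the dynamic pages I keep grepping for, not the full file):

User-agent: *
Disallow: /browse
Disallow: /discover
Disallow: /search-filter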
    -
  • But they literally just made this request today:
  • -
- -
180.76.15.136 - - [07/Nov/2017:06:25:11 +0000] "GET /discover?filtertype_0=crpsubject&filter_relational_operator_0=equals&filter_0=WATER%2C+LAND+AND+ECOSYSTEMS&filtertype=subject&filter_relational_operator=equals&filter=WATER+RESOURCES HTTP/1.1" 200 82265 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
-
- -
    -
  • Along with another thousand or so requests to URLs that are forbidden in robots.txt today alone:
  • -
- -
# grep -c Baiduspider /var/log/nginx/access.log
-3806
-# grep Baiduspider /var/log/nginx/access.log | grep -c -E "GET /(browse|discover|search-filter)"
-1085
-
- -
    -
  • I will think about blocking their IPs but they have 164 of them!
  • -
- -
# grep "Baiduspider/2.0" /var/log/nginx/access.log | awk '{print $1}' | sort -n | uniq | wc -l
-164
-
- -
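  • Rather than chasing 164 IPs, one option (just a sketch for now, not deployed) would be to match their user agent in the nginx server block and refuse such requests outright:

# refuse Baiduspider requests by user agent instead of by IP (sketch)
if ($http_user_agent ~* "baiduspider") {
    return 403;
}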

2017-11-08

- -
    -
  • Linode sent several alerts last night about CPU usage and outbound traffic rate at 6:13PM
  • -
  • Linode sent another alert about CPU usage in the morning at 6:12AM
  • -
  • Jesus, the new Chinese IP (124.17.34.59) has downloaded 24,000 PDFs in the last 24 hours:
  • -
- -
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "0[78]/Nov/2017:" | grep 124.17.34.59 | grep -v pdf.jpg | grep -c pdf
-24981
-
- -
    -
  • This is about 20,000 Tomcat sessions:
  • -
- -
$ cat dspace.log.2017-11-07 dspace.log.2017-11-08 | grep -Io -E 'session_id=[A-Z0-9]{32}:ip_addr=124.17.34.59' | sort | uniq | wc -l
-20733
-
- -
    -
  • I’m getting really sick of this
  • -
  • Sisay re-uploaded the CIAT records that I had already corrected earlier this week, erasing all my corrections
  • -
  • I had to re-correct all the publishers, places, names, dates, etc and apply the changes on DSpace Test
  • -
  • Run system updates on DSpace Test and reboot the server
  • -
  • Magdalena had written to say that two of their Phase II project tags were missing on CGSpace, so I added them (#346)
  • -
  • I figured out a way to use nginx’s map function to assign a “bot” user agent to misbehaving clients who don’t define a user agent
  • -
  • Most bots are automatically lumped into one generic session by Tomcat’s Crawler Session Manager Valve but this only works if their user agent matches a pre-defined regular expression like .*[bB]ot.*
  • -
  • Some clients send thousands of requests without a user agent which ends up creating thousands of Tomcat sessions, wasting precious memory, CPU, and database resources in the process
  • -
  • Basically, we modify the nginx config to add a mapping with a modified user agent $ua:
  • -
- -
map $remote_addr $ua {
-    # 2017-11-08 Random Chinese host grabbing 20,000 PDFs
-    124.17.34.59     'ChineseBot';
-    default          $http_user_agent;
-}
-
- -
    -
  • If the client’s address matches then the user agent is set, otherwise the default $http_user_agent variable is used
  • -
  • Then, in the server’s / block we pass this header to Tomcat:
  • -
- -
proxy_pass http://tomcat_http;
-proxy_set_header User-Agent $ua;
-
- -
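  • For reference, the Crawler Session Manager Valve mentioned above is configured in Tomcat’s server.xml; a minimal sketch using what I believe is the Tomcat 7 default user agent pattern:

<Valve className="org.apache.catalina.valves.CrawlerSessionManagerValve"
       crawlerUserAgents=".*[bB]ot.*|.*Yahoo! Slurp.*|.*Feedfetcher-Google.*" />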
    -
  • Note to self: the $ua variable won’t show up in nginx access logs because the default combined log format doesn’t include it, so don’t run around pulling your hair out wondering why the modified user agents aren’t showing in the logs!
  • -
  • If a client matching one of these IPs connects without a session, it will be assigned one by the Crawler Session Manager Valve
  • -
  • You can verify by cross referencing nginx’s access.log and DSpace’s dspace.log.2017-11-08, for example
  • -
  • I will deploy this on CGSpace later this week
  • -
  • I am interested to check how this affects the number of sessions used by the CIAT and Chinese bots (see above on 2017-11-07 for example)
  • -
  • I merged the clickable thumbnails code to 5_x-prod (#347) and will deploy it later along with the new bot mapping stuff (and re-run the Ansible nginx and tomcat tags)
  • -
  • I was thinking about Baidu again and decided to see how many requests they have versus Google to URL paths that are explicitly forbidden in robots.txt:
  • -
- -
# zgrep Baiduspider /var/log/nginx/access.log* | grep -c -E "GET /(browse|discover|search-filter)"
-22229
-# zgrep Googlebot /var/log/nginx/access.log* | grep -c -E "GET /(browse|discover|search-filter)"
-0
-
- -
    -
  • It seems that they rarely even bother checking robots.txt, but Google does multiple times per day!
  • -
- -
# zgrep Baiduspider /var/log/nginx/access.log* | grep -c robots.txt
-14
-# zgrep Googlebot  /var/log/nginx/access.log* | grep -c robots.txt
-1134
-
- -
    -
  • I have been looking for a reason to ban Baidu and this is definitely a good one
  • -
  • Disallowing Baiduspider in robots.txt probably won’t work because this bot doesn’t seem to respect the robot exclusion standard anyways!
  • -
  • I will whip up something in nginx later
  • -
  • Run system updates on CGSpace and reboot the server
  • -
  • Re-deploy latest 5_x-prod branch on CGSpace and DSpace Test (includes the clickable thumbnails, CCAFS phase II project tags, and updated news text)
  • -
- -

2017-11-09

- -
    -
  • Awesome, it seems my bot mapping stuff in nginx actually reduced the number of Tomcat sessions used by the CIAT scraper today; here are the total requests versus unique sessions:
  • -
- -
# zcat -f -- /var/log/nginx/access.log.1 /var/log/nginx/access.log.2.gz | grep '09/Nov/2017' | grep -c 104.196.152.243
-8956
-$ grep 104.196.152.243 dspace.log.2017-11-09 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
-223
-
- -
    -
  • Versus the same stats for yesterday and the day before:
  • -
- -
# zcat -f -- /var/log/nginx/access.log.1 /var/log/nginx/access.log.2.gz | grep '08/Nov/2017' | grep -c 104.196.152.243 
-10216
-$ grep 104.196.152.243 dspace.log.2017-11-08 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
-2592
-# zcat -f -- /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz | grep '07/Nov/2017' | grep -c 104.196.152.243
-8120
-$ grep 104.196.152.243 dspace.log.2017-11-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
-3506
-
- -
    -
  • The number of sessions is over ten times less!
  • -
  • This gets me thinking, I wonder if I can use something like nginx’s rate limiter to automatically change the user agent of clients who make too many requests
  • -
  • Perhaps using a combination of geo and map, like illustrated here: https://www.nginx.com/blog/rate-limiting-nginx/ (see the sketch below)
  • -
- -
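  • A minimal sketch of that idea, assuming we key the limit on the user agent so that only matching bots get throttled (requests with an empty key are not counted, and limit_req rejects excess requests with HTTP 503 by default):

# in the http block: only matching user agents get a non-empty key
map $http_user_agent $limit_bots {
    default          '';
    ~*Baiduspider    $binary_remote_addr;
}
limit_req_zone $limit_bots zone=badbots:10m rate=1r/s;

# in the server or location block
limit_req zone=badbots burst=5;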

2017-11-11

- -
    -
  • I was looking at the Google index and noticed there are 4,090 search results for dspace.ilri.org but only seven for mahider.ilri.org
  • -
  • Search with something like: inurl:dspace.ilri.org inurl:https
  • -
  • I want to get rid of those legacy domains eventually!
  • -
- -

2017-11-12

- -
    -
  • Update the Ansible infrastructure templates to be a little more modular and flexible
  • -
  • Looking at the top client IPs on CGSpace so far this morning, even though it’s only been eight hours:
  • -
- -
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep "12/Nov/2017" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
-    243 5.83.120.111
-    335 40.77.167.103
-    424 66.249.66.91
-    529 207.46.13.36
-    554 40.77.167.129
-    604 207.46.13.53
-    754 104.196.152.243
-    883 66.249.66.90
-   1150 95.108.181.88
-   1381 5.9.6.51
-
- -
    -
  • 5.9.6.51 seems to be a Russian bot:
  • -
- -
# grep 5.9.6.51 /var/log/nginx/access.log | tail -n 1
-5.9.6.51 - - [12/Nov/2017:08:13:13 +0000] "GET /handle/10568/16515/recent-submissions HTTP/1.1" 200 5097 "-" "Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)"
-
- -
    -
  • What’s amazing is that it seems to reuse its Java session across all requests:
  • -
- -
$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=5.9.6.51' /home/cgspace.cgiar.org/log/dspace.log.2017-11-12
-1558
-$ grep 5.9.6.51 /home/cgspace.cgiar.org/log/dspace.log.2017-11-12 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
-1
-
- -
    -
  • Bravo to MegaIndex.ru!
  • -
  • The same cannot be said for 95.108.181.88, which appears to be YandexBot, even though Tomcat’s Crawler Session Manager valve regex should match ‘YandexBot’:
  • -
- -
# grep 95.108.181.88 /var/log/nginx/access.log | tail -n 1
-95.108.181.88 - - [12/Nov/2017:08:33:17 +0000] "GET /bitstream/handle/10568/57004/GenebankColombia_23Feb2015.pdf HTTP/1.1" 200 972019 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
-$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=95.108.181.88' /home/cgspace.cgiar.org/log/dspace.log.2017-11-12
-991
-
- -
    -
  • Move some items and collections on CGSpace for Peter Ballantyne, running move_collections.sh with the following configuration:
  • -
- -
10947/6    10947/1 10568/83389
-10947/34   10947/1 10568/83389
-10947/2512 10947/1 10568/83389
-
- - - -
$ http --print h https://cgspace.cgiar.org/handle/10568/1 User-Agent:'Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)'
-HTTP/1.1 200 OK
-Connection: keep-alive
-Content-Encoding: gzip
-Content-Language: en-US
-Content-Type: text/html;charset=utf-8
-Date: Sun, 12 Nov 2017 16:30:19 GMT
-Server: nginx
-Strict-Transport-Security: max-age=15768000
-Transfer-Encoding: chunked
-Vary: Accept-Encoding
-X-Cocoon-Version: 2.2.0
-X-Content-Type-Options: nosniff
-X-Frame-Options: SAMEORIGIN
-X-XSS-Protection: 1; mode=block
-$ http --print h https://cgspace.cgiar.org/handle/10568/1 User-Agent:'Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)'
-HTTP/1.1 503 Service Temporarily Unavailable
-Connection: keep-alive
-Content-Length: 206
-Content-Type: text/html
-Date: Sun, 12 Nov 2017 16:30:21 GMT
-Server: nginx
-
- -
    -
  • The first request works, but the second is denied with an HTTP 503!
  • -
  • I need to remember to check the Munin graphs for PostgreSQL and JVM next week to see how this affects them
  • -
- -

2017-11-13

- -
    -
  • At the end of the day I checked the logs and it really looks like the Baidu rate limiting is working, HTTP 200 vs 503:
  • -
- -
# zcat -f -- /var/log/nginx/access.log.1 /var/log/nginx/access.log.2.gz | grep "13/Nov/2017" | grep "Baiduspider" | grep -c " 200 "
-1132
-# zcat -f -- /var/log/nginx/access.log.1 /var/log/nginx/access.log.2.gz | grep "13/Nov/2017" | grep "Baiduspider" | grep -c " 503 "
-10105
-
- -
    -
  • Helping Sisay proof 47 records for IITA: https://dspacetest.cgiar.org/handle/10568/97029
  • -
  • From looking at the data in OpenRefine I found: - -
      -
    • Errors in cg.authorship.types
    • -
    • Errors in cg.coverage.country (smart quote in “COTE D’IVOIRE”, “HAWAII” is not a country)
    • -
    • Whitespace issues in some cg.contributor.affiliation
    • -
    • Whitespace issues in some cg.identifier.doi fields and most values are using HTTP instead of HTTPS
    • -
    • Whitespace issues in some dc.contributor.author fields
    • -
    • Issue with invalid dc.date.issued value “2011-3”
    • -
    • Description fields are poorly copy-pasted
    • -
    • Whitespace issues in dc.description.sponsorship
    • -
    • Lots of inconsistency in dc.format.extent (mixed dash style, periods at the end of values)
    • -
    • Whitespace errors in dc.identifier.citation
    • -
    • Whitespace errors in dc.subject
    • -
    • Whitespace errors in dc.title
    • -
  • -
  • After uploading and looking at the data in DSpace Test I saw more errors with CRPs, subjects (one item had four copies of all of its subjects, another had a “.” in it), affiliations, sponsors, etc.
  • -
  • Atmire responded to the ticket about ORCID stuff a few days ago, today I told them that I need to talk to Peter and the partners to see what we would like to do
  • -
- -

2017-11-14

- -
    -
  • Deploy some nginx configuration updates to CGSpace
  • -
  • They had been waiting on a branch for a few months and I think I just forgot about them
  • -
  • I have been running them on DSpace Test for a few days and haven’t seen any issues there
  • -
  • Started testing DSpace 6.2 and a few things have changed
  • -
  • Now PostgreSQL needs pgcrypto:
  • -
- -
$ psql dspace6
-dspace6=# CREATE EXTENSION pgcrypto;
-
- -
    -
  • Also, local settings are no longer in build.properties, they are now in local.cfg
  • -
  • I’m not sure if we can use separate profiles like we did before with mvn -Denv=blah to use blah.properties
  • -
  • It seems we need to use “system properties” to override settings, ie: -Ddspace.dir=/Users/aorth/dspace6
  • -
- -
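  • As a sketch, based on the local.cfg.EXAMPLE that ships with DSpace 6, the local settings would look something like this (keys and values here are my assumptions until I test more):

dspace.dir = /Users/aorth/dspace6
db.url = jdbc:postgresql://localhost:5432/dspace6
db.username = dspace6
db.password = dspace6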

2017-11-15

- -
    -
  • Send Adam Hunt an invite to the DSpace Developers network on Yammer
  • -
  • He is the new head of communications at WLE, since Michael left
  • -
  • Merge changes to item view’s wording of link metadata (#348)
  • -
- -

2017-11-17

- -
    -
  • Uptime Robot said that CGSpace went down today and I see lots of Timeout waiting for idle object errors in the DSpace logs
  • -
  • I looked in PostgreSQL using SELECT * FROM pg_stat_activity; and saw that there were 73 active connections
  • -
  • After a few minutes the connections went down to 44 and CGSpace was kinda back up; it seems like Tsega restarted Tomcat
  • -
  • Looking at the REST and XMLUI log files, I don’t see anything too crazy:
  • -
- -
# cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 | grep "17/Nov/2017" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
-     13 66.249.66.223
-     14 207.46.13.36
-     17 207.46.13.137
-     22 207.46.13.23
-     23 66.249.66.221
-     92 66.249.66.219
-    187 104.196.152.243
-   1400 70.32.83.92
-   1503 50.116.102.77
-   6037 45.5.184.196
-# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep "17/Nov/2017" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
-    325 139.162.247.24
-    354 66.249.66.223
-    422 207.46.13.36
-    434 207.46.13.23
-    501 207.46.13.137
-    647 66.249.66.221
-    662 34.192.116.178
-    762 213.55.99.121
-   1867 104.196.152.243
-   2020 66.249.66.219
-
- -
    -
  • I need to look into using JMX to analyze active sessions I think, rather than looking at log files
  • -
  • After adding appropriate JMX listener options to Tomcat’s JAVA_OPTS and restarting Tomcat, I can connect remotely using an SSH dynamic port forward (SOCKS) on port 7777 for example, and then start jconsole locally like:
  • -
- -
$ jconsole -J-DsocksProxyHost=localhost -J-DsocksProxyPort=7777 service:jmx:rmi:///jndi/rmi://localhost:9000/jmxrmi -J-DsocksNonProxyHosts=
-
- -
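  • For reference, the JMX listener options in Tomcat’s JAVA_OPTS were something like the following (a sketch; port 9000 matches the jconsole URL above, and authentication and SSL are off because access is only via the SSH tunnel):

-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=9000
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false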
    -
  • Looking at the MBeans you can drill down in Catalina→Manager→webapp→localhost→Attributes and see active sessions, etc
  • -
  • I want to enable JMX listener on CGSpace but I need to do some more testing on DSpace Test and see if it causes any performance impact, for example
  • -
  • If I hit the server with some requests as a normal user I see the session counter increase, but if I specify a bot user agent then the sessions seem to be reused (meaning the Crawler Session Manager is working)
  • -
  • Here is the Jconsole screen after looping http --print Hh https://dspacetest.cgiar.org/handle/10568/1 for a few minutes:
  • -
- -

Jconsole sessions for XMLUI

- -
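  • The loop itself was just something simple like this (sketch):

$ while true; do http --print Hh https://dspacetest.cgiar.org/handle/10568/1; sleep 1; done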
    -
  • Switch DSpace Test to using the G1GC for JVM so I can see what the JVM graph looks like eventually, and start evaluating it for production
  • -
- -

2017-11-19

- -
    -
  • Linode sent an alert that CGSpace was using a lot of CPU around 4–6 AM
  • -
  • Looking in the nginx access logs I see the most active XMLUI users between 4 and 6 AM:
  • -
- -
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "19/Nov/2017:0[456]" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
-    111 66.249.66.155
-    171 5.9.6.51
-    188 54.162.241.40
-    229 207.46.13.23
-    233 207.46.13.137
-    247 40.77.167.6
-    251 207.46.13.36
-    275 68.180.229.254
-    325 104.196.152.243
-   1610 66.249.66.153
-
- -
    -
  • 66.249.66.153 appears to be Googlebot:
  • -
- -
66.249.66.153 - - [19/Nov/2017:06:26:01 +0000] "GET /handle/10568/2203 HTTP/1.1" 200 6309 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
-
- -
    -
  • We know Googlebot is persistent but behaves well, so I guess it was just a coincidence that it came at a time when we had other traffic and server activity
  • -
  • In related news, I see an Atmire update process going for many hours and responsible for hundreds of thousands of log entries (two thirds of all log entries)
  • -
- -
$ wc -l dspace.log.2017-11-19 
-388472 dspace.log.2017-11-19
-$ grep -c com.atmire.utils.UpdateSolrStatsMetadata dspace.log.2017-11-19 
-267494
-
- -
    -
  • WTF is this process doing every day, and for so many hours?
  • -
  • In unrelated news, when I was looking at the DSpace logs I saw a bunch of errors like this:
  • -
- -
2017-11-19 03:00:32,806 INFO  org.apache.pdfbox.pdfparser.PDFParser @ Document is encrypted
-2017-11-19 03:00:32,807 ERROR org.apache.pdfbox.filter.FlateFilter @ FlateFilter: stop reading corrupt stream due to a DataFormatException
-
- -
    -
  • It’s been a few days since I enabled the G1GC on DSpace Test and the JVM graph definitely changed:
  • -
- -

Tomcat G1GC

- -

2017-11-20

- -
    -
  • I found an article about JVM tuning that gives some pointers how to enable logging and tools to analyze logs for you
  • -
  • Also notes on rotating GC logs
  • -
  • I decided to switch DSpace Test back to the CMS garbage collector because it is designed for low pauses and high throughput (like G1GC!) and because we haven’t even tried to monitor or tune it
  • -
- -
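  • For reference, switching collectors and enabling rotated GC logging is just a matter of adjusting Tomcat’s JAVA_OPTS; a sketch of the flags I have in mind (the log path and sizes are placeholders):

# CMS instead of G1 (ie, remove -XX:+UseG1GC), plus rotated GC logging
-XX:+UseConcMarkSweepGC
-verbose:gc
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-Xloggc:/path/to/tomcat/logs/gc.log
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=5
-XX:GCLogFileSize=10M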

2017-11-21

- -
    -
  • Magdalena was having problems logging in via LDAP and it seems to be a problem with the CGIAR LDAP server:
  • -
- -
2017-11-21 11:11:09,621 WARN  org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=2FEC0E5286C17B6694567FFD77C3171C:ip_addr=77.241.141.58:ldap_authentication:type=failed_auth javax.naming.CommunicationException\colon; simple bind failed\colon; svcgroot2.cgiarad.org\colon;3269 [Root exception is javax.net.ssl.SSLHandshakeException\colon; sun.security.validator.ValidatorException\colon; PKIX path validation failed\colon; java.security.cert.CertPathValidatorException\colon; validity check failed]
-
- -

2017-11-22

- -
    -
  • Linode sent an alert that the CPU usage on the CGSpace server was very high around 4 to 6 AM
  • -
  • The logs don’t show anything particularly abnormal between those hours:
  • -
- -
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "22/Nov/2017:0[456]" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
-    136 31.6.77.23
-    174 68.180.229.254
-    217 66.249.66.91
-    256 157.55.39.79
-    268 54.144.57.183
-    281 207.46.13.137
-    282 207.46.13.36
-    290 207.46.13.23
-    696 66.249.66.90
-    707 104.196.152.243
-
- -
    -
  • I haven’t seen 54.144.57.183 before; it is apparently the CCBot from commoncrawl.org
  • -
  • In other news, it looks like the JVM garbage collection pattern is back to its standard jigsaw pattern after switching back to CMS a few days ago:
  • -
- -

Tomcat JVM with CMS GC

- -

2017-11-23

- -
    -
  • Linode alerted again that CPU usage was high on CGSpace from 4:13 to 6:13 AM
  • -
  • I see a lot of Googlebot (66.249.66.90) in the XMLUI access logs
  • -
- -
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "23/Nov/2017:0[456]" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
-     88 66.249.66.91
-    140 68.180.229.254
-    155 54.196.2.131
-    182 54.224.164.166
-    301 157.55.39.79
-    315 207.46.13.36
-    331 207.46.13.23
-    358 207.46.13.137
-    565 104.196.152.243
-   1570 66.249.66.90
-
- -
    -
  • … and the usual REST scrapers from CIAT (45.5.184.196) and CCAFS (70.32.83.92):
  • -
- -
# cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 | grep -E "23/Nov/2017:0[456]" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
-      5 190.120.6.219
-      6 104.198.9.108
-     14 104.196.152.243
-     21 112.134.150.6
-     22 157.55.39.79
-     22 207.46.13.137
-     23 207.46.13.36
-     26 207.46.13.23
-    942 45.5.184.196
-   3995 70.32.83.92
-
- -
    -
  • These IPs crawling the REST API don’t specify user agents and I’d assume they are creating many Tomcat sessions
  • -
  • I would catch them in nginx to assign a “bot” user agent to them so that the Tomcat Crawler Session Manager valve could deal with them, but they don’t seem to be creating many sessions anyway, at least not according to dspace.log:
  • -
- -
$ grep 70.32.83.92 dspace.log.2017-11-23 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
-2
-
- -
    -
  • I’m wondering if REST works differently, or just doesn’t log these sessions?
  • -
  • I wonder if they are measurable via JMX MBeans?
  • -
  • I did some tests locally and I don’t see the sessionCounter incrementing after making requests to REST, but it does with XMLUI and OAI
  • -
  • I came across some interesting PostgreSQL tuning advice for SSDs: https://amplitude.engineering/how-a-single-postgresql-config-change-improved-slow-query-performance-by-50x-85593b8991b0
  • -
  • Apparently setting random_page_cost to 1 is “common” advice for systems running PostgreSQL on SSD (the default is 4)
  • -
  • So I deployed this on DSpace Test and will check the Munin PostgreSQL graphs in a few days to see if anything changes
  • -
- -
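  • The change itself is a one-liner, either in postgresql.conf or (on PostgreSQL 9.4+) via ALTER SYSTEM followed by a reload:

postgres=# ALTER SYSTEM SET random_page_cost = 1;
postgres=# SELECT pg_reload_conf();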

2017-11-24

- -
    -
  • It’s too early to tell for sure, but after I made the random_page_cost change on DSpace Test’s PostgreSQL yesterday the number of connections dropped drastically:
  • -
- -

PostgreSQL connections after tweak (week)

- -
    -
  • There have been other temporary drops before, but if I look at the past month and actually the whole year, the trend is that connections are four or five times higher on average:
  • -
- -

PostgreSQL connections after tweak (month)

- -
    -
  • I just realized that we’re not logging access requests to other vhosts on CGSpace, so it’s possible I have no idea that we’re getting slammed at 4AM on another domain that we’re just silently redirecting to cgspace.cgiar.org
  • -
  • I’ve enabled logging on the CGIAR Library on CGSpace so I can check to see if there are many requests there
  • -
  • In just a few seconds I already see a dozen requests from Googlebot (of course they get HTTP 301 redirects to cgspace.cgiar.org)
  • -
  • I also noticed that CGNET appears to be monitoring the old domain every few minutes:
  • -
- -
192.156.137.184 - - [24/Nov/2017:20:33:58 +0000] "HEAD / HTTP/1.1" 301 0 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.27.1 zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
-
- -
    -
  • I should probably tell CGIAR people to have CGNET stop that
  • -
- -

2017-11-26

- -
    -
  • Linode alerted that CGSpace server was using too much CPU from 5:18 to 7:18 AM
  • -
  • Yet another mystery because the load for all domains looks fine at that time:
  • -
- -
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E "26/Nov/2017:0[567]" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
-    190 66.249.66.83
-    195 104.196.152.243
-    220 40.77.167.82
-    246 207.46.13.137
-    247 68.180.229.254
-    257 157.55.39.214
-    289 66.249.66.91
-    298 157.55.39.206
-    379 66.249.66.70
-   1855 66.249.66.90
-
- -

2017-11-29

- -
    -
  • Linode alerted that CGSpace was using 279% CPU from 6 to 8 AM this morning
  • -
  • About an hour later Uptime Robot said that the server was down
  • -
  • Here are all the top XMLUI and REST users from today:
  • -
- -
# cat /var/log/nginx/rest.log  /var/log/nginx/rest.log.1  /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E "29/Nov/2017" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
-    540 66.249.66.83
-    659 40.77.167.36
-    663 157.55.39.214
-    681 157.55.39.206
-    733 157.55.39.158
-    850 66.249.66.70
-   1311 66.249.66.90
-   1340 104.196.152.243
-   4008 70.32.83.92
-   6053 45.5.184.196
-
- -
    -
  • PostgreSQL activity shows 69 connections
  • -
  • I don’t have time to troubleshoot more as I’m in Nairobi working on the HPC so I just restarted Tomcat for now
  • -
  • A few hours later Uptime Robot says the server is down again
  • -
  • I don’t see much activity in the logs but there are 87 PostgreSQL connections
  • -
  • But shit, there were 10,000 unique Tomcat sessions today:
  • -
- -
$ cat dspace.log.2017-11-29 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
-10037
-
- -
    -
  • Although maybe that’s not much, as the previous two days had more:
  • -
- -
$ cat dspace.log.2017-11-27 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
-12377
-$ cat dspace.log.2017-11-28 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
-16984
-
- -
    -
  • I think we just need to start increasing the number of allowed PostgreSQL connections instead of fighting this, as it’s the most common source of crashes we have
  • -
  • I will bump DSpace’s db.maxconnections from 60 to 90, and PostgreSQL’s max_connections from 183 to 273 (using my loose formula of 90 * webapps + 3: with our three webapps that’s 90 * 3 + 3 = 273, just as the old value was 60 * 3 + 3 = 183)
  • -
  • I really need to figure out how to get DSpace to use a PostgreSQL connection pool
  • -
- -

2017-11-30

- -
    -
  • Linode alerted about high CPU usage on CGSpace again around 6 to 8 AM
  • -
  • Then Uptime Robot said CGSpace was down a few minutes later, but it resolved itself I think (or Tsega restarted Tomcat, I don’t know)
  • -
- - - - - -
- - - -
- - - - -
-
- - - - - - - - - diff --git a/public/2017-12/index.html b/public/2017-12/index.html deleted file mode 100644 index 891c78eb4..000000000 --- a/public/2017-12/index.html +++ /dev/null @@ -1,1028 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - December, 2017 | CGSpace Notes - - - - - - - - - - - - - - - - - -
-
- -
-
- - - -
-
-


-
-
- - - -
-
-
- - - - -
-
-

December, 2017

- -
-

2017-12-01

- -
    -
  • Uptime Robot noticed that CGSpace went down
  • -
  • The logs say “Timeout waiting for idle object”
  • -
  • PostgreSQL activity says there are 115 connections currently
  • -
  • The list of connections to XMLUI and REST API for today:
  • -
- -

- -
# cat /var/log/nginx/rest.log  /var/log/nginx/rest.log.1  /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E "1/Dec/2017" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
-    763 2.86.122.76
-    907 207.46.13.94
-   1018 157.55.39.206
-   1021 157.55.39.235
-   1407 66.249.66.70
-   1411 104.196.152.243
-   1503 50.116.102.77
-   1805 66.249.66.90
-   4007 70.32.83.92
-   6061 45.5.184.196
-
- -
    -
  • The number of DSpace sessions isn’t even that high:
  • -
- -
$ cat /home/cgspace.cgiar.org/log/dspace.log.2017-12-01 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
-5815
-
- -
    -
  • Connections in the last two hours:
  • -
- -
# cat /var/log/nginx/rest.log  /var/log/nginx/rest.log.1  /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E "1/Dec/2017:(09|10)" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail                                                      
-     78 93.160.60.22
-    101 40.77.167.122
-    113 66.249.66.70
-    129 157.55.39.206
-    130 157.55.39.235
-    135 40.77.167.58
-    164 68.180.229.254
-    177 87.100.118.220
-    188 66.249.66.90
-    314 2.86.122.76
-
- -
    -
  • What the fuck is going on?
  • -
  • I’ve never seen this 2.86.122.76 before, it has made quite a few unique Tomcat sessions today:
  • -
- -
$ grep 2.86.122.76 /home/cgspace.cgiar.org/log/dspace.log.2017-12-01 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
-822
-
- -
    -
  • Appears to be some new bot:
  • -
- -
2.86.122.76 - - [01/Dec/2017:09:02:53 +0000] "GET /handle/10568/78444?show=full HTTP/1.1" 200 29307 "-" "Mozilla/3.0 (compatible; Indy Library)"
-
- -
    -
  • I restarted Tomcat and everything came back up
  • -
  • I can add Indy Library to the Tomcat crawler session manager valve, but it would be nice if I could simply remap the useragent in nginx (see the sketch below)
  • -
  • I will also add ‘Drupal’ to the Tomcat crawler session manager valve because there are Drupals out there harvesting and they should be considered as bots
  • -
- -
# cat /var/log/nginx/rest.log  /var/log/nginx/rest.log.1  /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E "1/Dec/2017" | grep Drupal | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
-      3 54.75.205.145
-      6 70.32.83.92
-     14 2a01:7e00::f03c:91ff:fe18:7396
-     46 2001:4b99:1:1:216:3eff:fe2c:dc6c
-    319 2001:4b99:1:1:216:3eff:fe76:205b
-
- -
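  • As a sketch, remapping the Indy Library user agent in nginx could look something like this (a map keyed on $http_user_agent rather than $remote_addr; it would need to be combined with the existing IP-based map, and the “bot” substring is there so the Crawler Session Manager Valve’s .*[bB]ot.* pattern matches):

map $http_user_agent $ua {
    'Mozilla/3.0 (compatible; Indy Library)'    'Indy Library bot';
    default                                     $http_user_agent;
}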

2017-12-03

- -
    -
  • Linode alerted that CGSpace’s load was 327.5% from 6 to 8 AM again
  • -
- -

2017-12-04

- -
    -
  • Linode alerted that CGSpace’s load was 255.5% from 8 to 10 AM again
  • -
  • I looked at the Munin stats on DSpace Test (linode02) again to see how the PostgreSQL tweaks from a few weeks ago were holding up:
  • -
- -

DSpace Test PostgreSQL connections month

- -
    -
  • The results look fantastic! So the random_page_cost tweak is massively important for informing the PostgreSQL query planner that there is essentially no extra “cost” to accessing random pages, as we’re on an SSD!
  • -
  • I guess we could probably even reduce the PostgreSQL connections in DSpace / PostgreSQL after using this
  • -
  • Run system updates on DSpace Test (linode02) and reboot it
  • -
  • I’m going to enable the PostgreSQL random_page_cost tweak on CGSpace
  • -
  • For reference, here is the past month’s connections:
  • -
- -

CGSpace PostgreSQL connections month

- -

2017-12-05

- - - -

2017-12-06

- -
    -
  • Linode alerted again that the CPU usage on CGSpace was high this morning from 6 to 8 AM
  • -
  • Uptime Robot alerted that the server went down and up around 8:53 this morning
  • -
  • Uptime Robot alerted that CGSpace was down and up again a few minutes later
  • -
  • I don’t see any errors in the DSpace logs, but I see in nginx’s access.log that UptimeRobot’s requests were getting HTTP 499 responses (Client Closed Request)
  • -
  • Looking at the REST API logs I see some new client IP I haven’t noticed before:
  • -
- -
# cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 | grep -E "6/Dec/2017" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
-     18 95.108.181.88
-     19 68.180.229.254
-     30 207.46.13.151
-     33 207.46.13.110
-     38 40.77.167.20
-     41 157.55.39.223
-     82 104.196.152.243
-   1529 50.116.102.77
-   4005 70.32.83.92
-   6045 45.5.184.196
-
- -
    -
  • 50.116.102.77 is apparently in the US on websitewelcome.com
  • -
- -

2017-12-07

- -
    -
  • Uptime Robot reported a few times today that CGSpace was down and then up
  • -
  • At one point Tsega restarted Tomcat
  • -
  • I never got any alerts about high load from Linode though…
  • -
  • I looked just now and see that there are 121 PostgreSQL connections!
  • -
  • The top users right now are:
  • -
- -
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E "7/Dec/2017" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail 
-    838 40.77.167.11
-    939 66.249.66.223
-   1149 66.249.66.206
-   1316 207.46.13.110
-   1322 207.46.13.151
-   1323 2001:da8:203:2224:c912:1106:d94f:9189
-   1414 157.55.39.223
-   2378 104.196.152.243
-   2662 66.249.66.219
-   5110 124.17.34.60
-
- -
    -
  • We’ve never seen 124.17.34.60 yet, but it’s really hammering us!
  • -
  • Apparently it is from China, and here is one of its user agents:
  • -
- -
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.2; Win64; x64; Trident/7.0; LCTE)
-
- -
    -
  • It is responsible for 4,500 Tomcat sessions today alone:
  • -
- -
$ grep 124.17.34.60 /home/cgspace.cgiar.org/log/dspace.log.2017-12-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
-4574
-
- -
    -
  • I’ve adjusted the nginx IP mapping that I set up last month to account for 124.17.34.60 and 124.17.34.59 using a regex, as it’s the same bot on the same subnet
  • -
  • I was running the DSpace cleanup task manually and it hit an error:
  • -
- -
$ /home/cgspace.cgiar.org/bin/dspace cleanup -v
-...
-Error: ERROR: update or delete on table "bitstream" violates foreign key constraint "bundle_primary_bitstream_id_fkey" on table "bundle"
-  Detail: Key (bitstream_id)=(144666) is still referenced from table "bundle".
-
- -
    -
  • The solution is like I discovered in 2017-04, to set the primary_bitstream_id to null:
  • -
- -
dspace=# update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (144666);
-UPDATE 1
-
- -

2017-12-13

- -
    -
  • Linode alerted that CGSpace was using high CPU from 10:13 to 12:13 this morning
  • -
- -

2017-12-16

- -
    -
  • Re-work the XMLUI base theme to allow child themes to override the header logo’s image and link destination: #349
  • -
  • This required a little bit of work to restructure the XSL templates
  • -
  • Optimize PNG and SVG image assets in the CGIAR base theme using pngquant and svgo: #350
  • -
- -

2017-12-17

- -
    -
  • Reboot DSpace Test to get new Linode Linux kernel
  • -
  • Looking at CCAFS bulk import for Magdalena Haman (she originally sent them in November but some of the thumbnails were missing and dates were messed up so she resent them now)
  • -
  • A few issues with the data and thumbnails: - -
      -
    • Her thumbnail files all use capital JPG so I had to rename them to lowercase: rename -fc *.JPG
    • -
    • thumbnail20.jpg is 1.7MB so I have to resize it
    • -
    • I also had to add the .jpg to the thumbnail string in the CSV
    • -
    • The thumbnail11.jpg is missing
    • -
    • The dates are in super long ISO8601 format (from Excel?) like 2016-02-07T00:00:00Z so I converted them to simpler forms in GREL: value.toString("yyyy-MM-dd")
    • -
    • I trimmed the whitespaces in a few fields but it wasn’t many
    • -
    • Rename her thumbnail column to filename, and format it so SAFBuilder adds the files to the thumbnail bundle with this GREL in OpenRefine: value + "__bundle:THUMBNAIL"
    • -
    • Rename dc.identifier.status and dc.identifier.url columns to cg.identifier.status and cg.identifier.url
    • -
    • Item 4 has weird characters in citation, ie: Nagoya et de Trait
    • -
    • Some author names need normalization, ie: Aggarwal, Pramod and Aggarwal, Pramod K.
    • -
    • Something weird going on with duplicate authors that have the same text value, like Berto, Jayson C. and Balmeo, Katherine P.
    • -
    • I will send her feedback on some author names like UNEP and ICRISAT and ask her for the missing thumbnail11.jpg
    • -
  • -
  • I did a test import of the data locally after building with SAFBuilder but for some reason I had to specify the collection (even though the collections were specified in the collection field)
  • -
- -
$ JAVA_OPTS="-Xmx512m -Dfile.encoding=UTF-8" ~/dspace/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/89338 --source /Users/aorth/Downloads/2016\ bulk\ upload\ thumbnails/SimpleArchiveFormat --mapfile=/tmp/ccafs.map &> /tmp/ccafs.log
-
- -
    -
  • It’s the same on DSpace Test, I can’t import the SAF bundle without specifying the collection:
  • -
- -
$ dspace import --add --eperson=aorth@mjanja.ch --mapfile=/tmp/ccafs.map --source=/tmp/ccafs-2016/SimpleArchiveFormat
-No collections given. Assuming 'collections' file inside item directory
-Adding items from directory: /tmp/ccafs-2016/SimpleArchiveFormat
-Generating mapfile: /tmp/ccafs.map
-Processing collections file: collections
-Adding item from directory item_1
-java.lang.NullPointerException
-        at org.dspace.app.itemimport.ItemImport.addItem(ItemImport.java:865)
-        at org.dspace.app.itemimport.ItemImport.addItems(ItemImport.java:736)
-        at org.dspace.app.itemimport.ItemImport.main(ItemImport.java:498)
-        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
-        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
-        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
-        at java.lang.reflect.Method.invoke(Method.java:498)
-        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
-        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
-java.lang.NullPointerException
-Started: 1513521856014
-Ended: 1513521858573
-Elapsed time: 2 secs (2559 msecs)
-
- -
    -
  • I even tried to debug it by adding verbose logging to the JAVA_OPTS:
  • -
- -
-Dlog4j.configuration=file:/Users/aorth/dspace/config/log4j-console.properties -Ddspace.log.init.disable=true
-
- -
    -
  • … but the error message was the same, just with more INFO noise around it
  • -
  • For now I’ll import into a collection in DSpace Test but I’m really not sure what’s up with this!
  • -
  • Linode alerted that CGSpace was using high CPU from 4 to 6 PM
  • -
  • The logs for today show the CORE bot (137.108.70.7) being active in XMLUI:
  • -
- -
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E "17/Dec/2017" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
-    671 66.249.66.70
-    885 95.108.181.88
-    904 157.55.39.96
-    923 157.55.39.179
-   1159 207.46.13.107
-   1184 104.196.152.243
-   1230 66.249.66.91
-   1414 68.180.229.254
-   4137 66.249.66.90
-  46401 137.108.70.7
-
- -
    -
  • And then some CIAT bot (45.5.184.196) is actively hitting API endpoints:
  • -
- -
# cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "17/Dec/2017" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
-     33 68.180.229.254
-     48 157.55.39.96
-     51 157.55.39.179
-     56 207.46.13.107
-    102 104.196.152.243
-    102 66.249.66.90
-    691 137.108.70.7
-   1531 50.116.102.77
-   4014 70.32.83.92
-  11030 45.5.184.196
-
- -
    -
  • That’s probably ok, as I don’t think the REST API connections use up a Tomcat session…
  • -
  • CIP emailed a few days ago to ask about unique IDs for authors and organizations, and if we can provide them via an API
  • -
  • Regarding the import issue above it seems to be a known issue that has a patch in DSpace 5.7: - -
  • -
  • We’re on DSpace 5.5 but there is a one-word fix to the addItem() function here: https://github.com/DSpace/DSpace/pull/1731
  • -
  • I will apply it on our branch but I need to make a note to NOT cherry-pick it when I rebase on to the latest 5.x upstream later
  • -
  • Pull request: #351
  • -
- -

2017-12-18

- -
    -
  • Linode alerted this morning that there was high outbound traffic from 6 to 8 AM
  • -
  • The XMLUI logs show that the CORE bot from last night (137.108.70.7) is very active still:
  • -
- -
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E "18/Dec/2017" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
-    190 207.46.13.146
-    191 197.210.168.174
-    202 86.101.203.216
-    268 157.55.39.134
-    297 66.249.66.91
-    314 213.55.99.121
-    402 66.249.66.90
-    532 68.180.229.254
-    644 104.196.152.243
-  32220 137.108.70.7
-
- -
    -
  • On the API side (REST and OAI) there is still the same CIAT bot (45.5.184.196) from last night making quite a number of requests this morning:
  • -
- -
# cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "18/Dec/2017" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
-      7 104.198.9.108
-      8 185.29.8.111
-      8 40.77.167.176
-      9 66.249.66.91
-      9 68.180.229.254
-     10 157.55.39.134
-     15 66.249.66.90
-     59 104.196.152.243
-   4014 70.32.83.92
-   8619 45.5.184.196
-
- -
    -
  • I need to keep an eye on this issue because it has nice fixes for reducing the number of database connections in DSpace 5.7: https://jira.duraspace.org/browse/DS-3551
  • -
  • Update text on CGSpace about page to give some tips to developers about using the resources more wisely (#352)
  • -
  • Linode alerted that CGSpace was using 396.3% CPU from 12 to 2 PM
  • -
  • The REST and OAI API logs look pretty much the same as earlier this morning, but there’s a new IP harvesting XMLUI:
  • -
- -
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E "18/Dec/2017" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail            
-    360 95.108.181.88
-    477 66.249.66.90
-    526 86.101.203.216
-    691 207.46.13.13
-    698 197.210.168.174
-    819 207.46.13.146
-    878 68.180.229.254
-   1965 104.196.152.243
-  17701 2.86.72.181
-  52532 137.108.70.7
-
- -
    -
  • 2.86.72.181 appears to be from Greece, and has the following user agent:
  • -
- -
Mozilla/3.0 (compatible; Indy Library)
-
- -
    -
  • Surprisingly it seems they are re-using their Tomcat session for all those 17,000 requests:
  • -
- -
$ grep 2.86.72.181 dspace.log.2017-12-18 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l                                                                                          
-1
-
- -
    -
  • I guess there’s nothing I can do to them for now
  • -
  • In other news, I am curious how many PostgreSQL connection pool errors we’ve had in the last month:
  • -
- -
$ grep -c "Cannot get a connection, pool error Timeout waiting for idle object" dspace.log.2017-1* | grep -v :0
-dspace.log.2017-11-07:15695
-dspace.log.2017-11-08:135
-dspace.log.2017-11-17:1298
-dspace.log.2017-11-26:4160
-dspace.log.2017-11-28:107
-dspace.log.2017-11-29:3972
-dspace.log.2017-12-01:1601
-dspace.log.2017-12-02:1274
-dspace.log.2017-12-07:2769
-
- -
    -
  • I made a small fix to my move-collections.sh script so that it handles the case when a “to” or “from” community doesn’t exist
  • -
  • The script lives here: https://gist.github.com/alanorth/e60b530ed4989df0c731afbb0c640515
  • -
  • Major reorganization of four of CTA’s French collections
  • -
  • Basically moving their items into the English ones, then moving the English ones to the top-level of the CTA community, and deleting the old sub-communities
  • -
  • Move collection 1056851821 from 1056842212 to 1056842211
  • -
  • Move collection 1056851400 from 1056842214 to 1056842211
  • -
  • Move collection 1056856992 from 1056842216 to 1056842211
  • -
  • Move collection 1056842218 from 1056842217 to 1056842211
  • -
  • Export CSV of collection 1056863484 and move items to collection 1056851400
  • -
  • Export CSV of collection 1056864403 and move items to collection 1056856992
  • -
  • Export CSV of collection 1056856994 and move items to collection 1056842218
  • -
  • There are blank lines in this metadata, which causes DSpace to not detect changes in the CSVs
  • -
  • I had to use OpenRefine to remove all columns from the CSV except id and collection, and then update the collection field for the new mappings
  • -
  • Remove empty sub-communities: 1056842212, 1056842214, 1056842216, 1056842217
  • -
  • I was in the middle of applying the metadata imports on CGSpace and the system ran out of PostgreSQL connections…
  • -
  • There were 128 PostgreSQL connections at the time… grrrr.
  • -
  • So I restarted Tomcat 7 and restarted the imports
  • -
  • I assume the PostgreSQL transactions were fine but I will remove the Discovery index for their community and re-run the light-weight indexing to hopefully re-construct everything:
  • -
- -
$ dspace index-discovery -r 10568/42211
-$ schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery
-
- -
    -
  • The PostgreSQL issues are getting out of control, I need to figure out how to enable connection pools in Tomcat!
  • -
- -

2017-12-19

- -
    -
  • Briefly had PostgreSQL connection issues on CGSpace for the millionth time
  • -
  • I’m fucking sick of this!
  • -
  • The connection graph on CGSpace shows shit tons of connections idle
  • -
- -

Idle PostgreSQL connections on CGSpace

- -
    -
  • And I only now just realized that DSpace’s db.maxidle parameter is not seconds, but number of idle connections to allow.
  • -
  • So theoretically, because each webapp has its own pool, this could be 20 per app—so no wonder we have 50 idle connections!
  • -
  • I notice that this number will be set to 10 by default in DSpace 6.1 and 7.0: https://jira.duraspace.org/browse/DS-3564
  • -
  • So I’m going to reduce ours from 20 to 10 and start trying to figure out how the hell to supply a database pool using Tomcat JNDI
  • -
  • I re-deployed the 5_x-prod branch on CGSpace, applied all system updates, and restarted the server
  • -
  • Looking through the dspace.log I see this error:
  • -
- -
2017-12-19 08:17:15,740 ERROR org.dspace.statistics.SolrLogger @ Error CREATEing SolrCore 'statistics-2010': Unable to create core [statistics-2010] Caused by: Lock obtain timed out: NativeFSLock@/home/cgspace.cgiar.org/solr/statistics-2010/data/index/write.lock
-
- -
    -
  • I don’t have time now to look into this but the Solr sharding has long been an issue!
  • -
  • Looking into using JDBC / JNDI to provide a database pool to DSpace
  • -
  • The DSpace 6.x configuration docs have more notes about setting up the database pool than the 5.x ones (which actually have none!)
  • -
  • First, I uncomment db.jndi in dspace/config/dspace.cfg
  • -
  • Then I create a global Resource in the main Tomcat server.xml (inside GlobalNamingResources):
  • -
- -
<Resource name="jdbc/dspace" auth="Container" type="javax.sql.DataSource"
-	  driverClassName="org.postgresql.Driver"
-	  url="jdbc:postgresql://localhost:5432/dspace"
-	  username="dspace"
-	  password="dspace"
-      initialSize='5'
-      maxActive='50'
-      maxIdle='15'
-      minIdle='5'
-      maxWait='5000'
-      validationQuery='SELECT 1'
-      testOnBorrow='true' />
-
- - - -
<ResourceLink global="jdbc/dspace" name="jdbc/dspace" type="javax.sql.DataSource"/>
-
- -
    -
  • I am not sure why several guides show configuration snippets for server.xml and web application contexts that use a Local and Global jdbc…
  • -
  • When DSpace can’t find the JNDI context (for whatever reason) you will see this in the dspace logs:
  • -
- -
2017-12-19 13:12:08,796 ERROR org.dspace.storage.rdbms.DatabaseManager @ Error retrieving JNDI context: jdbc/dspace
-javax.naming.NameNotFoundException: Name [jdbc/dspace] is not bound in this Context. Unable to find [jdbc].
-        at org.apache.naming.NamingContext.lookup(NamingContext.java:825)
-        at org.apache.naming.NamingContext.lookup(NamingContext.java:173)
-        at org.dspace.storage.rdbms.DatabaseManager.initDataSource(DatabaseManager.java:1414)
-        at org.dspace.storage.rdbms.DatabaseManager.initialize(DatabaseManager.java:1331)
-        at org.dspace.storage.rdbms.DatabaseManager.getDataSource(DatabaseManager.java:648)
-        at org.dspace.storage.rdbms.DatabaseManager.getConnection(DatabaseManager.java:627)
-        at org.dspace.core.Context.init(Context.java:121)
-        at org.dspace.core.Context.<init>(Context.java:95)
-        at org.dspace.app.util.AbstractDSpaceWebapp.register(AbstractDSpaceWebapp.java:79)
-        at org.dspace.app.util.DSpaceContextListener.contextInitialized(DSpaceContextListener.java:128)
-        at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:5110)
-        at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5633)
-        at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:145)
-        at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:1015)
-        at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:991)
-        at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:652)
-        at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:712)
-        at org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:2002)
-        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
-        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
-        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
-        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
-        at java.lang.Thread.run(Thread.java:748)
-2017-12-19 13:12:08,798 INFO  org.dspace.storage.rdbms.DatabaseManager @ Unable to locate JNDI dataSource: jdbc/dspace
-2017-12-19 13:12:08,798 INFO  org.dspace.storage.rdbms.DatabaseManager @ Falling back to creating own Database pool
-
- -
    -
  • And indeed the Catalina logs show that it failed to set up the JDBC driver:
  • -
- -
org.apache.tomcat.dbcp.dbcp.SQLNestedException: Cannot load JDBC driver class 'org.postgresql.Driver'
-
- -
    -
  • There are several copies of the PostgreSQL driver installed by DSpace:
  • -
- -
$ find ~/dspace/ -iname "postgresql*jdbc*.jar"
-/Users/aorth/dspace/webapps/jspui/WEB-INF/lib/postgresql-9.1-901-1.jdbc4.jar
-/Users/aorth/dspace/webapps/oai/WEB-INF/lib/postgresql-9.1-901-1.jdbc4.jar
-/Users/aorth/dspace/webapps/xmlui/WEB-INF/lib/postgresql-9.1-901-1.jdbc4.jar
-/Users/aorth/dspace/webapps/rest/WEB-INF/lib/postgresql-9.1-901-1.jdbc4.jar
-/Users/aorth/dspace/lib/postgresql-9.1-901-1.jdbc4.jar
-
- -
    -
  • These apparently come from the main DSpace pom.xml:
  • -
- -
<dependency>
-   <groupId>postgresql</groupId>
-   <artifactId>postgresql</artifactId>
-   <version>9.1-901-1.jdbc4</version>
-</dependency>
-
- -
    -
  • So WTF? Let’s try copying one to Tomcat’s lib folder and restarting Tomcat:
  • -
- -
$ cp ~/dspace/lib/postgresql-9.1-901-1.jdbc4.jar /usr/local/opt/tomcat@7/libexec/lib
-
- -
    -
  • Oh that’s fantastic, now at least Tomcat doesn’t print an error during startup, so I guess it succeeds in creating the JNDI pool
  • -
  • DSpace starts up but I have no idea if it’s using the JNDI configuration because I see this in the logs:
  • -
- -
2017-12-19 13:26:54,271 INFO  org.dspace.storage.rdbms.DatabaseManager @ DBMS is '{}'PostgreSQL
-2017-12-19 13:26:54,277 INFO  org.dspace.storage.rdbms.DatabaseManager @ DBMS driver version is '{}'9.5.10
-2017-12-19 13:26:54,293 INFO  org.dspace.storage.rdbms.DatabaseUtils @ Loading Flyway DB migrations from: filesystem:/Users/aorth/dspace/etc/postgres, classpath:org.dspace.storage.rdbms.sqlmigration.postgres, classpath:org.dspace.storage.rdbms.migration
-2017-12-19 13:26:54,306 INFO  org.flywaydb.core.internal.dbsupport.DbSupportFactory @ Database: jdbc:postgresql://localhost:5432/dspacetest (PostgreSQL 9.5)
-
- -
    -
  • Let’s try again, but this time explicitly blank the PostgreSQL connection parameters in dspace.cfg and see if DSpace starts…
  • -
  • Wow, ok, that works, but having to copy the PostgreSQL JDBC JAR to Tomcat’s lib folder totally blows
  • -
  • Also, it’s likely this is only a problem on my local macOS + Tomcat test environment
  • -
  • Ubuntu’s Tomcat distribution will probably handle this differently
  • -
  • So for reference I have: - -
      -
    • a <Resource> defined globally in server.xml
    • -
    • a <ResourceLink> defined in each web application’s context XML
    • -
    • unset the db.url, db.username, and db.password parameters in dspace.cfg
    • -
    • set the db.jndi in dspace.cfg to the name specified in the web application context
    • -
  • -
  • After adding the Resource to server.xml on Ubuntu I get this in Catalina’s logs:
  • -
- -
SEVERE: Unable to create initial connections of pool.
-java.sql.SQLException: org.postgresql.Driver
-...
-Caused by: java.lang.ClassNotFoundException: org.postgresql.Driver
-
- -
    -
  • The username and password are correct, but maybe I need to copy the fucking lib there too?
  • -
  • I tried installing Ubuntu’s libpostgresql-jdbc-java package but Tomcat still can’t find the class
  • -
  • Let me try to symlink the lib into Tomcat’s libs:
  • -
- -
# ln -sv /usr/share/java/postgresql.jar /usr/share/tomcat7/lib
-
- -
    -
  • Now Tomcat starts but the localhost container has errors:
  • -
- -
SEVERE: Exception sending context initialized event to listener instance of class org.dspace.app.util.DSpaceContextListener
-java.lang.AbstractMethodError: Method org/postgresql/jdbc3/Jdbc3ResultSet.isClosed()Z is abstract
-
- -
    -
  • Could be a version issue or something since the Ubuntu package provides 9.2 and DSpace’s are 9.1…
  • -
  • Let me try to remove it and copy in DSpace’s:
  • -
- -
# rm /usr/share/tomcat7/lib/postgresql.jar
-# cp [dspace]/webapps/xmlui/WEB-INF/lib/postgresql-9.1-901-1.jdbc4.jar /usr/share/tomcat7/lib/
-
- -
    -
  • Wow, I think that actually works…
  • -
  • I wonder if I could get the JDBC driver from postgresql.org instead of relying on the one from the DSpace build: https://jdbc.postgresql.org/
  • -
  • I notice our version is 9.1-901, which isn’t even available anymore! The latest in the archived versions is 9.1-903
  • -
  • Also, since I commented out all the db parameters in DSpace.cfg, how does the command line dspace tool work?
  • -
  • Let’s try the upstream JDBC driver first:
  • -
- -
# rm /usr/share/tomcat7/lib/postgresql-9.1-901-1.jdbc4.jar
-# wget https://jdbc.postgresql.org/download/postgresql-42.1.4.jar -O /usr/share/tomcat7/lib/postgresql-42.1.4.jar
-
- -
    -
  • DSpace command line fails unless db settings are present in dspace.cfg:
  • -
- -
$ dspace database info
-Caught exception:
-java.sql.SQLException: java.lang.ClassNotFoundException: 
-        at org.dspace.storage.rdbms.DataSourceInit.getDatasource(DataSourceInit.java:171)
-        at org.dspace.storage.rdbms.DatabaseManager.initDataSource(DatabaseManager.java:1438)
-        at org.dspace.storage.rdbms.DatabaseUtils.main(DatabaseUtils.java:81)
-        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
-        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
-        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
-        at java.lang.reflect.Method.invoke(Method.java:498)
-        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
-        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
-Caused by: java.lang.ClassNotFoundException: 
-        at java.lang.Class.forName0(Native Method)
-        at java.lang.Class.forName(Class.java:264)
-        at org.dspace.storage.rdbms.DataSourceInit.getDatasource(DataSourceInit.java:41)
-        ... 8 more
-
- -
    -
  • And in the logs:
  • -
- -
2017-12-19 18:26:56,971 ERROR org.dspace.storage.rdbms.DatabaseManager @ Error retrieving JNDI context: jdbc/dspace
-javax.naming.NoInitialContextException: Need to specify class name in environment or system property, or as an applet parameter, or in an application resource file:  java.naming.factory.initial
-        at javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:662)
-        at javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:313)
-        at javax.naming.InitialContext.getURLOrDefaultInitCtx(InitialContext.java:350)
-        at javax.naming.InitialContext.lookup(InitialContext.java:417)
-        at org.dspace.storage.rdbms.DatabaseManager.initDataSource(DatabaseManager.java:1413)
-        at org.dspace.storage.rdbms.DatabaseUtils.main(DatabaseUtils.java:81)
-        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
-        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
-        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
-        at java.lang.reflect.Method.invoke(Method.java:498)
-        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
-        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
-2017-12-19 18:26:56,983 INFO  org.dspace.storage.rdbms.DatabaseManager @ Unable to locate JNDI dataSource: jdbc/dspace
-2017-12-19 18:26:56,983 INFO  org.dspace.storage.rdbms.DatabaseManager @ Falling back to creating own Database pool
-2017-12-19 18:26:56,992 WARN  org.dspace.core.ConfigurationManager @ Warning: Number format error in property: db.maxconnections
-2017-12-19 18:26:56,992 WARN  org.dspace.core.ConfigurationManager @ Warning: Number format error in property: db.maxwait
-2017-12-19 18:26:56,993 WARN  org.dspace.core.ConfigurationManager @ Warning: Number format error in property: db.maxidle
-
- -
    -
  • If I add the db values back to dspace.cfg the dspace database info command succeeds but the log still shows errors retrieving the JNDI connection
  • -
  • Perhaps something to report to the dspace-tech mailing list when I finally send my comments
  • -
  • Oh cool! select * from pg_stat_activity shows “PostgreSQL JDBC Driver” for the application name! That’s how you know it’s working!
  • -
  • If you monitor pg_stat_activity while you run dspace database info you can see that it doesn’t use the JNDI pool and creates ~9 extra PostgreSQL connections (see the quick watch sketch at the end of this list)!
  • -
  • And in the middle of all of this Linode sends an alert that CGSpace has high CPU usage from 2 to 4 PM
  • -
- -
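  • A quick way to watch this (just a sketch; adjust the psql connection options for your environment) is something like:

$ watch -n 1 "psql -c 'SELECT application_name, state, count(*) FROM pg_stat_activity GROUP BY 1, 2 ORDER BY 3 DESC;'"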

2017-12-20

- -
    -
  • The database connection pooling is definitely better!
  • -
- -

PostgreSQL connection pooling on DSpace Test

- -
    -
  • Now there is only one set of idle connections shared among all the web applications, instead of 10+ per application
  • -
  • There are short bursts of connections up to 10, but it generally stays around 5
  • -
  • Test and import 13 records to CGSpace for Abenet:
  • -
- -
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx512m -XX:+TieredCompilation -XX:TieredStopAtLevel=1"
-$ dspace import -a -e aorth@mjanja.ch -s /home/aorth/cg_system_20Dec/SimpleArchiveFormat -m systemoffice.map &> systemoffice.log
-
- -
    -
  • The fucking database went from 47 to 72 to 121 connections while I was importing so it stalled.
  • -
  • Since I had to restart Tomcat anyways, I decided to just deploy the new JNDI connection pooling stuff on CGSpace
  • -
  • There was an initial connection storm of 50 PostgreSQL connections, but then it settled down to 7
  • -
  • After that CGSpace came up fine and I was able to import the 13 items just fine:
  • -
- -
$ dspace import -a -e aorth@mjanja.ch -s /home/aorth/cg_system_20Dec/SimpleArchiveFormat -m systemoffice.map &> systemoffice.log
-$ schedtool -D -e ionice -c2 -n7 nice -n19 dspace filter-media -i 10568/89287
-
- - - -

2017-12-24

- -
    -
  • Linode alerted that CGSpace was using high CPU this morning around 6 AM
  • -
  • I’m playing with reading all of a month’s nginx logs into goaccess:
  • -
- -
# find /var/log/nginx -type f -newermt "2017-12-01" | xargs zcat --force | goaccess --log-format=COMBINED -
-
- -
    -
  • I can see interesting things using this approach, for example: - -
      -
    • 50.116.102.77 checked our status almost 40,000 times so far this month—I think it’s the CGNet uptime tool
    • -
    • Also, we’ve handled 2.9 million requests this month from 172,000 unique IP addresses!
    • -
    • Total bandwidth so far this month is 640GiB
    • -
    • The user that made the most requests so far this month is 45.5.184.196 (267,000 requests)
    • -
  • -
- -

2017-12-25

- -
    -
  • The PostgreSQL connection pooling is much better when using the Tomcat JNDI pool
  • -
  • Here are the Munin stats for the past week on CGSpace:
  • -
- -

CGSpace PostgreSQL connections week

- -

2017-12-29

- -
    -
  • Looking at some old notes for metadata to clean up, I found a few hundred corrections in cg.fulltextstatus and dc.language.iso:
  • -
- -
# update metadatavalue set text_value='Formally Published' where resource_type_id=2 and metadata_field_id=214 and text_value like 'Formally published';
-UPDATE 5
-# delete from metadatavalue where resource_type_id=2 and metadata_field_id=214 and text_value like 'NO';
-DELETE 17
-# update metadatavalue set text_value='en' where resource_type_id=2 and metadata_field_id=38 and text_value ~ '(En|English)';
-UPDATE 49
-# update metadatavalue set text_value='fr' where resource_type_id=2 and metadata_field_id=38 and text_value ~ '(fre|frn|French)';
-UPDATE 4
-# update metadatavalue set text_value='es' where resource_type_id=2 and metadata_field_id=38 and text_value ~ '(Spanish|spa)';
-UPDATE 16
-# update metadatavalue set text_value='vi' where resource_type_id=2 and metadata_field_id=38 and text_value='Vietnamese';
-UPDATE 9
-# update metadatavalue set text_value='ru' where resource_type_id=2 and metadata_field_id=38 and text_value='Ru';
-UPDATE 1
-# update metadatavalue set text_value='in' where resource_type_id=2 and metadata_field_id=38 and text_value ~ '(IN|In)';
-UPDATE 5
-# delete from metadatavalue where resource_type_id=2 and metadata_field_id=38 and text_value ~ '(dc.language.iso|CGIAR Challenge Program on Water and Food)';
-DELETE 20
-
- -
    -
  • I need to figure out why we have records with the language “in” because that’s not a language!
  • -
- -

2017-12-30

- -
    -
  • Linode alerted that CGSpace was using 259% CPU from 4 to 6 AM
  • -
  • Uptime Robot noticed that the server went down for 1 minute a few hours later, around 9AM
  • -
  • Here’s the XMLUI logs:
  • -
- -
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E "30/Dec/2017" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
-    637 207.46.13.106
-    641 157.55.39.186
-    715 68.180.229.254
-    924 104.196.152.243
-   1012 66.249.64.95
-   1060 216.244.66.245
-   1120 54.175.208.220
-   1287 66.249.64.93
-   1586 66.249.64.78
-   3653 66.249.64.91
-
- -
    -
  • Looks pretty normal actually, but I don’t know who 54.175.208.220 is
  • -
  • They identify as “com.plumanalytics”, which Google says is associated with Elsevier
  • -
  • They only seem to have used one Tomcat session so that’s good, I guess I don’t need to add them to the Tomcat Crawler Session Manager valve:
  • -
- -
$ grep 54.175.208.220 dspace.log.2017-12-30 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l          
-1 
-
- -
    -
  • 216.244.66.245 seems to be moz.com’s DotBot
  • -
- -

2017-12-31

- -
    -
  • I finished working on the 42 records for CCAFS after Magdalena sent the remaining corrections
  • -
  • After that I uploaded them to CGSpace:
  • -
- -
$ dspace import -a -e aorth@mjanja.ch -s /home/aorth/2016\ bulk\ upload\ thumbnails/SimpleArchiveFormat -m ccafs.map &> ccafs.log
-
- - - - - -
- - - -
- - - - -
-
- - - - - - - - - diff --git a/public/2017/01/mapping-crazy-duplicate.png b/public/2017/01/mapping-crazy-duplicate.png deleted file mode 100644 index 2caa7e6ca..000000000 Binary files a/public/2017/01/mapping-crazy-duplicate.png and /dev/null differ diff --git a/public/2017/02/cpu-week.png b/public/2017/02/cpu-week.png deleted file mode 100644 index 405aa7070..000000000 Binary files a/public/2017/02/cpu-week.png and /dev/null differ diff --git a/public/2017/02/meminfo_phisical-week.png b/public/2017/02/meminfo_phisical-week.png deleted file mode 100644 index 0d9e85189..000000000 Binary files a/public/2017/02/meminfo_phisical-week.png and /dev/null differ diff --git a/public/2017/03/livestock-theme.png b/public/2017/03/livestock-theme.png deleted file mode 100644 index f8e9fe675..000000000 Binary files a/public/2017/03/livestock-theme.png and /dev/null differ diff --git a/public/2017/03/thumbnail-cmyk.jpg b/public/2017/03/thumbnail-cmyk.jpg deleted file mode 100644 index 4281d2a51..000000000 Binary files a/public/2017/03/thumbnail-cmyk.jpg and /dev/null differ diff --git a/public/2017/03/thumbnail-srgb.jpg b/public/2017/03/thumbnail-srgb.jpg deleted file mode 100644 index 7838a847d..000000000 Binary files a/public/2017/03/thumbnail-srgb.jpg and /dev/null differ diff --git a/public/2017/04/cplace.png b/public/2017/04/cplace.png deleted file mode 100644 index b891cd635..000000000 Binary files a/public/2017/04/cplace.png and /dev/null differ diff --git a/public/2017/04/dc-rights.png b/public/2017/04/dc-rights.png deleted file mode 100644 index 5a511b654..000000000 Binary files a/public/2017/04/dc-rights.png and /dev/null differ diff --git a/public/2017/04/openrefine-flagging-duplicates.png b/public/2017/04/openrefine-flagging-duplicates.png deleted file mode 100644 index 25b729f0a..000000000 Binary files a/public/2017/04/openrefine-flagging-duplicates.png and /dev/null differ diff --git a/public/2017/06/wle-theme-test-a.png b/public/2017/06/wle-theme-test-a.png deleted file mode 100644 index 07cd938e4..000000000 Binary files a/public/2017/06/wle-theme-test-a.png and /dev/null differ diff --git a/public/2017/06/wle-theme-test-b.png b/public/2017/06/wle-theme-test-b.png deleted file mode 100644 index e6cd5c8bd..000000000 Binary files a/public/2017/06/wle-theme-test-b.png and /dev/null differ diff --git a/public/2017/07/lead-author-test.png b/public/2017/07/lead-author-test.png deleted file mode 100644 index 1543979f5..000000000 Binary files a/public/2017/07/lead-author-test.png and /dev/null differ diff --git a/public/2017/08/cifor-oai-harvesting.png b/public/2017/08/cifor-oai-harvesting.png deleted file mode 100644 index 6aa2db071..000000000 Binary files a/public/2017/08/cifor-oai-harvesting.png and /dev/null differ diff --git a/public/2017/08/postgresql-connections-cgspace.png b/public/2017/08/postgresql-connections-cgspace.png deleted file mode 100644 index 982ffe910..000000000 Binary files a/public/2017/08/postgresql-connections-cgspace.png and /dev/null differ diff --git a/public/2017/09/10947-2919-after.jpg b/public/2017/09/10947-2919-after.jpg deleted file mode 100644 index 183c19b02..000000000 Binary files a/public/2017/09/10947-2919-after.jpg and /dev/null differ diff --git a/public/2017/09/10947-2919-before.jpg b/public/2017/09/10947-2919-before.jpg deleted file mode 100644 index 0ba72ea25..000000000 Binary files a/public/2017/09/10947-2919-before.jpg and /dev/null differ diff --git a/public/2017/09/cgspace-memory-week.png b/public/2017/09/cgspace-memory-week.png deleted file mode 
100644 index b5710018a..000000000 Binary files a/public/2017/09/cgspace-memory-week.png and /dev/null differ diff --git a/public/2017/09/dspace-test-memory-week.png b/public/2017/09/dspace-test-memory-week.png deleted file mode 100644 index a6cd80e78..000000000 Binary files a/public/2017/09/dspace-test-memory-week.png and /dev/null differ diff --git a/public/2017/10/dspace-thumbnail-box-shadow.png b/public/2017/10/dspace-thumbnail-box-shadow.png deleted file mode 100644 index a39b84adf..000000000 Binary files a/public/2017/10/dspace-thumbnail-box-shadow.png and /dev/null differ diff --git a/public/2017/10/dspace-thumbnail-original.png b/public/2017/10/dspace-thumbnail-original.png deleted file mode 100644 index 1e1a27dff..000000000 Binary files a/public/2017/10/dspace-thumbnail-original.png and /dev/null differ diff --git a/public/2017/10/google-search-console-2.png b/public/2017/10/google-search-console-2.png deleted file mode 100644 index 5e0fbd4df..000000000 Binary files a/public/2017/10/google-search-console-2.png and /dev/null differ diff --git a/public/2017/10/google-search-console.png b/public/2017/10/google-search-console.png deleted file mode 100644 index 16ab3d709..000000000 Binary files a/public/2017/10/google-search-console.png and /dev/null differ diff --git a/public/2017/10/google-search-results.png b/public/2017/10/google-search-results.png deleted file mode 100644 index 4e0c19a8b..000000000 Binary files a/public/2017/10/google-search-results.png and /dev/null differ diff --git a/public/2017/10/search-console-change-address-error.png b/public/2017/10/search-console-change-address-error.png deleted file mode 100644 index 95fe4d9c5..000000000 Binary files a/public/2017/10/search-console-change-address-error.png and /dev/null differ diff --git a/public/2017/11/add-author.png b/public/2017/11/add-author.png deleted file mode 100644 index 26f12875e..000000000 Binary files a/public/2017/11/add-author.png and /dev/null differ diff --git a/public/2017/11/author-lookup.png b/public/2017/11/author-lookup.png deleted file mode 100644 index 5d9d95831..000000000 Binary files a/public/2017/11/author-lookup.png and /dev/null differ diff --git a/public/2017/11/baidu-robotstxt.png b/public/2017/11/baidu-robotstxt.png deleted file mode 100644 index 29cb66563..000000000 Binary files a/public/2017/11/baidu-robotstxt.png and /dev/null differ diff --git a/public/2017/11/jconsole-sessions.png b/public/2017/11/jconsole-sessions.png deleted file mode 100644 index 9af006542..000000000 Binary files a/public/2017/11/jconsole-sessions.png and /dev/null differ diff --git a/public/2017/11/postgres-connections-month.png b/public/2017/11/postgres-connections-month.png deleted file mode 100644 index a7e533174..000000000 Binary files a/public/2017/11/postgres-connections-month.png and /dev/null differ diff --git a/public/2017/11/postgres-connections-week.png b/public/2017/11/postgres-connections-week.png deleted file mode 100644 index 626b3f204..000000000 Binary files a/public/2017/11/postgres-connections-week.png and /dev/null differ diff --git a/public/2017/11/tomcat-jvm-cms.png b/public/2017/11/tomcat-jvm-cms.png deleted file mode 100644 index 095cb6430..000000000 Binary files a/public/2017/11/tomcat-jvm-cms.png and /dev/null differ diff --git a/public/2017/11/tomcat-jvm-g1gc.png b/public/2017/11/tomcat-jvm-g1gc.png deleted file mode 100644 index 724b0419f..000000000 Binary files a/public/2017/11/tomcat-jvm-g1gc.png and /dev/null differ diff --git a/public/2017/12/postgres-connections-cgspace.png 
b/public/2017/12/postgres-connections-cgspace.png deleted file mode 100644 index 908f3db3d..000000000 Binary files a/public/2017/12/postgres-connections-cgspace.png and /dev/null differ diff --git a/public/2017/12/postgres-connections-month-cgspace-2.png b/public/2017/12/postgres-connections-month-cgspace-2.png deleted file mode 100644 index 11731d504..000000000 Binary files a/public/2017/12/postgres-connections-month-cgspace-2.png and /dev/null differ diff --git a/public/2017/12/postgres-connections-month-cgspace.png b/public/2017/12/postgres-connections-month-cgspace.png deleted file mode 100644 index 27602b450..000000000 Binary files a/public/2017/12/postgres-connections-month-cgspace.png and /dev/null differ diff --git a/public/2017/12/postgres-connections-month.png b/public/2017/12/postgres-connections-month.png deleted file mode 100644 index c7e64a76e..000000000 Binary files a/public/2017/12/postgres-connections-month.png and /dev/null differ diff --git a/public/2017/12/postgres-connections-week-dspacetest.png b/public/2017/12/postgres-connections-week-dspacetest.png deleted file mode 100644 index a79fffe3d..000000000 Binary files a/public/2017/12/postgres-connections-week-dspacetest.png and /dev/null differ diff --git a/public/2018-01/index.html b/public/2018-01/index.html deleted file mode 100644 index 86d33f11f..000000000 --- a/public/2018-01/index.html +++ /dev/null @@ -1,1855 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - January, 2018 | CGSpace Notes - - - - - - - - - - - - - - - - - -
-
- -
-
- - - -
-
-

CGSpace Notes

-

Documenting day-to-day work on the CGSpace repository.

-
-
- - - -
-
-
- - - - -
-
-

January, 2018

- -
-

2018-01-02

- -
    -
  • Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time
  • -
  • I didn’t get any load alerts from Linode and the REST and XMLUI logs don’t show anything out of the ordinary
  • -
  • The nginx logs show HTTP 200s until 02/Jan/2018:11:27:17 +0000 when Uptime Robot got an HTTP 500
  • -
  • In dspace.log around that time I see many errors like “Client closed the connection before file download was complete”
  • -
  • And just before that I see this:
  • -
- -
Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000].
-
- -
    -
  • Ah hah! So the pool was actually empty!
  • -
  • I need to increase that, let’s try to bump it up from 50 to 75
  • -
  • After that one client got an HTTP 499 but then the rest were HTTP 200, so I don’t know what the hell Uptime Robot saw
  • -
  • I notice this error quite a few times in dspace.log:
  • -
- -
2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
-org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered " "]" "] "" at line 1, column 32.
-
- -
    -
  • And there are many of these errors every day for the past month:
  • -
- -
$ grep -c "Error while searching for sidebar facets" dspace.log.*
-dspace.log.2017-11-21:4
-dspace.log.2017-11-22:1
-dspace.log.2017-11-23:4
-dspace.log.2017-11-24:11
-dspace.log.2017-11-25:0
-dspace.log.2017-11-26:1
-dspace.log.2017-11-27:7
-dspace.log.2017-11-28:21
-dspace.log.2017-11-29:31
-dspace.log.2017-11-30:15
-dspace.log.2017-12-01:15
-dspace.log.2017-12-02:20
-dspace.log.2017-12-03:38
-dspace.log.2017-12-04:65
-dspace.log.2017-12-05:43
-dspace.log.2017-12-06:72
-dspace.log.2017-12-07:27
-dspace.log.2017-12-08:15
-dspace.log.2017-12-09:29
-dspace.log.2017-12-10:35
-dspace.log.2017-12-11:20
-dspace.log.2017-12-12:44
-dspace.log.2017-12-13:36
-dspace.log.2017-12-14:59
-dspace.log.2017-12-15:104
-dspace.log.2017-12-16:53
-dspace.log.2017-12-17:66
-dspace.log.2017-12-18:83
-dspace.log.2017-12-19:101
-dspace.log.2017-12-20:74
-dspace.log.2017-12-21:55
-dspace.log.2017-12-22:66
-dspace.log.2017-12-23:50
-dspace.log.2017-12-24:85
-dspace.log.2017-12-25:62
-dspace.log.2017-12-26:49
-dspace.log.2017-12-27:30
-dspace.log.2017-12-28:54
-dspace.log.2017-12-29:68
-dspace.log.2017-12-30:89
-dspace.log.2017-12-31:53
-dspace.log.2018-01-01:45
-dspace.log.2018-01-02:34
-
- -
    -
  • Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let’s Encrypt if it’s just a handful of domains
  • -
- -

- -

2018-01-03

- -
    -
  • I woke up to CGSpace going up and down again; this time UptimeRobot noticed a few rounds of downtime of a few minutes each, and Linode also notified of high CPU load from 12 to 2 PM
  • -
  • Looks like I need to increase the database pool size again:
  • -
- -
$ grep -c "Timeout: Pool empty." dspace.log.2018-01-*
-dspace.log.2018-01-01:0
-dspace.log.2018-01-02:1972
-dspace.log.2018-01-03:1909
-
- -
    -
  • For some reason there were a lot of “active” connections last night:
  • -
- -

CGSpace PostgreSQL connections

- -
    -
  • The active IPs in XMLUI are:
  • -
- -
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E "3/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
-    607 40.77.167.141
-    611 2a00:23c3:8c94:7800:392c:a491:e796:9c50
-    663 188.226.169.37
-    759 157.55.39.245
-    887 68.180.229.254
-   1037 157.55.39.175
-   1068 216.244.66.245
-   1495 66.249.64.91
-   1934 104.196.152.243
-   2219 134.155.96.78
-
- -
    -
  • 134.155.96.78 appears to be at the University of Mannheim in Germany
  • -
  • They identify as: Mozilla/5.0 (compatible; heritrix/3.2.0 +http://ifm.uni-mannheim.de)
  • -
  • This appears to be the Internet Archive’s open source bot
  • -
  • They seem to be re-using their Tomcat session so I don’t need to do anything to them just yet:
  • -
- -
$ grep 134.155.96.78 dspace.log.2018-01-03 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
-2
-
- -
    -
  • The API logs show the normal users:
  • -
- -
# cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "3/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
-     32 207.46.13.182
-     38 40.77.167.132
-     38 68.180.229.254
-     43 66.249.64.91
-     46 40.77.167.141
-     49 157.55.39.245
-     79 157.55.39.175
-   1533 50.116.102.77
-   4069 70.32.83.92
-   9355 45.5.184.196
-
- -
    -
  • In other related news I see a sizeable number of requests coming from python-requests
  • -
  • For example, just in the last day there were 1700!
  • -
- -
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -c python-requests
-1773
-
- -
    -
  • But they come from hundreds of IPs, many of which are 54.x.x.x:
  • -
- -
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep python-requests | awk '{print $1}' | sort -n | uniq -c | sort -h | tail -n 30
-      9 54.144.87.92
-      9 54.146.222.143
-      9 54.146.249.249
-      9 54.158.139.206
-      9 54.161.235.224
-      9 54.163.41.19
-      9 54.163.4.51
-      9 54.196.195.107
-      9 54.198.89.134
-      9 54.80.158.113
-     10 54.198.171.98
-     10 54.224.53.185
-     10 54.226.55.207
-     10 54.227.8.195
-     10 54.242.234.189
-     10 54.242.238.209
-     10 54.80.100.66
-     11 54.161.243.121
-     11 54.205.154.178
-     11 54.234.225.84
-     11 54.87.23.173
-     11 54.90.206.30
-     12 54.196.127.62
-     12 54.224.242.208
-     12 54.226.199.163
-     13 54.162.149.249
-     13 54.211.182.255
-     19 50.17.61.150
-     21 54.211.119.107
-    139 164.39.7.62
-
- -
    -
  • I have no idea what these are but they seem to be coming from Amazon…
  • -
  • I guess for now I just have to increase the database connection pool’s max active
  • -
  • It’s currently 75 and normally I’d just bump it by 25 but let me be a bit daring and push it by 50 to 125, because I used to see at least 121 connections in pg_stat_activity before when we were using the shitty default pooling
  • -
- -
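  • Assuming the pool is still the JNDI Resource in Tomcat’s server.xml (the full example appears later in these notes), the change is just the maxActive attribute, something like:

<!-- sketch: the only attribute that changes on the DSpace Resource in server.xml -->
maxActive='125'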

2018-01-04

- -
    -
  • CGSpace went down and up a bunch of times last night, and ILRI staff were complaining a lot
  • -
  • The XMLUI logs show this activity:
  • -
- -
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E "4/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
-    968 197.211.63.81
-    981 213.55.99.121
-   1039 66.249.64.93
-   1258 157.55.39.175
-   1273 207.46.13.182
-   1311 157.55.39.191
-   1319 157.55.39.197
-   1775 66.249.64.78
-   2216 104.196.152.243
-   3366 66.249.64.91
-
- -
    -
  • Again we ran out of PostgreSQL database connections, even after bumping the pool max active limit from 50 to 75 to 125 yesterday!
  • -
- -
2018-01-04 07:36:08,089 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
-org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-256] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:125; busy:125; idle:0; lastwait:5000].
-
- -
    -
  • So for this week that is the number one problem!
  • -
- -
$ grep -c "Timeout: Pool empty." dspace.log.2018-01-*
-dspace.log.2018-01-01:0
-dspace.log.2018-01-02:1972
-dspace.log.2018-01-03:1909
-dspace.log.2018-01-04:1559
-
- -
    -
  • I will just bump the connection limit to 300 because I’m fucking fed up with this shit
  • -
  • Once I get back to Amman I will have to try to create different database pools for different web applications, like recently discussed on the dspace-tech mailing list
  • -
  • Create accounts on CGSpace for two CTA staff km4ard@cta.int and bheenick@cta.int
  • -
- -

2018-01-05

- -
    -
  • Peter said that CGSpace was down last night and Tsega restarted Tomcat
  • -
  • I don’t see any alerts from Linode or UptimeRobot, and there are no PostgreSQL connection errors in the dspace logs for today:
  • -
- -
$ grep -c "Timeout: Pool empty." dspace.log.2018-01-*
-dspace.log.2018-01-01:0
-dspace.log.2018-01-02:1972
-dspace.log.2018-01-03:1909
-dspace.log.2018-01-04:1559
-dspace.log.2018-01-05:0
-
- -
    -
  • Daniel asked for help with their DAGRIS server (linode2328112) that has no disk space
  • -
  • I had a look and there is one Apache 2 log file that is 73GB, with lots of this:
  • -
- -
[Fri Jan 05 09:31:22.965398 2018] [:error] [pid 9340] [client 213.55.99.121:64476] WARNING: Unable to find a match for "9-16-1-RV.doc" in "/home/files/journals/6//articles/9/". Skipping this file., referer: http://dagris.info/reviewtool/index.php/index/install/upgrade
-
- -
    -
  • I will delete the log file for now and tell Danny
  • -
  • Also, I’m still seeing a hundred or so of the “ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer” errors in the dspace logs; I need to search the dspace-tech mailing list to see what the cause is
  • -
  • I will run a full Discovery reindex in the mean time to see if it’s something wrong with the Discovery Solr core
  • -
- -
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx512m -XX:+TieredCompilation -XX:TieredStopAtLevel=1"
-$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
-
-real    110m43.985s
-user    15m24.960s
-sys     3m14.890s
-
- - - -

2018-01-06

- -
    -
  • I’m still seeing Solr errors in the DSpace logs even after the full reindex yesterday:
  • -
- -
org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1983+TO+1989]': Encountered " "]" "] "" at line 1, column 32.
-
- -
    -
  • I posted a message to the dspace-tech mailing list to see if anyone can help
  • -
- -

2018-01-09

- -
    -
  • Advise Sisay about blank lines in some IITA records
  • -
  • Generate a list of author affiliations for Peter to clean up:
  • -
- -
dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'affiliation') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/affiliations.csv with csv;
-COPY 4515
-
- -

2018-01-10

- -
    -
  • I looked to see what happened to this year’s Solr statistics sharding task that should have run on 2018-01-01 and of course it failed:
  • -
- -
Moving: 81742 into core statistics-2010
-Exception: IOException occured when talking to server at: http://localhost:8081/solr//statistics-2010
-org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://localhost:8081/solr//statistics-2010
-        at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:566)
-        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
-        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
-        at org.dspace.statistics.SolrLogger.shardSolrIndex(SourceFile:2243)
-        at org.dspace.statistics.util.StatisticsClient.main(StatisticsClient.java:106)
-        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
-        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
-        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
-        at java.lang.reflect.Method.invoke(Method.java:498)
-        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
-        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
-Caused by: org.apache.http.client.ClientProtocolException
-        at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:867)
-        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
-        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
-        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
-        at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:448)
-        ... 10 more
-Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity.  The cause lists the reason the original request failed.
-        at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:659)
-        at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
-        at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
-        ... 14 more
-Caused by: java.net.SocketException: Connection reset
-        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:115)
-        at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
-        at org.apache.http.impl.io.AbstractSessionOutputBuffer.flushBuffer(AbstractSessionOutputBuffer.java:159)
-        at org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:179)
-        at org.apache.http.impl.io.ChunkedOutputStream.flushCacheWithAppend(ChunkedOutputStream.java:124)
-        at org.apache.http.impl.io.ChunkedOutputStream.write(ChunkedOutputStream.java:181)
-        at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:132)
-        at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:89)
-        at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
-        at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:117)
-        at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:265)
-        at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:203)
-        at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:236)
-        at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:121)
-        at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
-        ... 16 more
-
- -
    -
  • DSpace Test has the same error, but when creating the 2017 core:
  • -
- -
Moving: 2243021 into core statistics-2017
-Exception: IOException occured when talking to server at: http://localhost:8081/solr//statistics-2017
-org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://localhost:8081/solr//statistics-2017
-        at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:566)
-        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
-        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
-        at org.dspace.statistics.SolrLogger.shardSolrIndex(SourceFile:2243)
-        at org.dspace.statistics.util.StatisticsClient.main(StatisticsClient.java:106)
-        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
-        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
-        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
-        at java.lang.reflect.Method.invoke(Method.java:498)
-        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
-        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
-Caused by: org.apache.http.client.ClientProtocolException
-        at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:867)
-        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
-        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
-        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
-        at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:448)
-        ... 10 more
-
- - - -
$ http 'http://localhost:3000/solr/statistics/select?q=owningColl%3A*&wt=json&indent=true' | grep numFound 
-  "response":{"numFound":48476327,"start":0,"docs":[
-$ http 'http://localhost:3000/solr/statistics/select?q=-owningColl%3A*&wt=json&indent=true' | grep numFound
-  "response":{"numFound":34879872,"start":0,"docs":[
-
- -
    -
  • I tested the dspace stats-util -s process on my local machine and it failed the same way
  • -
  • It doesn’t seem to be helpful, but the dspace log shows this:
  • -
- -
2018-01-10 10:51:19,301 INFO  org.dspace.statistics.SolrLogger @ Created core with name: statistics-2016
-2018-01-10 10:51:19,301 INFO  org.dspace.statistics.SolrLogger @ Moving: 3821 records into core statistics-2016
-
- - - -
$ grep -c "Timeout: Pool empty." dspace.log.2018-01-10 
-0
-
- -
    -
  • The XMLUI logs show quite a bit of activity today:
  • -
- -
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep "10/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
-    951 207.46.13.159
-    954 157.55.39.123
-   1217 95.108.181.88
-   1503 104.196.152.243
-   6455 70.36.107.50
-  11412 70.36.107.190
-  16730 70.36.107.49
-  17386 2607:fa98:40:9:26b6:fdff:feff:1c96
-  21566 2607:fa98:40:9:26b6:fdff:feff:195d
-  45384 2607:fa98:40:9:26b6:fdff:feff:1888
-
- -
    -
  • The user agent for the top six or so IPs is the same:
  • -
- -
"Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36"
-
- -
    -
  • whois says they come from Perfect IP
  • -
  • I’ve never seen those top IPs before, but they have created 50,000 Tomcat sessions today:
  • -
- -
$ grep -E '(2607:fa98:40:9:26b6:fdff:feff:1888|2607:fa98:40:9:26b6:fdff:feff:195d|2607:fa98:40:9:26b6:fdff:feff:1c96|70.36.107.49|70.36.107.190|70.36.107.50)' /home/cgspace.cgiar.org/log/dspace.log.2018-01-10 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l                                                                                                                                                                                                  
-49096
-
- -
    -
  • Rather than blocking their IPs, I think I might just add their user agent to the “badbots” zone with Baidu, because they seem to be the only ones using that user agent:
  • -
- -
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari
-/537.36" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
-   6796 70.36.107.50
-  11870 70.36.107.190
-  17323 70.36.107.49
-  19204 2607:fa98:40:9:26b6:fdff:feff:1c96
-  23401 2607:fa98:40:9:26b6:fdff:feff:195d 
-  47875 2607:fa98:40:9:26b6:fdff:feff:1888
-
- -
    -
  • I added the user agent to nginx’s badbots limit req zone but upon testing the config I got an error:
  • -
- -
# nginx -t
-nginx: [emerg] could not build map_hash, you should increase map_hash_bucket_size: 64
-nginx: configuration file /etc/nginx/nginx.conf test failed
-
- - - -
# cat /proc/cpuinfo | grep cache_alignment | head -n1
-cache_alignment : 64
-
- -
    -
  • On our servers that is 64, so I increased this parameter to 128 and deployed the changes to nginx (see the config sketch after this list)
  • -
  • Almost immediately the PostgreSQL connections dropped back down to 40 or so, and UptimeRobot said the site was back up
  • -
  • So that’s interesting that we’re not out of PostgreSQL connections (current pool maxActive is 300!) but the system is “down” to UptimeRobot and very slow to use
  • -
  • Linode continues to test mitigations for Meltdown and Spectre: https://blog.linode.com/2018/01/03/cpu-vulnerabilities-meltdown-spectre/
  • -
  • I rebooted DSpace Test to see if the kernel will be updated (currently Linux 4.14.12-x86_64-linode92)… nope.
  • -
  • It looks like Linode will reboot the KVM hosts later this week, though
  • -
  • Udana from WLE asked if we could give him permission to upload CSVs to CGSpace (which would require super admin access)
  • -
  • Citing concerns with metadata quality, I suggested adding him on DSpace Test first
  • -
  • I opened a ticket with Atmire to ask them about DSpace 5.8 compatibility: https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560
  • -
- -
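  • For reference, the rough shape of the nginx config involved is below (the rate and the exact map entries are assumptions for illustration; the badbots zone name and the long Chrome/38 user agent string are from the notes above):

# in the http block of nginx.conf
map_hash_bucket_size 128;

map $http_user_agent $limit_bots {
    default        '';
    '~Baiduspider' $binary_remote_addr;
    'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36' $binary_remote_addr;
}

limit_req_zone $limit_bots zone=badbots:10m rate=1r/s;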

2018-01-11

- -
    -
  • The PostgreSQL and firewall graphs from this week show clearly the load from the new bot from PerfectIP.net yesterday:
  • -
- -

PostgreSQL load -Firewall load

- -
    -
  • Linode rebooted DSpace Test and CGSpace for their host hypervisor kernel updates
  • -
  • Following up with the Solr sharding issue on the dspace-tech mailing list, I noticed this interesting snippet in the Tomcat localhost_access_log at the time of my sharding attempt on my test machine:
  • -
- -
127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] "GET /solr/statistics/select?q=type%3A2+AND+id%3A1&wt=javabin&version=2 HTTP/1.1" 200 107
-127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] "GET /solr/statistics/select?q=*%3A*&rows=0&facet=true&facet.range=time&facet.range.start=NOW%2FYEAR-18YEARS&facet.range.end=NOW%2FYEAR%2B0YEARS&facet.range.gap=%2B1YEAR&facet.mincount=1&wt=javabin&version=2 HTTP/1.1" 200 447
-127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] "GET /solr/admin/cores?action=STATUS&core=statistics-2016&indexInfo=true&wt=javabin&version=2 HTTP/1.1" 200 76
-127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] "GET /solr/admin/cores?action=CREATE&name=statistics-2016&instanceDir=statistics&dataDir=%2FUsers%2Faorth%2Fdspace%2Fsolr%2Fstatistics-2016%2Fdata&wt=javabin&version=2 HTTP/1.1" 200 63
-127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] "GET /solr/statistics/select?csv.mv.separator=%7C&q=*%3A*&fq=time%3A%28%5B2016%5C-01%5C-01T00%5C%3A00%5C%3A00Z+TO+2017%5C-01%5C-01T00%5C%3A00%5C%3A00Z%5D+NOT+2017%5C-01%5C-01T00%5C%3A00%5C%3A00Z%29&rows=10000&wt=csv HTTP/1.1" 200 2137630
-127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] "GET /solr/statistics/admin/luke?show=schema&wt=javabin&version=2 HTTP/1.1" 200 16253
-127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] "POST /solr//statistics-2016/update/csv?commit=true&softCommit=false&waitSearcher=true&f.previousWorkflowStep.split=true&f.previousWorkflowStep.separator=%7C&f.previousWorkflowStep.encapsulator=%22&f.actingGroupId.split=true&f.actingGroupId.separator=%7C&f.actingGroupId.encapsulator=%22&f.containerCommunity.split=true&f.containerCommunity.separator=%7C&f.containerCommunity.encapsulator=%22&f.range.split=true&f.range.separator=%7C&f.range.encapsulator=%22&f.containerItem.split=true&f.containerItem.separator=%7C&f.containerItem.encapsulator=%22&f.p_communities_map.split=true&f.p_communities_map.separator=%7C&f.p_communities_map.encapsulator=%22&f.ngram_query_search.split=true&f.ngram_query_search.separator=%7C&f.ngram_query_search.encapsulator=%22&f.containerBitstream.split=true&f.containerBitstream.separator=%7C&f.containerBitstream.encapsulator=%22&f.owningItem.split=true&f.owningItem.separator=%7C&f.owningItem.encapsulator=%22&f.actingGroupParentId.split=true&f.actingGroupParentId.separator=%7C&f.actingGroupParentId.encapsulator=%22&f.text.split=true&f.text.separator=%7C&f.text.encapsulator=%22&f.simple_query_search.split=true&f.simple_query_search.separator=%7C&f.simple_query_search.encapsulator=%22&f.owningComm.split=true&f.owningComm.separator=%7C&f.owningComm.encapsulator=%22&f.owner.split=true&f.owner.separator=%7C&f.owner.encapsulator=%22&f.filterquery.split=true&f.filterquery.separator=%7C&f.filterquery.encapsulator=%22&f.p_group_map.split=true&f.p_group_map.separator=%7C&f.p_group_map.encapsulator=%22&f.actorMemberGroupId.split=true&f.actorMemberGroupId.separator=%7C&f.actorMemberGroupId.encapsulator=%22&f.bitstreamId.split=true&f.bitstreamId.separator=%7C&f.bitstreamId.encapsulator=%22&f.group_name.split=true&f.group_name.separator=%7C&f.group_name.encapsulator=%22&f.p_communities_name.split=true&f.p_communities_name.separator=%7C&f.p_communities_name.encapsulator=%22&f.query.split=true&f.query.separator=%7C&f.query.encapsulator=%22&f.workflowStep.split=true&f.workflowStep.separator=%7C&f.workflowStep.encapsulator=%22&f.containerCollection.split=true&f.containerCollection.separator=%7C&f.containerCollection.encapsulator=%22&f.complete_query_search.split=true&f.complete_query_search.separator=%7C&f.complete_query_search.encapsulator=%22&f.p_communities_id.split=true&f.p_communities_id.separator=%7C&f.p_communities_id.encapsulator=%22&f.rangeDescription.split=true&f.rangeDescription.separator=%7C&f.rangeDescription.encapsulator=%22&f.group_id.split=true&f.group_id.separator=%7C&f.group_id.encapsulator=%22&f.bundleName.split=true&f.bundleName.separator=%7C&f.bundleName.encapsulator=%22&f.ngram_simplequery_search.split=true&f.ngram_simplequery_search.separator=%7C&f.ngram_simplequery_search.encapsulator=%22&f.group_map.split=true&f.group_map.separator=%7C&f.group_map.encapsulator=%22&f.owningColl.split=true&f.owningColl.separator=%7C&f.owningColl.encapsulator=%22&f.p_group_id.split=true&f.p_group_id.separator=%7C&f.p_group_id.encapsulator=%22&f.p_group_name.split=true&f.p_group_name.separator=%7C&f.p_group_name.encapsulator=%22&wt=javabin&version=2 HTTP/1.1" 409 156
-
- -
    -
  • The new core is created but when DSpace attempts to POST to it there is an HTTP 409 error
  • -
  • This is apparently a common Solr error code that means “version conflict”: http://yonik.com/solr/optimistic-concurrency/
  • -
  • Looks like that bot from the PerfectIP.net host ended up making about 450,000 requests to XMLUI alone yesterday:
  • -
- -
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36" | grep "10/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
-  21572 70.36.107.50
-  30722 70.36.107.190
-  34566 70.36.107.49
- 101829 2607:fa98:40:9:26b6:fdff:feff:195d
- 111535 2607:fa98:40:9:26b6:fdff:feff:1c96
- 161797 2607:fa98:40:9:26b6:fdff:feff:1888
-
- -
    -
  • Wow, I just figured out how to set the application name of each database pool in the JNDI config of Tomcat’s server.xml:
  • -
- -
<Resource name="jdbc/dspaceWeb" auth="Container" type="javax.sql.DataSource"
-          driverClassName="org.postgresql.Driver"
-          url="jdbc:postgresql://localhost:5432/dspacetest?ApplicationName=dspaceWeb"
-          username="dspace"
-          password="dspace"
-          initialSize='5'
-          maxActive='75'
-          maxIdle='15'
-          minIdle='5'
-          maxWait='5000'
-          validationQuery='SELECT 1'
-          testOnBorrow='true' />
-
- -
    -
  • So theoretically I could name each connection “xmlui” or “dspaceWeb” or something meaningful and it would show up in PostgreSQL’s pg_stat_activity table!
  • -
  • This would be super helpful for figuring out where load was coming from (now I wonder if I could figure out how to graph this)
  • -
  • Also, I realized that the db.jndi parameter in dspace.cfg needs to match the name value in your application’s context, not the global one
  • -
  • Ah hah! Also, I can name the default DSpace connection pool in dspace.cfg as well, like:
  • -
- -
db.url = jdbc:postgresql://localhost:5432/dspacetest?ApplicationName=dspaceDefault
-
- -
    -
  • With that it is super easy to see where PostgreSQL connections are coming from in pg_stat_activity
  • -
- -
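  • For example, a quick query sketch to summarize connections per pool:

dspace=# select application_name, count(*) from pg_stat_activity group by application_name order by 2 desc;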

2018-01-12

- -
    -
  • I’m looking at the DSpace 6.0 Install docs and notice they tweak the number of threads in their Tomcat connector:
  • -
- -
<!-- Define a non-SSL HTTP/1.1 Connector on port 8080 -->
-<Connector port="8080"
-           maxThreads="150"
-           minSpareThreads="25"
-           maxSpareThreads="75"
-           enableLookups="false"
-           redirectPort="8443"
-           acceptCount="100"
-           connectionTimeout="20000"
-           disableUploadTimeout="true"
-           URIEncoding="UTF-8"/>
-
- -
    -
  • In Tomcat 8.5 the maxThreads defaults to 200 which is probably fine, but tweaking minSpareThreads could be good
  • -
  • I don’t see a setting for maxSpareThreads in the docs so that might be an error
  • -
  • Looks like in Tomcat 8.5 the default URIEncoding for Connectors is UTF-8, so we don’t need to specify that manually anymore: https://tomcat.apache.org/tomcat-8.5-doc/config/http.html
  • -
  • Ooh, I just saw the acceptorThreadCount setting (in Tomcat 7 and 8.5):
  • -
- -
The number of threads to be used to accept connections. Increase this value on a multi CPU machine, although you would never really need more than 2. Also, with a lot of non keep alive connections, you might want to increase this value as well. Default value is 1.
-
- -
    -
  • That could be very interesting
  • -
- -
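  • Putting those notes together, a rough, untested sketch of what a Tomcat 8.5 Connector might look like for us (maxThreads left at its default of 200, and no URIEncoding since UTF-8 is already the default):

<Connector port="8080"
           protocol="HTTP/1.1"
           minSpareThreads="25"
           acceptorThreadCount="2"
           enableLookups="false"
           redirectPort="8443"
           acceptCount="100"
           connectionTimeout="20000"
           disableUploadTimeout="true"/>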

2018-01-13

- -
    -
  • Still testing DSpace 6.2 on Tomcat 8.5.24
  • -
  • Catalina errors at Tomcat 8.5 startup:
  • -
- -
13-Jan-2018 13:59:05.245 WARNING [main] org.apache.tomcat.dbcp.dbcp2.BasicDataSourceFactory.getObjectInstance Name = dspace6 Property maxActive is not used in DBCP2, use maxTotal instead. maxTotal default value is 8. You have set value of "35" for "maxActive" property, which is being ignored.
-13-Jan-2018 13:59:05.245 WARNING [main] org.apache.tomcat.dbcp.dbcp2.BasicDataSourceFactory.getObjectInstance Name = dspace6 Property maxWait is not used in DBCP2 , use maxWaitMillis instead. maxWaitMillis default value is -1. You have set value of "5000" for "maxWait" property, which is being ignored.
-
- -
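  • Presumably the fix under DBCP2 is just to rename those two attributes on the dspace6 Resource, along these lines:

<!-- DBCP2 (Tomcat 8.x) names for the same settings -->
maxTotal='35'
maxWaitMillis='5000'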
    -
  • I looked in my Tomcat 7.0.82 logs and I don’t see anything about DBCP2 errors, so I guess this is a Tomcat 8.0.x or 8.5.x thing
  • -
  • DBCP2 appears to be Tomcat 8.0.x and up according to the Tomcat 8.0 migration guide
  • -
  • I have updated our Ansible infrastructure scripts so that it will be ready whenever we switch to Tomcat 8 (probably with Ubuntu 18.04 later this year)
  • -
  • When I enable the ResourceLink in the ROOT.xml context I get the following error in the Tomcat localhost log:
  • -
- -
13-Jan-2018 14:14:36.017 SEVERE [localhost-startStop-1] org.apache.catalina.core.StandardContext.listenerStart Exception sending context initialized event to listener instance of class [org.dspace.app.util.DSpaceWebappListener]
- java.lang.ExceptionInInitializerError
-        at org.dspace.app.util.AbstractDSpaceWebapp.register(AbstractDSpaceWebapp.java:74)
-        at org.dspace.app.util.DSpaceWebappListener.contextInitialized(DSpaceWebappListener.java:31)
-        at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4745)
-        at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5207)
-        at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
-        at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:752)
-        at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:728)
-        at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:734)
-        at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:629)
-        at org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1839)
-        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
-        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
-        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
-        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
-        at java.lang.Thread.run(Thread.java:748)
-Caused by: java.lang.NullPointerException
-        at org.dspace.storage.rdbms.DatabaseUtils.updateDatabase(DatabaseUtils.java:547)
-        at org.dspace.core.Context.<clinit>(Context.java:103)
-        ... 15 more
-
- -
    -
  • Interesting blog post benchmarking Tomcat JDBC vs Apache Commons DBCP2, with configuration snippets: http://www.tugay.biz/2016/07/tomcat-connection-pool-vs-apache.html
  • -
  • The Tomcat vs Apache pool thing is confusing, but apparently we’re using Apache Commons DBCP2 because we don’t specify factory=&quot;org.apache.tomcat.jdbc.pool.DataSourceFactory&quot; in our global resource (a sketch of switching is below, after this list)
  • -
  • So at least I know that I’m not looking for documentation or troubleshooting on the Tomcat JDBC pool!
  • -
  • I looked at pg_stat_activity during Tomcat’s startup and I see that the pool created in server.xml is indeed connecting, just that nothing uses it
  • -
  • Also, the fallback connection parameters specified in local.cfg (not dspace.cfg) are used
  • -
  • Shit, this might actually be a DSpace error: https://jira.duraspace.org/browse/DS-3434
  • -
  • I’ll comment on that issue
  • -
- -
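  • So if we ever did want the Tomcat JDBC pool instead of Commons DBCP2, the sketch would simply be to add that factory attribute to the global Resource:

<!-- sketch: opt in to the Tomcat JDBC pool instead of Commons DBCP2 -->
factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"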

2018-01-14

- -
    -
  • Looking at the authors Peter had corrected
  • -
  • Some authors had multiple corrections, which he indicated by adding || in the correction column, but I can’t process those this way so I will just have to flag them and do them manually later
  • -
  • Also, I can flag the values that have “DELETE”
  • -
  • Then I need to facet the correction column on isBlank(value) and not flagged
  • -
- -

2018-01-15

- -
    -
  • Help Udana from IWMI export a CSV from DSpace Test so he can start trying a batch upload
  • -
  • I’m going to apply these ~130 corrections on CGSpace:
  • -
- -
update metadatavalue set text_value='Formally Published' where resource_type_id=2 and metadata_field_id=214 and text_value like 'Formally published';
-delete from metadatavalue where resource_type_id=2 and metadata_field_id=214 and text_value like 'NO';
-update metadatavalue set text_value='en' where resource_type_id=2 and metadata_field_id=38 and text_value ~ '(En|English)';
-update metadatavalue set text_value='fr' where resource_type_id=2 and metadata_field_id=38 and text_value ~ '(fre|frn|French)';
-update metadatavalue set text_value='es' where resource_type_id=2 and metadata_field_id=38 and text_value ~ '(Spanish|spa)';
-update metadatavalue set text_value='vi' where resource_type_id=2 and metadata_field_id=38 and text_value='Vietnamese';
-update metadatavalue set text_value='ru' where resource_type_id=2 and metadata_field_id=38 and text_value='Ru';
-update metadatavalue set text_value='in' where resource_type_id=2 and metadata_field_id=38 and text_value ~ '(IN|In)';
-delete from metadatavalue where resource_type_id=2 and metadata_field_id=38 and text_value ~ '(dc.language.iso|CGIAR Challenge Program on Water and Food)';
-
- -
    -
  • Continue proofing Peter’s author corrections that I started yesterday, faceting on non-blank, non-flagged values, and briefly scrolling through the corrections to find encoding errors in French and Spanish names
  • -
- -

OpenRefine Authors

- - - -
$ ./fix-metadata-values.py -i /tmp/2018-01-14-Authors-1300-Corrections.csv -f dc.contributor.author -t correct -m 3 -d dspace -u dspace -p 'fuuu'
-
- -
    -
  • In looking at some of the values to delete or check, I found some metadata values whose Handles I could not resolve via SQL:
  • -
- -
dspace=# select * from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value='Tarawali';
- metadata_value_id | resource_id | metadata_field_id | text_value | text_lang | place | authority | confidence | resource_type_id
--------------------+-------------+-------------------+------------+-----------+-------+-----------+------------+------------------
-           2757936 |        4369 |                 3 | Tarawali   |           |     9 |           |        600 |                2
-(1 row)
-
-dspace=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id = '4369';
- handle
---------
-(0 rows)
-
- -
    -
  • Even searching in the DSpace advanced search for author equals “Tarawali” produces nothing…
  • -
  • Otherwise, the DSpace 5 SQL Helper Functions provide ds5_item2itemhandle(), which is much easier than my long query above that I always have to go search for
  • -
  • For example, to find the Handle for an item that has the author “Erni”:
  • -
- -
dspace=# select * from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value='Erni';
- metadata_value_id | resource_id | metadata_field_id | text_value | text_lang | place |              authority               | confidence | resource_type_id 
--------------------+-------------+-------------------+------------+-----------+-------+--------------------------------------+------------+------------------
-           2612150 |       70308 |                 3 | Erni       |           |     9 | 3fe10c68-6773-49a7-89cc-63eb508723f2 |         -1 |                2
-(1 row)
-dspace=# select ds5_item2itemhandle(70308);
- ds5_item2itemhandle 
----------------------
- 10568/68609
-(1 row)
-
- -
    -
  • Next I apply the author deletions:
  • -
- -
$ ./delete-metadata-values.py -i /tmp/2018-01-14-Authors-5-Deletions.csv -f dc.contributor.author -m 3 -d dspace -u dspace -p 'fuuu'
-
- -
    -
  • Now working on the affiliation corrections from Peter:
  • -
- -
$ ./fix-metadata-values.py -i /tmp/2018-01-15-Affiliations-888-Corrections.csv -f cg.contributor.affiliation -t correct -m 211 -d dspace -u dspace -p 'fuuu'
-$ ./delete-metadata-values.py -i /tmp/2018-01-15-Affiliations-11-Deletions.csv -f cg.contributor.affiliation -m 211 -d dspace -u dspace -p 'fuuu'
-
- -
    -
  • Now I made a new list of affiliations for Peter to look through:
  • -
- -
dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where metadata_schema_id = 2 and element = 'contributor' and qualifier = 'affiliation') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/affiliations.csv with csv;
-COPY 4552
-
- -
    -
  • Looking over the affiliations again I see dozens of CIAT ones with their affiliation formatted like: International Center for Tropical Agriculture (CIAT)
  • -
  • For example, this one is from just last month: https://cgspace.cgiar.org/handle/10568/89930
  • -
  • Our controlled vocabulary has this in the format without the abbreviation: International Center for Tropical Agriculture
  • -
  • So some submitters don’t know to use the controlled vocabulary lookup
  • -
  • Help Sisay with some thumbnails for book chapters in Open Refine and SAFBuilder
  • -
  • CGSpace users were having problems logging in, I think something’s wrong with LDAP because I see this in the logs (the “data 775” in the AcceptSecurityContext error usually means the bind account is locked out in Active Directory):
  • -
- -
2018-01-15 12:53:15,810 WARN  org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=2386749547D03E0AA4EC7E44181A7552:ip_addr=x.x.x.x:ldap_authentication:type=failed_auth javax.naming.AuthenticationException\colon; [LDAP\colon; error code 49 - 80090308\colon; LdapErr\colon; DSID-0C090400, comment\colon; AcceptSecurityContext error, data 775, v1db1^@]
-
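  • If this comes up again, a plain ldapsearch bind test against the LDAP provider should show whether the bind account itself is the problem (the host, bind DN, and base DN below are hypothetical placeholders, not our real values):
$ ldapsearch -x -H ldaps://ldap.example.org:636 -D "binduser@example.org" -W -b "dc=example,dc=org" "(sAMAccountName=someuser)" dn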
- -
    -
  • Looks like we processed 2.9 million requests on CGSpace in 2017-12:
  • -
- -
# time zcat --force /var/log/nginx/* | grep -cE "[0-9]{1,2}/Dec/2017"
-2890041
-
-real    0m25.756s
-user    0m28.016s
-sys     0m2.210s
-
- -

2018-01-16

- -
    -
  • Meeting with CGSpace team, a few action items: - -
      -
    • Discuss standardized names for CRPs and centers with ICARDA (don’t wait for CG Core)
    • -
    • Re-send DC rights implementation and forward to everyone so we can move forward with it (without the URI field for now)
    • -
    • Start looking at where I was with the AGROVOC API
    • -
    • Have a controlled vocabulary for CGIAR authors’ names and ORCIDs? Perhaps values like: Orth, Alan S. (0000-0002-1735-7458)
    • -
    • Need to find the metadata field name that ICARDA is using for their ORCIDs
    • -
    • Update text for DSpace version plan on wiki
    • -
    • Come up with an SLA, something like: In return for your contribution we will, to the best of our ability, ensure 99.5% (“two and a half nines”) uptime of CGSpace, ensure data is stored in open formats and safely backed up, follow CG Core metadata standards, …
    • -
    • Add Sisay and Danny to Uptime Robot and allow them to restart Tomcat on CGSpace ✔
    • -
  • -
  • I removed Tsega’s SSH access to the web and DSpace servers, and asked Danny to check whether there is anything he needs from Tsega’s home directories so we can delete the accounts completely
  • -
  • I removed Tsega’s access to Linode dashboard as well
  • -
  • I ended up creating a Jira issue for my db.jndi documentation fix: DS-3803
  • -
  • The DSpace developers said they wanted each pull request to be associated with a Jira issue
  • -
- -

2018-01-17

- -
    -
  • Abenet asked me to proof and upload 54 records for LIVES
  • -
  • A few records were missing countries (even though they’re all from Ethiopia)
  • -
  • Also, there are whitespace issues in many columns, and the items are mapped to the LIVES and ILRI articles collections, not Theses
  • -
  • In any case, importing them like this:
  • -
- -
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx512m -XX:+TieredCompilation -XX:TieredStopAtLevel=1"
-$ dspace import -a -e aorth@mjanja.ch -s /tmp/2018-01-16\ LIVES/SimpleArchiveFormat -m lives.map &> lives.log
-
- -
    -
  • And fantastic, before I started the import there were 10 PostgreSQL connections, and then CGSpace crashed during the upload
  • -
  • When I looked there were 210 PostgreSQL connections!
  • -
  • I don’t see any high load in XMLUI or REST/OAI:
  • -
- -
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E "17/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
-    381 40.77.167.124
-    403 213.55.99.121
-    431 207.46.13.60
-    445 157.55.39.113
-    445 157.55.39.231
-    449 95.108.181.88
-    453 68.180.229.254
-    593 54.91.48.104
-    757 104.196.152.243
-    776 66.249.66.90
-# cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "17/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
-     11 205.201.132.14
-     11 40.77.167.124
-     15 35.226.23.240
-     16 157.55.39.231
-     16 66.249.64.155
-     18 66.249.66.90
-     22 95.108.181.88
-     58 104.196.152.243
-   4106 70.32.83.92
-   9229 45.5.184.196
-
- -
    -
  • But I do see this strange message in the dspace log:
  • -
- -
2018-01-17 07:59:25,856 INFO  org.apache.http.impl.client.SystemDefaultHttpClient @ I/O exception (org.apache.http.NoHttpResponseException) caught when processing request to {}->http://localhost:8081: The target server failed to respond
-2018-01-17 07:59:25,856 INFO  org.apache.http.impl.client.SystemDefaultHttpClient @ Retrying request to {}->http://localhost:8081
-
- -
    -
  • I have NEVER seen this error before, and there is no error before or after that in DSpace’s solr.log
  • -
  • Tomcat’s catalina.out does show something interesting, though, right at that time:
  • -
- -
[====================>                              ]40% time remaining: 7 hour(s) 14 minute(s) 45 seconds. timestamp: 2018-01-17 07:57:02
-[====================>                              ]40% time remaining: 7 hour(s) 14 minute(s) 45 seconds. timestamp: 2018-01-17 07:57:11
-[====================>                              ]40% time remaining: 7 hour(s) 14 minute(s) 44 seconds. timestamp: 2018-01-17 07:57:37
-[====================>                              ]40% time remaining: 7 hour(s) 16 minute(s) 5 seconds. timestamp: 2018-01-17 07:57:49
-Exception in thread "http-bio-127.0.0.1-8081-exec-627" java.lang.OutOfMemoryError: Java heap space
-        at org.apache.lucene.util.FixedBitSet.clone(FixedBitSet.java:576)
-        at org.apache.solr.search.BitDocSet.andNot(BitDocSet.java:222)
-        at org.apache.solr.search.SolrIndexSearcher.getProcessedFilter(SolrIndexSearcher.java:1067)
-        at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1557)
-        at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1433)
-        at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:514)
-        at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:485)
-        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
-        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
-        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
-        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
-        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
-        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
-        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
-        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
-        at org.dspace.solr.filters.LocalHostRestrictionFilter.doFilter(LocalHostRestrictionFilter.java:50)
-        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
-        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
-        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:221)
-        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
-        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:505)
-        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:169)
-        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
-        at org.apache.catalina.valves.CrawlerSessionManagerValve.invoke(CrawlerSessionManagerValve.java:180)
-        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:956)
-        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
-        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:436)
-        at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1078)
-        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:625)
-        at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:318) 
-        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
-        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
-
- -
    -
  • You can see the timestamp above, which is some Atmire nightly task I think, but I can’t figure out which one
  • -
  • So I restarted Tomcat and tried the import again, which finished very quickly and without errors!
  • -
- -
$ dspace import -a -e aorth@mjanja.ch -s /tmp/2018-01-16\ LIVES/SimpleArchiveFormat -m lives2.map &> lives2.log
-
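  • To try to identify which Atmire task was printing those progress lines before the heap error, it might be enough to grep catalina.out for whatever was logged just before the progress bar started (just a guess at an approach, I haven’t verified it shows anything useful):
# grep -B10 "time remaining: 7 hour" /var/log/tomcat7/catalina.out | head -n 20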
- -
    -
  • Looking at the JVM graphs from Munin it does look like the heap ran out of memory (see the blue dip just before the green spike when I restarted Tomcat):
  • -
- -

Tomcat JVM Heap

- - - -
$ docker pull docker.bintray.io/jfrog/artifactory-oss:latest
-$ docker volume create --name artifactory5_data
-$ docker network create dspace-build
-$ docker run --network dspace-build --name artifactory -d -v artifactory5_data:/var/opt/jfrog/artifactory -p 8081:8081 docker.bintray.io/jfrog/artifactory-oss:latest
-
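  • A quick sanity check that the Artifactory container is answering before pointing maven at it (the ping endpoint is standard Artifactory as far as I know, so worth confirming on this version):
$ curl -s http://localhost:8081/artifactory/api/system/ping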
- -
    -
  • Then configure the local maven to use it in settings.xml with the settings from “Set Me Up”: https://www.jfrog.com/confluence/display/RTF/Using+Artifactory
  • -
  • This could be a game changer for testing and running the Docker DSpace image
  • -
  • Wow, I even managed to add the Atmire repository as a remote and map it into the libs-release virtual repository, then tell maven to use it for atmire.com-releases in settings.xml!
  • -
  • Hmm, some maven dependencies for the SWORDv2 web application in DSpace 5.5 are broken:
  • -
- -
[ERROR] Failed to execute goal on project dspace-swordv2: Could not resolve dependencies for project org.dspace:dspace-swordv2:war:5.5: Failed to collect dependencies at org.swordapp:sword2-server:jar:classes:1.0 -> org.apache.abdera:abdera-client:jar:1.1.1 -> org.apache.abdera:abdera-core:jar:1.1.1 -> org.apache.abdera:abdera-i18n:jar:1.1.1 -> org.apache.geronimo.specs:geronimo-activation_1.0.2_spec:jar:1.1: Failed to read artifact descriptor for org.apache.geronimo.specs:geronimo-activation_1.0.2_spec:jar:1.1: Could not find artifact org.apache.geronimo.specs:specs:pom:1.1 in central (http://localhost:8081/artifactory/libs-release) -> [Help 1]
-
- -
    -
  • I never noticed because I build with that web application disabled:
  • -
- -
$ mvn -U -Dmirage2.on=true -Dmirage2.deps.included=false -Denv=localhost -P \!dspace-sword,\!dspace-swordv2 clean package
-
- -
    -
  • UptimeRobot said CGSpace went down for a few minutes
  • -
  • I didn’t do anything but it came back up on its own
  • -
  • I don’t see anything unusual in the XMLUI or REST/OAI logs
  • -
  • Now Linode alert says the CPU load is high, sigh
  • -
  • Regarding the heap space error earlier today, it looks like it does happen a few times a week or month (I’m not sure how far these logs go back, as they are not strictly daily):
  • -
- -
# zgrep -c java.lang.OutOfMemoryError /var/log/tomcat7/catalina.out* | grep -v :0
-/var/log/tomcat7/catalina.out:2
-/var/log/tomcat7/catalina.out.10.gz:7
-/var/log/tomcat7/catalina.out.11.gz:1
-/var/log/tomcat7/catalina.out.12.gz:2
-/var/log/tomcat7/catalina.out.15.gz:1
-/var/log/tomcat7/catalina.out.17.gz:2
-/var/log/tomcat7/catalina.out.18.gz:3
-/var/log/tomcat7/catalina.out.20.gz:1
-/var/log/tomcat7/catalina.out.21.gz:4
-/var/log/tomcat7/catalina.out.25.gz:1
-/var/log/tomcat7/catalina.out.28.gz:1
-/var/log/tomcat7/catalina.out.2.gz:6
-/var/log/tomcat7/catalina.out.30.gz:2
-/var/log/tomcat7/catalina.out.31.gz:1
-/var/log/tomcat7/catalina.out.34.gz:1
-/var/log/tomcat7/catalina.out.38.gz:1
-/var/log/tomcat7/catalina.out.39.gz:1
-/var/log/tomcat7/catalina.out.4.gz:3
-/var/log/tomcat7/catalina.out.6.gz:2
-/var/log/tomcat7/catalina.out.7.gz:14
-
- -
    -
  • Overall the heap space usage in the munin graph seems ok, though I usually increase it by 512MB over the average a few times per year as usage grows
  • -
  • But maybe I should increase it by more, like 1024MB, to give a bit more head room
  • -
- -
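  • For reference, the change is just to the -Xms/-Xmx values in Tomcat’s JVM options in /etc/default/tomcat7, something like this (illustrative values, not necessarily what we will deploy):
JAVA_OPTS="-Djava.awt.headless=true -Xms6144m -Xmx6144m -XX:+UseConcMarkSweepGC -Dfile.encoding=UTF-8"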

2018-01-18

- -
    -
  • UptimeRobot said CGSpace was down for 1 minute last night
  • -
  • I don’t see any errors in the nginx or catalina logs, so I guess UptimeRobot just got impatient and closed the request, which caused nginx to send an HTTP 499
  • -
  • I realize I never did a full re-index after the SQL author and affiliation updates last week, so I should force one now:
  • -
- -
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m -XX:+TieredCompilation -XX:TieredStopAtLevel=1"
-$ time schedtool -D -e ionice -c2 -n7 nice -n19 /home/cgspace.cgiar.org/bin/dspace index-discovery -b
-
- -
    -
  • Maria from Bioversity asked if I could remove the abstracts from all of their Limited Access items in the Bioversity Journal Articles collection
  • -
  • It’s easy enough to do in OpenRefine, but you have to be careful to only get those items that are uploaded into Bioversity’s collection, not the ones that are mapped from others!
  • -
  • Use this GREL in OpenRefine after isolating all the Limited Access items: value.startsWith("10568/35501")
  • -
  • UptimeRobot said CGSpace went down AGAIN and both Sisay and Danny immediately logged in and restarted Tomcat without talking to me or each other!
  • -
- -
Jan 18 07:01:22 linode18 sudo[10805]: dhmichael : TTY=pts/5 ; PWD=/home/dhmichael ; USER=root ; COMMAND=/bin/systemctl restart tomcat7
-Jan 18 07:01:22 linode18 sudo[10805]: pam_unix(sudo:session): session opened for user root by dhmichael(uid=0)
-Jan 18 07:01:22 linode18 systemd[1]: Stopping LSB: Start Tomcat....
-Jan 18 07:01:22 linode18 sudo[10812]: swebshet : TTY=pts/3 ; PWD=/home/swebshet ; USER=root ; COMMAND=/bin/systemctl restart tomcat7
-Jan 18 07:01:22 linode18 sudo[10812]: pam_unix(sudo:session): session opened for user root by swebshet(uid=0)
-
- -
    -
  • I had to cancel the Discovery indexing and I’ll have to re-try it another time when the server isn’t so busy (it had already taken two hours and wasn’t even close to being done)
  • -
  • For now I’ve increased the Tomcat JVM heap from 5632 to 6144m, to give ~1GB of free memory over the average usage to hopefully account for spikes caused by load or background jobs
  • -
- -

2018-01-19

- -
    -
  • Linode alerted and said that the CPU load was 264.1% on CGSpace
  • -
  • Start the Discovery indexing again:
  • -
- -
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m -XX:+TieredCompilation -XX:TieredStopAtLevel=1"
-$ time schedtool -D -e ionice -c2 -n7 nice -n19 /home/cgspace.cgiar.org/bin/dspace index-discovery -b
-
- -
    -
  • Linode alerted again and said that CGSpace was using 301% CPU
  • -
  • Peter emailed to ask why this item doesn’t have an Altmetric badge on CGSpace but does have one on the Altmetric dashboard
  • -
  • Looks like our badge code calls the handle endpoint which doesn’t exist:
  • -
- -
https://api.altmetric.com/v1/handle/10568/88090
-
- -
    -
  • I told Peter we should keep an eye out and try again next week
  • -
- -

2018-01-20

- -
    -
  • Run the authority indexing script on CGSpace and of course it died:
  • -
- -
$ time schedtool -D -e ionice -c2 -n7 nice -n19 /home/cgspace.cgiar.org/bin/dspace index-authority 
-Retrieving all data 
-Initialize org.dspace.authority.indexer.DSpaceAuthorityIndexer 
-Exception: null
-java.lang.NullPointerException
-        at org.dspace.authority.AuthorityValueGenerator.generateRaw(AuthorityValueGenerator.java:82)
-        at org.dspace.authority.AuthorityValueGenerator.generate(AuthorityValueGenerator.java:39)
-        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.prepareNextValue(DSpaceAuthorityIndexer.java:201)
-        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:132)
-        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
-        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
-        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:159)
-        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
-        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
-        at org.dspace.authority.indexer.AuthorityIndexClient.main(AuthorityIndexClient.java:61)
-        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
-        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
-        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
-        at java.lang.reflect.Method.invoke(Method.java:498)
-        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
-        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
- 
-real    7m2.241s
-user    1m33.198s
-sys     0m12.317s
-
- -
    -
  • I tested the abstract cleanups on Bioversity’s Journal Articles collection again that I had started a few days ago
  • -
  • In the end there were 324 items in the collection that were Limited Access, but only 199 had abstracts
  • -
  • I want to document the workflow of adding a production PostgreSQL database to a development instance of DSpace in Docker:
  • -
- -
$ docker exec dspace_db dropdb -U postgres dspace
-$ docker exec dspace_db createdb -U postgres -O dspace --encoding=UNICODE dspace
-$ docker exec dspace_db psql -U postgres dspace -c 'alter user dspace createuser;'
-$ docker cp test.dump dspace_db:/tmp/test.dump
-$ docker exec dspace_db pg_restore -U postgres -d dspace /tmp/test.dump
-$ docker exec dspace_db psql -U postgres dspace -c 'alter user dspace nocreateuser;'
-$ docker exec dspace_db vacuumdb -U postgres dspace
-$ docker cp ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspace_db:/tmp
-$ docker exec dspace_db psql -U dspace -f /tmp/update-sequences.sql dspace
-
- -

2018-01-22

- -
    -
  • Look over Udana’s CSV of 25 WLE records from last week
  • -
  • I sent him some corrections: - -
      -
    • The file encoding is Windows-1252
    • -
    • There were whitespace issues in the dc.identifier.citation field (spaces at the beginning and end, and multiple spaces in between some words)
    • -
    • Also, the authors listed in the citation need to be in normal format, separated by commas or colons (however you prefer), not with ||
    • -
    • There were spaces in the beginning and end of some cg.identifier.doi fields
    • -
    • Make sure that the cg.coverage.countries field is just countries: ie, no “SOUTH ETHIOPIA” or “EAST AFRICA” (the first should just be ETHIOPIA, the second should be in cg.coverage.region instead)
    • -
    • The current list of regions we use is here: https://github.com/ilri/DSpace/blob/5_x-prod/dspace/config/input-forms.xml#L5162
    • -
    • You have a syntax error in your cg.coverage.regions (extra ||)
    • -
    • The value of dc.identifier.issn should just be the ISSN but you have: eISSN: 1479-487X
    • -
  • -
  • I wrote a quick Python script to use the DSpace REST API to find all collections under a given community
  • -
  • The source code is here: rest-find-collections.py
  • -
  • Peter had said that he found a bunch of ILRI collections that were called “untitled”, but I don’t see any:
  • -
- -
$ ./rest-find-collections.py 10568/1 | wc -l
-308
-$ ./rest-find-collections.py 10568/1 | grep -i untitled
-
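  • For the record, roughly the same lookup can be done by hand with curl and jq against the DSpace 5 REST API (a sketch only; the script handles pagination and sub-communities, and the community ID in the second command is hypothetical, taken from the first command’s output):
$ curl -s https://cgspace.cgiar.org/rest/handle/10568/1 | jq '.id'
$ curl -s 'https://cgspace.cgiar.org/rest/communities/2/collections' | jq -r '.[].name' | wc -l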
- -
    -
  • Looking at the Tomcat connector docs I think we really need to increase maxThreads
  • -
  • The default is 200, which can easily be taken up by bots, considering that Google and Bing sometimes browse with fifty (50) connections each!
  • -
  • Before I increase this I want to see if I can measure and graph this, and then benchmark
  • -
  • I’ll probably also increase minSpareThreads to 20 (its default is 10)
  • -
  • I still want to bump up acceptorThreadCount from 1 to 2 as well, as the documentation says this should be increased on multi-core systems
  • -
  • I spent quite a bit of time looking at jvisualvm and jconsole today
  • -
  • Run system updates on DSpace Test and reboot it
  • -
  • I see I can monitor the number of Tomcat threads and some detailed JVM memory stuff if I install munin-plugins-java
  • -
  • I’d still like to get arbitrary mbeans like activeSessions etc, though
  • -
  • I can’t remember if I had to configure the jmx settings in /etc/munin/plugin-conf.d/munin-node or not—I think all I did was re-run the munin-node-configure script and of course enable JMX in Tomcat’s JVM options
  • -
- -

2018-01-23

- -
    -
  • Thinking about generating a jmeter test plan for DSpace, along the lines of Georgetown’s dspace-performance-test
  • -
  • I got a list of all the GET requests on CGSpace for January 21st (the last time Linode complained the load was high), excluding admin calls:
  • -
- -
# zcat --force /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz /var/log/nginx/library-access.log.2.gz /var/log/nginx/library-access.log.3.gz /var/log/nginx/rest.log.2.gz /var/log/nginx/rest.log.3.gz /var/log/nginx/oai.log.2.gz /var/log/nginx/oai.log.3.gz /var/log/nginx/error.log.2.gz /var/log/nginx/error.log.3.gz | grep "21/Jan/2018" | grep "GET " | grep -c -v "/admin"
-56405
-
- -
    -
  • Apparently about 28% of these requests were for bitstreams, 30% for the REST API, and 30% for handles:
  • -
- -
# zcat --force /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz /var/log/nginx/library-access.log.2.gz /var/log/nginx/library-access.log.3.gz /var/log/nginx/rest.log.2.gz /var/log/nginx/rest.log.3.gz /var/log/nginx/oai.log.2.gz /var/log/nginx/oai.log.3.gz /var/log/nginx/error.log.2.gz /var/log/nginx/error.log.3.gz | grep "21/Jan/2018" | grep "GET " | grep -v "/admin" | awk '{print $7}' | grep -Eo "^/(handle|bitstream|rest|oai)/" | sort | uniq -c | sort -n
-     38 /oai/
-  14406 /bitstream/
-  15179 /rest/
-  15191 /handle/
-
- -
    -
  • And 3% were to the homepage or search:
  • -
- -
# zcat --force /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz /var/log/nginx/library-access.log.2.gz /var/log/nginx/library-access.log.3.gz /var/log/nginx/rest.log.2.gz /var/log/nginx/rest.log.3.gz /var/log/nginx/oai.log.2.gz /var/log/nginx/oai.log.3.gz /var/log/nginx/error.log.2.gz /var/log/nginx/error.log.3.gz | grep "21/Jan/2018" | grep "GET " | grep -v "/admin" | awk '{print $7}' | grep -Eo '^/($|open-search|discover)' | sort | uniq -c
-   1050 /
-    413 /discover
-    170 /open-search
-
- -
    -
  • The last 10% or so seem to be for static assets that would be served by nginx anyways:
  • -
- -
# zcat --force /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz /var/log/nginx/library-access.log.2.gz /var/log/nginx/library-access.log.3.gz /var/log/nginx/rest.log.2.gz /var/log/nginx/rest.log.3.gz /var/log/nginx/oai.log.2.gz /var/log/nginx/oai.log.3.gz /var/log/nginx/error.log.2.gz /var/log/nginx/error.log.3.gz | grep "21/Jan/2018" | grep "GET " | grep -v "/admin" | awk '{print $7}' | grep -v bitstream | grep -Eo '\.(js|css|png|jpg|jpeg|php|svg|gif|txt|map)$' | sort | uniq -c | sort -n
-      2 .gif
-      7 .css
-     84 .js
-    433 .php
-    882 .txt
-   2551 .png
-
- -
    -
  • I can definitely design a test plan on this!
  • -
- -

2018-01-24

- -
    -
  • Looking at the REST requests, most of them are to expand all or metadata, but 5% are for retrieving bitstreams:
  • -
- -
# zcat --force /var/log/nginx/access.log.3.gz /var/log/nginx/access.log.4.gz /var/log/nginx/library-access.log.3.gz /var/log/nginx/library-access.log.4.gz /var/log/nginx/rest.log.3.gz /var/log/nginx/rest.log.4.gz /var/log/nginx/oai.log.3.gz /var/log/nginx/oai.log.4.gz /var/log/nginx/error.log.3.gz /var/log/nginx/error.log.4.gz | grep "21/Jan/2018" | grep "GET " | grep -v "/admin" | awk '{print $7}' | grep -E "^/rest" | grep -Eo "(retrieve|expand=[a-z].*)" | sort | uniq -c | sort -n
-      1 expand=collections
-     16 expand=all&limit=1
-     45 expand=items
-    775 retrieve
-   5675 expand=all
-   8633 expand=metadata
-
- -
    -
  • I finished creating the test plan for DSpace Test and ran it from my Linode with:
  • -
- -
$ jmeter -n -t DSpacePerfTest-dspacetest.cgiar.org.jmx -l 2018-01-24-1.jtl
-
- -
    -
  • Atmire responded to my issue from two weeks ago and said they will start looking into DSpace 5.8 compatibility for CGSpace
  • -
  • I set up a new Arch Linux Linode instance with 8192 MB of RAM and ran the test plan a few times to get a baseline:
  • -
- -
# lscpu
-Architecture:        x86_64
-CPU op-mode(s):      32-bit, 64-bit
-Byte Order:          Little Endian
-CPU(s):              4
-On-line CPU(s) list: 0-3
-Thread(s) per core:  1
-Core(s) per socket:  1
-Socket(s):           4
-NUMA node(s):        1
-Vendor ID:           GenuineIntel
-CPU family:          6
-Model:               63
-Model name:          Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
-Stepping:            2
-CPU MHz:             2499.994
-BogoMIPS:            5001.32
-Hypervisor vendor:   KVM
-Virtualization type: full
-L1d cache:           32K
-L1i cache:           32K
-L2 cache:            4096K
-L3 cache:            16384K
-NUMA node0 CPU(s):   0-3
-Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm cpuid_fault invpcid_single pti retpoline fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt arat
-# free -m
-              total        used        free      shared  buff/cache   available
-Mem:           7970         107        7759           1         103        7771
-Swap:           255           0         255
-# pacman -Syu
-# pacman -S git wget jre8-openjdk-headless mosh htop tmux
-# useradd -m test
-# su - test
-$ git clone -b ilri https://github.com/alanorth/dspace-performance-test.git
-$ wget http://www-us.apache.org/dist//jmeter/binaries/apache-jmeter-3.3.tgz
-$ tar xf apache-jmeter-3.3.tgz
-$ cd apache-jmeter-3.3/bin
-$ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.jmx -l ~/dspace-performance-test/2018-01-24-linode5451120-baseline.jtl -j ~/dspace-performance-test/2018-01-24-linode5451120-baseline.log
-$ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.jmx -l ~/dspace-performance-test/2018-01-24-linode5451120-baseline2.jtl -j ~/dspace-performance-test/2018-01-24-linode5451120-baseline2.log
-$ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.jmx -l ~/dspace-performance-test/2018-01-24-linode5451120-baseline3.jtl -j ~/dspace-performance-test/2018-01-24-linode5451120-baseline3.log
-
- -
    -
  • Then I generated reports for these runs like this:
  • -
- -
$ jmeter -g 2018-01-24-linode5451120-baseline.jtl -o 2018-01-24-linode5451120-baseline
-
- -

2018-01-25

- -
    -
  • Run another round of tests on DSpace Test with jmeter after changing Tomcat’s minSpareThreads to 20 (default is 10) and acceptorThreadCount to 2 (default is 1):
  • -
- -
$ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.jmx -l ~/dspace-performance-test/2018-01-25-linode5451120-tomcat-threads.jtl -j ~/dspace-performance-test/2018-01-25-linode5451120-tomcat-threads.log
-$ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.jmx -l ~/dspace-performance-test/2018-01-25-linode5451120-tomcat-threads2.jtl -j ~/dspace-performance-test/2018-01-25-linode5451120-tomcat-threads2.log
-$ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.jmx -l ~/dspace-performance-test/2018-01-25-linode5451120-tomcat-threads3.jtl -j ~/dspace-performance-test/2018-01-25-linode5451120-tomcat-threads3.log
-
- -
    -
  • I changed the parameters back to the baseline ones and switched the Tomcat JVM garbage collector to G1GC and re-ran the tests
  • -
  • JVM options for Tomcat changed from -Xms3072m -Xmx3072m -XX:+UseConcMarkSweepGC to -Xms3072m -Xmx3072m -XX:+UseG1GC -XX:+PerfDisableSharedMem
  • -
- -
$ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.jmx -l ~/dspace-performance-test/2018-01-25-linode5451120-g1gc.jtl -j ~/dspace-performance-test/2018-01-25-linode5451120-g1gc.log
-$ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.jmx -l ~/dspace-performance-test/2018-01-25-linode5451120-g1gc2.jtl -j ~/dspace-performance-test/2018-01-25-linode5451120-g1gc2.log
-$ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.jmx -l ~/dspace-performance-test/2018-01-25-linode5451120-g1gc3.jtl -j ~/dspace-performance-test/2018-01-25-linode5451120-g1gc3.log
-
- -
    -
  • I haven’t had time to look at the results yet
  • -
- -

2018-01-26

- -
    -
  • Peter followed up about some of the points from the Skype meeting last week
  • -
  • Regarding the ORCID field issue, I see ICARDA’s MELSpace is using cg.creator.ID: 0000-0001-9156-7691
  • -
  • I had floated the idea of using a controlled vocabulary with values formatted something like: Orth, Alan S. (0000-0002-1735-7458)
  • -
  • Update PostgreSQL JDBC driver version from 42.1.4 to 42.2.1 on DSpace Test, see: https://jdbc.postgresql.org/
  • -
  • Reboot DSpace Test to get new Linode kernel (Linux 4.14.14-x86_64-linode94)
  • -
  • I am testing my old work on the dc.rights field, I had added a branch for it a few months ago
  • -
  • I added a list of Creative Commons and other licenses in input-forms.xml
  • -
  • The problem is that Peter wanted to use two questions, one for CG centers and one for other, but using the same metadata value, which isn’t possible (?)
  • -
  • So I used some creativity and made several fields display values, but not store any, ie:
  • -
- -
<pair>
-  <displayed-value>For products published by another party:</displayed-value>
-  <stored-value></stored-value>
-</pair>
-
- -
    -
  • I was worried that if a user selected this field for some reason, DSpace would store an empty value, but it simply doesn’t register that as a valid option:
  • -
- -

Rights

- -
    -
  • I submitted a test item with ORCiDs and dc.rights from a controlled vocabulary on DSpace Test: https://dspacetest.cgiar.org/handle/10568/97703
  • -
  • I will send it to Peter to check and give feedback (ie, about the ORCiD field name as well as allowing users to add ORCiDs manually or not)
  • -
- -

2018-01-28

- -
    -
  • Assist Udana from WLE again to proof his 25 records and upload them to DSpace Test
  • -
  • I am playing with the startStopThreads="0" parameter in Tomcat <Engine> and <Host> configuration
  • -
  • It reduces the start up time of Catalina by using multiple threads to start web applications in parallel
  • -
  • On my local test machine the startup time went from 70 to 30 seconds
  • -
  • See: https://tomcat.apache.org/tomcat-7.0-doc/config/host.html
  • -
- -
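  • The startup times are easy to compare because Catalina logs them explicitly (path as on our Ubuntu servers; adjust for the local test machine):
# grep 'Server startup in' /var/log/tomcat7/catalina.out | tail -n 2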

2018-01-29

- -
    -
  • CGSpace went down this morning for a few minutes, according to UptimeRobot
  • -
  • Looking at the DSpace logs I see this error happened just before UptimeRobot noticed it going down:
  • -
- -
2018-01-29 05:30:22,226 INFO  org.dspace.usage.LoggerUsageEventListener @ anonymous:session_id=3775D4125D28EF0C691B08345D905141:ip_addr=68.180.229.254:view_item:handle=10568/71890
-2018-01-29 05:30:22,322 ERROR org.dspace.app.xmlui.aspect.discovery.AbstractSearch @ org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1994+TO+1999]': Encountered " "]" "] "" at line 1, column 32.
-Was expecting one of:
-    "TO" ...
-    <RANGE_QUOTED> ...
-    <RANGE_GOOP> ...
-    
-org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1994+TO+1999]': Encountered " "]" "] "" at line 1, column 32.
-Was expecting one of:
-    "TO" ...
-    <RANGE_QUOTED> ...
-    <RANGE_GOOP> ...
-
- -
    -
  • So is this an error caused by this particular client (which happens to be Yahoo! Slurp)?
  • -
  • I see a few dozen HTTP 499 errors in the nginx access log for a few minutes before this happened, but HTTP 499 is just when nginx says that the client closed the request early
  • -
  • Perhaps this from the nginx error log is relevant?
  • -
- -
2018/01/29 05:26:34 [warn] 26895#26895: *944759 an upstream response is buffered to a temporary file /var/cache/nginx/proxy_temp/6/16/0000026166 while reading upstream, client: 180.76.15.34, server: cgspace.cgiar.org, request: "GET /bitstream/handle/10947/4658/FISH%20Leaflet.pdf?sequence=12 HTTP/1.1", upstream: "http://127.0.0.1:8443/bitstream/handle/10947/4658/FISH%20Leaflet.pdf?sequence=12", host: "cgspace.cgiar.org"
-
- - - -
# awk '($9 ~ /200/) { i++;sum+=$10;max=$10>max?$10:max; } END { printf("Maximum: %d\nAverage: %d\n",max,i?sum/i:0); }' /var/log/nginx/access.log
-Maximum: 2771268
-Average: 210483
-
- -
    -
  • I guess upstream responses that don’t fit in nginx’s proxy buffers get spooled to disk (the default proxy_max_temp_file_size is 1024M), so this is definitely not the issue here, and that warning is totally unrelated
  • -
  • My best guess is that the Solr search error is related somehow but I can’t figure it out
  • -
  • We definitely have enough database connections, as I haven’t seen a pool error in weeks:
  • -
- -
$ grep -c "Timeout: Pool empty." dspace.log.2018-01-2*
-dspace.log.2018-01-20:0
-dspace.log.2018-01-21:0
-dspace.log.2018-01-22:0
-dspace.log.2018-01-23:0
-dspace.log.2018-01-24:0
-dspace.log.2018-01-25:0
-dspace.log.2018-01-26:0
-dspace.log.2018-01-27:0
-dspace.log.2018-01-28:0
-dspace.log.2018-01-29:0
-
- -
    -
  • Adam Hunt from WLE complained that pages take “1-2 minutes” to load each, from France and Sri Lanka
  • -
  • I asked him which particular pages, as right now pages load in 2 or 3 seconds for me
  • -
  • UptimeRobot said CGSpace went down again, and I looked at PostgreSQL and saw 211 active database connections
  • -
  • If it’s not memory and it’s not the database, it’s gotta be Tomcat threads; seeing as the default maxThreads is only 200 anyways, that actually makes sense
  • -
  • I decided to change the Tomcat thread settings on CGSpace: - -
  • -
  • Looks like I only enabled the new thread stuff on the connector used internally by Solr, so I probably need to match that by increasing them on the other connector that nginx proxies to
  • -
  • Jesus Christ I need to fucking fix the Munin monitoring so that I can tell how many fucking threads I have running
  • -
  • Wow, so apparently you need to specify which connector to check if you want any of the Munin Tomcat plugins besides “tomcat_jvm” to work (the connector name can be seen in the Catalina logs)
  • -
  • I modified /etc/munin/plugin-conf.d/tomcat to add the connector (with surrounding quotes!) and now the other plugins work (obviously the credentials are incorrect):
  • -
- -
[tomcat_*]
-    env.host 127.0.0.1
-    env.port 8081
-    env.connector "http-bio-127.0.0.1-8443"
-    env.user munin
-    env.password munin
-
- -
    -
  • For example, I can see the threads:
  • -
- -
# munin-run tomcat_threads
-busy.value 0
-idle.value 20
-max.value 400
-
- -
    -
  • Apparently you can’t monitor more than one connector, so I guess the most important to monitor would be the one that nginx is sending stuff to
  • -
  • So for now I think I’ll just monitor these and skip trying to configure the jmx plugins
  • -
  • Although following the logic of /usr/share/munin/plugins/jmx_tomcat_dbpools could be useful for getting the active Tomcat sessions
  • -
  • From debugging the jmx_tomcat_db_pools script from the munin-plugins-java package, I see that this is how you call arbitrary mbeans:
  • -
- -
# port=5400 ip="127.0.0.1" /usr/bin/java -cp /usr/share/munin/munin-jmx-plugins.jar org.munin.plugin.jmx.Beans Catalina:type=DataSource,class=javax.sql.DataSource,name=* maxActive
-Catalina:type=DataSource,class=javax.sql.DataSource,name="jdbc/dspace"  maxActive       300
-
- - - -
[===================>                               ]38% time remaining: 5 hour(s) 21 minute(s) 47 seconds. timestamp: 2018-01-29 06:25:16
-
- -
    -
  • There are millions of these status lines, for example in just this one log file:
  • -
- -
# zgrep -c "time remaining" /var/log/tomcat7/catalina.out.1.gz
-1084741
-
- - - -

2018-01-31

- -
    -
  • UptimeRobot says CGSpace went down at 7:57 AM, and indeed I see a lot of HTTP 499 codes in nginx logs
  • -
  • PostgreSQL activity shows 222 database connections
  • -
  • Now PostgreSQL activity shows 265 database connections!
  • -
  • I don’t see any errors anywhere…
  • -
  • Now PostgreSQL activity shows 308 connections!
  • -
  • Well this is interesting, there are 400 Tomcat threads busy:
  • -
- -
# munin-run tomcat_threads
-busy.value 400
-idle.value 0
-max.value 400
-
- -
    -
  • And wow, we finally exhausted the database connections, from dspace.log:
  • -
- -
2018-01-31 08:05:28,964 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error - 
-org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-451] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:300; busy:300; idle:0; lastwait:5000].
-
- -
    -
  • Now even the nightly Atmire background thing is getting HTTP 500 error:
  • -
- -
Jan 31, 2018 8:16:05 AM com.sun.jersey.spi.container.ContainerResponse logException
-SEVERE: Mapped exception to response: 500 (Internal Server Error)
-javax.ws.rs.WebApplicationException
-
- -
    -
  • For now I will restart Tomcat to clear this shit and bring the site back up
  • -
  • The top IPs from this morning, during 7 and 8AM in XMLUI and REST/OAI:
  • -
- -
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 /var/log/nginx/error.log /var/log/nginx/error.log.1 | grep -E "31/Jan/2018:(07|08)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
-     67 66.249.66.70
-     70 207.46.13.12
-     71 197.210.168.174
-     83 207.46.13.13
-     85 157.55.39.79
-     89 207.46.13.14
-    123 68.180.228.157
-    198 66.249.66.90
-    219 41.204.190.40
-    255 2405:204:a208:1e12:132:2a8e:ad28:46c0
-# cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "31/Jan/2018:(07|08)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
-      2 65.55.210.187
-      2 66.249.66.90
-      3 157.55.39.79
-      4 197.232.39.92
-      4 34.216.252.127
-      6 104.196.152.243
-      6 213.55.85.89
-     15 122.52.115.13
-     16 213.55.107.186
-    596 45.5.184.196
-
- -
    -
  • This looks reasonable to me, so I have no idea why we ran out of Tomcat threads
  • -
- -

Tomcat threads

- -
    -
  • We need to start graphing the Tomcat sessions as well, though that requires JMX
  • -
  • Also, I wonder if I could disable the nightly Atmire thing
  • -
  • God, I don’t know where this load is coming from
  • -
  • Since I bumped up the Tomcat threads from 200 to 400 the load on the server has been sustained at about 200% for almost a whole day:
  • -
- -

CPU usage week

- -
    -
  • I should make separate database pools for the web applications and the API applications like REST and OAI
  • -
  • Ok, so this is interesting: I figured out how to get the MBean path to query Tomcat’s activeSessions from JMX (using munin-plugins-java):
  • -
- -
# port=5400 ip="127.0.0.1" /usr/bin/java -cp /usr/share/munin/munin-jmx-plugins.jar org.munin.plugin.jmx.Beans Catalina:type=Manager,context=/,host=localhost activeSessions
-Catalina:type=Manager,context=/,host=localhost  activeSessions  8
-
- -
    -
  • If you connect to Tomcat in jvisualvm it’s pretty obvious when you hover over the elements
  • -
- -

MBeans in JVisualVM

- - - - - -
- - - -
- - - - -
-
- - - - - - - - - diff --git a/public/2018-02/index.html b/public/2018-02/index.html deleted file mode 100644 index b1f9779ff..000000000 --- a/public/2018-02/index.html +++ /dev/null @@ -1,636 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - February, 2018 | CGSpace Notes - - - - - - - - - - - - - - - - - -
-
- -
-
- - - -
-
-

CGSpace Notes

-

Documenting day-to-day work on the CGSpace repository.

-
-
- - - -
-
-
- - - - -
-
-

February, 2018

- -
-

2018-02-01

- -
    -
  • Peter gave feedback on the dc.rights proof of concept that I had sent him last week
  • -
  • We don’t need to distinguish between internal and external works, so that makes it just a simple list
  • -
  • Yesterday I figured out how to monitor DSpace sessions using JMX
  • -
  • I copied the logic from the jmx_tomcat_dbpools plugin provided by Ubuntu’s munin-plugins-java package and used the stuff I discovered about JMX in 2018-01
  • -
- -

- -

DSpace Sessions

- -
    -
  • Run all system updates and reboot DSpace Test
  • -
  • Wow, I packaged up the jmx_dspace_sessions stuff in the Ansible infrastructure scripts and deployed it on CGSpace and it totally works:
  • -
- -
# munin-run jmx_dspace_sessions
-v_.value 223
-v_jspui.value 1
-v_oai.value 0
-
- -

2018-02-03

- - - -
$ ./delete-metadata-values.py -i /tmp/2018-02-03-Affiliations-12-deletions.csv -f cg.contributor.affiliation -m 211 -d dspace -u dspace -p 'fuuu'
-$ ./fix-metadata-values.py -i /tmp/2018-02-03-Affiliations-1116-corrections.csv -f cg.contributor.affiliation -t correct -m 211 -d dspace -u dspace -p 'fuuu'
-
- -
    -
  • Then I started a full Discovery reindex:
  • -
- -
$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
-
-real    96m39.823s
-user    14m10.975s
-sys     2m29.088s
-
- -
    -
  • Generate a new list of affiliations for Peter to sort through:
  • -
- -
dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'affiliation') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/affiliations.csv with csv;
-COPY 3723
-
- -
    -
  • Oh, and it looks like we processed over 3.1 million requests in January, up from 2.9 million in December:
  • -
- -
# time zcat --force /var/log/nginx/* | grep -cE "[0-9]{1,2}/Jan/2018"
-3126109
-
-real    0m23.839s
-user    0m27.225s
-sys     0m1.905s
-
- -

2018-02-05

- -
    -
  • Toying with correcting authors with trailing spaces via PostgreSQL:
  • -
- -
dspace=# update metadatavalue set text_value=REGEXP_REPLACE(text_value, '\s+$' , '') where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^.*?\s+$';
-UPDATE 20
-
- -
    -
  • I tried the TRIM(TRAILING from text_value) function and it said it changed 20 items but the spaces didn’t go away
  • -
  • This is on a fresh import of the CGSpace database, but when I tried to apply it on CGSpace there were no changes detected. Weird.
  • -
  • Anyways, Peter wants a new list of authors to clean up, so I exported another CSV:
  • -
- -
dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors-2018-02-05.csv with csv;
-COPY 55630
-
- -

2018-02-06

- -
    -
  • UptimeRobot says CGSpace is down this morning around 9:15
  • -
  • I see 308 PostgreSQL connections in pg_stat_activity
  • -
  • The usage otherwise seemed low for REST/OAI as well as XMLUI in the last hour:
  • -
- -
# date
-Tue Feb  6 09:30:32 UTC 2018
-# cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "6/Feb/2018:(08|09)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
-      2 223.185.41.40
-      2 66.249.64.14
-      2 77.246.52.40
-      4 157.55.39.82
-      4 193.205.105.8
-      5 207.46.13.63
-      5 207.46.13.64
-      6 154.68.16.34
-      7 207.46.13.66
-   1548 50.116.102.77
-# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 /var/log/nginx/error.log /var/log/nginx/error.log.1 | grep -E "6/Feb/2018:(08|09)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
-     77 213.55.99.121
-     86 66.249.64.14
-    101 104.196.152.243
-    103 207.46.13.64
-    118 157.55.39.82
-    133 207.46.13.66
-    136 207.46.13.63
-    156 68.180.228.157
-    295 197.210.168.174
-    752 144.76.64.79
-
- -
    -
  • I did notice in /var/log/tomcat7/catalina.out that Atmire’s update thing was running though
  • -
  • So I restarted Tomcat and now everything is fine
  • -
  • Next time I see that many database connections I need to save the output so I can analyze it later (see the snapshot commands sketched after this list)
  • -
  • I’m going to re-schedule the taskUpdateSolrStatsMetadata task as Bram detailed in ticket 566 to see if it makes CGSpace stop crashing every morning
  • -
  • If I move the task from 3AM to 3PM, ideally CGSpace will stop crashing in the morning, or at least start crashing ~12 hours later
  • -
  • Atmire has said that there will eventually be a fix for the high load caused by their script, but it will come with the DSpace 5.8 compatibility work they are already doing
  • -
  • I re-deployed CGSpace with the new task time of 3PM, ran all system updates, and restarted the server
  • -
  • Also, I changed the name of the DSpace fallback pool on DSpace Test and CGSpace to be called ‘dspaceCli’ so that I can distinguish it in pg_stat_activity
  • -
  • I implemented some changes to the pooling in the Ansible infrastructure scripts so that each DSpace web application can use its own pool (web, api, and solr)
  • -
  • Each pool uses its own name and hopefully this should help me figure out which one is using too many connections next time CGSpace goes down
  • -
  • Also, this will mean that when a search bot comes along and hammers the XMLUI, the REST and OAI applications will be fine
  • -
  • I’m not actually sure if the Solr web application uses the database though, so I’ll have to check later and remove it if necessary
  • -
  • I deployed the changes on DSpace Test only for now, so I will monitor and make them on CGSpace later this week
  • -
- -
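  • A sketch of what I want to capture next time the connections pile up, before restarting anything (these are standard pg_stat_activity columns, nothing DSpace-specific):
$ psql -c 'SELECT * FROM pg_stat_activity' > /tmp/pg_stat_activity-$(date +%Y%m%d-%H%M%S).txt
$ psql -c 'SELECT application_name, state, count(*) FROM pg_stat_activity GROUP BY 1, 2 ORDER BY 3 DESC;'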

2018-02-07

- -
    -
  • Abenet wrote to ask a question about the ORCiD lookup not working for one CIAT user on CGSpace
  • -
  • I tried on DSpace Test and indeed the lookup just doesn’t work!
  • -
  • The ORCiD code in DSpace appears to be using http://pub.orcid.org/, but when I go there in the browser it redirects me to https://pub.orcid.org/v2.0/
  • -
  • According to the announcement the v1 API was moved from http://pub.orcid.org/ to https://pub.orcid.org/v1.2 until March 1st when it will be discontinued for good
  • -
  • But the old URL is hard coded in DSpace and it doesn’t work anyways, because it currently redirects you to https://pub.orcid.org/v2.0/v1.2
  • -
  • So I guess we have to disable that shit once and for all and switch to a controlled vocabulary
  • -
  • CGSpace crashed again, this time around Wed Feb 7 11:20:28 UTC 2018
  • -
  • I took a few snapshots of the PostgreSQL activity at the time and as the minutes went on and the connections were very high at first but reduced on their own:
  • -
- -
$ psql -c 'select * from pg_stat_activity' > /tmp/pg_stat_activity.txt
-$ grep -c 'PostgreSQL JDBC' /tmp/pg_stat_activity*
-/tmp/pg_stat_activity1.txt:300
-/tmp/pg_stat_activity2.txt:272
-/tmp/pg_stat_activity3.txt:168
-/tmp/pg_stat_activity4.txt:5
-/tmp/pg_stat_activity5.txt:6
-
- -
    -
  • Interestingly, all of those 751 connections were idle!
  • -
- -
$ grep "PostgreSQL JDBC" /tmp/pg_stat_activity* | grep -c idle
-751
-
- -
    -
  • Since I was restarting Tomcat anyways, I decided to deploy the changes to create two different pools for web and API apps
  • -
  • Looking the Munin graphs, I can see that there were almost double the normal number of DSpace sessions at the time of the crash (and also yesterday!):
  • -
- -

DSpace Sessions

- -
    -
  • Indeed it seems like there were over 1800 sessions today around the hours of 10 and 11 AM:
  • -
- -
$ grep -E '^2018-02-07 (10|11)' dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
-1828
-
- -
    -
  • CGSpace went down again a few hours later, and now the connections to the dspaceWeb pool are maxed at 250 (the new limit I imposed with the new separate pool scheme)
  • -
  • What’s interesting is that the DSpace log says the connections are all busy:
  • -
- -
org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-328] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:250; busy:250; idle:0; lastwait:5000].
-
- -
    -
  • … but in PostgreSQL I see them idle or idle in transaction:
  • -
- -
$ psql -c 'select * from pg_stat_activity' | grep -c dspaceWeb
-250
-$ psql -c 'select * from pg_stat_activity' | grep dspaceWeb | grep -c idle
-250
-$ psql -c 'select * from pg_stat_activity' | grep dspaceWeb | grep -c "idle in transaction"
-187
-
- -
    -
  • What the fuck, does DSpace think all connections are busy?
  • -
  • I suspect these are issues with abandoned connections or maybe a leak, so I’m going to try adding the removeAbandoned='true' parameter which is apparently off by default
  • -
  • I will try testOnReturn='true' too, just to add more validation, because I’m fucking grasping at straws
  • -
  • Also, WTF, there was a heap space error randomly in catalina.out:
  • -
- -
Wed Feb 07 15:01:54 UTC 2018 | Query:containerItem:91917 AND type:2
-Exception in thread "http-bio-127.0.0.1-8081-exec-58" java.lang.OutOfMemoryError: Java heap space
-
- -
    -
  • I’m trying to find a way to determine what was using all those Tomcat sessions, but parsing the DSpace log is hard because some IPs are IPv6, which contain colons!
  • -
  • Looking at the first crash this morning around 11, I see these IPv4 addresses making requests around 10 and 11AM:
  • -
- -
$ grep -E '^2018-02-07 (10|11)' dspace.log.2018-02-07 | grep -o -E 'ip_addr=[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | sort -n | uniq -c | sort -n | tail -n 20
-     34 ip_addr=46.229.168.67
-     34 ip_addr=46.229.168.73
-     37 ip_addr=46.229.168.76
-     40 ip_addr=34.232.65.41
-     41 ip_addr=46.229.168.71
-     44 ip_addr=197.210.168.174
-     55 ip_addr=181.137.2.214
-     55 ip_addr=213.55.99.121
-     58 ip_addr=46.229.168.65
-     64 ip_addr=66.249.66.91
-     67 ip_addr=66.249.66.90
-     71 ip_addr=207.46.13.54
-     78 ip_addr=130.82.1.40
-    104 ip_addr=40.77.167.36
-    151 ip_addr=68.180.228.157
-    174 ip_addr=207.46.13.135
-    194 ip_addr=54.83.138.123
-    198 ip_addr=40.77.167.62
-    210 ip_addr=207.46.13.71
-    214 ip_addr=104.196.152.243
-
- -
    -
  • These IPs made thousands of sessions today:
  • -
- -
$ grep 104.196.152.243 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
-530
-$ grep 207.46.13.71 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
-859
-$ grep 40.77.167.62 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
-610
-$ grep 54.83.138.123 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
-8
-$ grep 207.46.13.135 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
-826
-$ grep 68.180.228.157 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
-727
-$ grep 40.77.167.36 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
-181
-$ grep 130.82.1.40 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
-24
-$ grep 207.46.13.54 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
-166
-$ grep 46.229.168 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
-992
-
-
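  • The same per-IP session counts could probably be collected in one pass with a small shell loop instead of ten separate greps, something like:
$ for ip in 104.196.152.243 207.46.13.71 40.77.167.62 207.46.13.135 68.180.228.157 46.229.168; do echo -n "$ip "; grep "$ip" dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l; done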
- -
    -
  • Let’s investigate who these IPs belong to: - -
      -
    • 104.196.152.243 is CIAT, which is already marked as a bot via nginx!
    • -
    • 207.46.13.71 is Bing, which is already marked as a bot in Tomcat’s Crawler Session Manager Valve!
    • -
    • 40.77.167.62 is Bing, which is already marked as a bot in Tomcat’s Crawler Session Manager Valve!
    • -
    • 207.46.13.135 is Bing, which is already marked as a bot in Tomcat’s Crawler Session Manager Valve!
    • -
    • 68.180.228.157 is Yahoo, which is already marked as a bot in Tomcat’s Crawler Session Manager Valve!
    • -
    • 40.77.167.36 is Bing, which is already marked as a bot in Tomcat’s Crawler Session Manager Valve!
    • -
    • 207.46.13.54 is Bing, which is already marked as a bot in Tomcat’s Crawler Session Manager Valve!
    • -
    • 46.229.168.x is Semrush, which is already marked as a bot in Tomcat’s Crawler Session Manager Valve!
    • -
  • -
  • Nice, so these are all known bots that are already crammed into one session by Tomcat’s Crawler Session Manager Valve.
  • -
  • What in the actual fuck, why is our load doing this? It’s gotta be something fucked up with the database pool being “busy” but everything is fucking idle
  • -
  • One that I should probably add in nginx is 54.83.138.123, which is apparently the following user agent:
  • -
- -
BUbiNG (+http://law.di.unimi.it/BUbiNG.html)
-
- -
    -
  • This one makes two thousand requests per day or so recently:
  • -
- -
# grep -c BUbiNG /var/log/nginx/access.log /var/log/nginx/access.log.1
-/var/log/nginx/access.log:1925
-/var/log/nginx/access.log.1:2029
-
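  • And to see exactly which IPs are behind that user agent (so I know what I’d be adding to the valve), reusing the same logs:
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep BUbiNG | awk '{print $1}' | sort -u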
- -
    -
  • And they have 30 IPs, so fuck that shit I’m going to add them to the Tomcat Crawler Session Manager Valve nowwww
  • -
  • Lots of discussions on the dspace-tech mailing list over the last few years about leaky transactions being a known problem with DSpace
  • -
  • Helix84 recommends restarting PostgreSQL instead of Tomcat because it restarts quicker
  • -
  • This is how the connections looked when it crashed this afternoon:
  • -
- -
$ psql -c 'select * from pg_stat_activity' | grep -o -E '(dspaceWeb|dspaceApi|dspaceCli)' | sort | uniq -c
-      5 dspaceApi
-    290 dspaceWeb
-
- -
    -
  • This is how it is right now:
  • -
- -
$ psql -c 'select * from pg_stat_activity' | grep -o -E '(dspaceWeb|dspaceApi|dspaceCli)' | sort | uniq -c
-      5 dspaceApi
-      5 dspaceWeb
-
- -
    -
  • So is this just some fucked up XMLUI database leaking?
  • -
  • I notice there is an issue (that I’ve probably noticed before) on the Jira tracker about this that was fixed in DSpace 5.7: https://jira.duraspace.org/browse/DS-3551
  • -
  • I seriously doubt this leaking shit is fixed for sure, but I’m gonna cherry-pick all those commits and try them on DSpace Test and probably even CGSpace because I’m fed up with this shit
  • -
  • I cherry-picked all the commits for DS-3551 but it won’t build on our current DSpace 5.5!
  • -
  • I sent a message to the dspace-tech mailing list asking why DSpace thinks these connections are busy when PostgreSQL says they are idle
  • -
- -

2018-02-10

- -
    -
  • I tried to disable ORCID lookups but keep the existing authorities
  • -
  • This item has an ORCID for Ralf Kiese: http://localhost:8080/handle/10568/89897
  • -
  • Switch authority.controlled off and change authorLookup to lookup, and the ORCID badge doesn’t show up on the item
  • -
  • Leave all settings alone but change choices.presentation to lookup: the ORCID badge is there, but then item submission uses the LC Name Authority lookup and breaks with this error:
  • -
- -
Field dc_contributor_author has choice presentation of type "select", it may NOT be authority-controlled.
-
- -
    -
  • If I change choices.presentation to suggest it give this error:
  • -
- -
xmlui.mirage2.forms.instancedCompositeFields.noSuggestionError
-
- -
    -
  • So I don’t think we can disable the ORCID lookup function and keep the ORCID badges
  • -
- -

2018-02-11

- -
    -
  • Magdalena from CCAFS emailed to ask why one of their items has such a weird thumbnail: 10568/90735
  • -
- -

Weird thumbnail

- -
    -
  • I downloaded the PDF and manually generated a thumbnail with ImageMagick and it looked better:
  • -
- -
$ convert CCAFS_WP_223.pdf\[0\] -profile /usr/local/share/ghostscript/9.22/iccprofiles/default_cmyk.icc -thumbnail 600x600 -flatten -profile /usr/local/share/ghostscript/9.22/iccprofiles/default_rgb.icc CCAFS_WP_223.jpg
-
- -

Manual thumbnail

  • Peter sent me corrected author names last week, but the file encoding is messed up:
$ isutf8 authors-2018-02-05.csv
-authors-2018-02-05.csv: line 100, char 18, byte 4179: After a first byte between E1 and EC, expecting the 2nd byte between 80 and BF.
  • The isutf8 program comes from moreutils
  • Line 100 contains: Galiè, Alessandra
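  • A quick way to see what actually got mangled is to dump the raw bytes around the offset that isutf8 reported (a diagnostic sketch; the offsets come straight from the output above):

$ python3 -c "print(open('authors-2018-02-05.csv', 'rb').read()[4170:4190])"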
  • In other news, the psycopg2 project is splitting its packaging on pip, so to install the binary wheel distribution you now need to use pip install psycopg2-binary
  • See: http://initd.org/psycopg/articles/2018/02/08/psycopg-274-released/
  • I updated my fix-metadata-values.py and delete-metadata-values.py scripts on the scripts page: https://github.com/ilri/DSpace/wiki/Scripts
  • I ran the 342 author corrections (after trimming whitespace and excluding those with || and other syntax errors) on CGSpace:
$ ./fix-metadata-values.py -i Correct-342-Authors-2018-02-11.csv -f dc.contributor.author -t correct -m 3 -d dspace -u dspace -p 'fuuu'
  • Then I ran a full Discovery re-indexing:
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m"
-$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
  • I also normalized the name and authority for Duncan, Alan directly in PostgreSQL, since he had several different authority records:
dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Duncan, Alan%';
-   text_value    |              authority               | confidence 
------------------+--------------------------------------+------------
- Duncan, Alan J. | 5ff35043-942e-4d0a-b377-4daed6e3c1a3 |        600
- Duncan, Alan J. | 62298c84-4d9d-4b83-a932-4a9dd4046db7 |         -1
- Duncan, Alan J. |                                      |         -1
- Duncan, Alan    | a6486522-b08a-4f7a-84f9-3a73ce56034d |        600
- Duncan, Alan J. | cd0e03bf-92c3-475f-9589-60c5b042ea60 |         -1
- Duncan, Alan J. | a6486522-b08a-4f7a-84f9-3a73ce56034d |         -1
- Duncan, Alan J. | 5ff35043-942e-4d0a-b377-4daed6e3c1a3 |         -1
- Duncan, Alan J. | a6486522-b08a-4f7a-84f9-3a73ce56034d |        600
-(8 rows)
-
-dspace=# begin;
-dspace=# update metadatavalue set text_value='Duncan, Alan', authority='a6486522-b08a-4f7a-84f9-3a73ce56034d', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Duncan, Alan%';
-UPDATE 216
-dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Duncan, Alan%';
-  text_value  |              authority               | confidence 
---------------+--------------------------------------+------------
- Duncan, Alan | a6486522-b08a-4f7a-84f9-3a73ce56034d |        600
-(1 row)
-dspace=# commit;
-
  • Run all system updates on DSpace Test (linode02) and reboot it
  • I wrote a Python script (resolve-orcids-from-solr.py) using SolrClient to parse the Solr authority cache for ORCID IDs (a rough sketch of the approach is below)
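  • Roughly, the idea is something like this; note this is a sketch and not the real script, and the Solr URL, core name, and field names such as orcid_id are assumptions, so check the actual authority schema:

#!/usr/bin/env python
from SolrClient import SolrClient

solr = SolrClient('http://localhost:8081/solr')

# query the authority core for records that have an ORCID (field name assumed)
results = solr.query('authority', {
    'q': 'orcid_id:*',
    'fl': 'id,first_name,last_name,orcid_id',
    'rows': 10000,
})

# collect the unique ORCID identifiers
orcids = set(doc['orcid_id'] for doc in results.docs)

print('{0} authority records with ORCID IDs, {1} unique'.format(
    results.get_results_count(), len(orcids)))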
  • We currently have 1562 authority records with ORCID IDs, and 624 unique IDs
  • We can use this to build a controlled vocabulary of ORCID IDs for new item submissions
  • I don’t know how to add ORCID IDs to existing items yet… some more querying of PostgreSQL for authority values perhaps? (a starting query is sketched below)
  • I added the script to the ILRI DSpace wiki on GitHub
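  • For example, listing the author values attached to one authority record, using the Duncan, Alan UUID from above, would be a starting point:

dspace=# select text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and authority='a6486522-b08a-4f7a-84f9-3a73ce56034d';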
diff --git a/public/2018/01/cpu-week.png b/public/2018/01/cpu-week.png
deleted file mode 100644
index 2f5870268..000000000
Binary files a/public/2018/01/cpu-week.png and /dev/null differ
diff --git a/public/2018/01/dc-rights-submission.png b/public/2018/01/dc-rights-submission.png
deleted file mode 100644
index f811addd1..000000000
Binary files a/public/2018/01/dc-rights-submission.png and /dev/null differ
diff --git a/public/2018/01/firewall-perfectip.png b/public/2018/01/firewall-perfectip.png
deleted file mode 100644
index 876d8612b..000000000
Binary files a/public/2018/01/firewall-perfectip.png and /dev/null differ
diff --git a/public/2018/01/jvisualvm-mbeans-path.png b/public/2018/01/jvisualvm-mbeans-path.png
deleted file mode 100644
index 9159b34b5..000000000
Binary files a/public/2018/01/jvisualvm-mbeans-path.png and /dev/null differ
diff --git a/public/2018/01/openrefine-authors.png b/public/2018/01/openrefine-authors.png
deleted file mode 100644
index bc94a8487..000000000
Binary files a/public/2018/01/openrefine-authors.png and /dev/null differ
diff --git a/public/2018/01/postgres_connections-day-perfectip.png b/public/2018/01/postgres_connections-day-perfectip.png
deleted file mode 100644
index 95e35e298..000000000
Binary files a/public/2018/01/postgres_connections-day-perfectip.png and /dev/null differ
diff --git a/public/2018/01/postgres_connections-day.png b/public/2018/01/postgres_connections-day.png
deleted file mode 100644
index da27ce7af..000000000
Binary files a/public/2018/01/postgres_connections-day.png and /dev/null differ
diff --git a/public/2018/01/tomcat-jvm-day.png b/public/2018/01/tomcat-jvm-day.png
deleted file mode 100644
index 17f956fd5..000000000
Binary files a/public/2018/01/tomcat-jvm-day.png and /dev/null differ
diff --git a/public/2018/01/tomcat-threads-day.png b/public/2018/01/tomcat-threads-day.png
deleted file mode 100644
index c22fa890f..000000000
Binary files a/public/2018/01/tomcat-threads-day.png and /dev/null differ
diff --git a/public/2018/02/CCAFS_WP_223.jpg b/public/2018/02/CCAFS_WP_223.jpg
deleted file mode 100644
index ed48758f8..000000000
Binary files a/public/2018/02/CCAFS_WP_223.jpg and /dev/null differ
diff --git a/public/2018/02/CCAFS_WP_223.pdf.jpg b/public/2018/02/CCAFS_WP_223.pdf.jpg
deleted file mode 100644
index f49be90aa..000000000
Binary files a/public/2018/02/CCAFS_WP_223.pdf.jpg and /dev/null differ
diff --git a/public/2018/02/jmx_dspace-sessions-day.png b/public/2018/02/jmx_dspace-sessions-day.png
deleted file mode 100644
index e8ad64e58..000000000
Binary files a/public/2018/02/jmx_dspace-sessions-day.png and /dev/null differ
diff --git a/public/2018/02/jmx_dspace_sessions-day.png b/public/2018/02/jmx_dspace_sessions-day.png
deleted file mode 100644
index 5b102d117..000000000
Binary files a/public/2018/02/jmx_dspace_sessions-day.png and /dev/null differ
diff --git a/public/404.html b/public/404.html
deleted file mode 100644
index 595956bb3..000000000
--- a/public/404.html
+++ /dev/null
@@ -1,157 +0,0 @@
-
- -
-
- - - -
-
-

CGSpace Notes

-

Documenting day-to-day work on the CGSpace repository.

-
-
- - - -
-
-
- - - -
-
-

Page Not Found

-
-

Page not found. Go back home.

-
- - - -
- - - - -
-
- - - - - - - - - diff --git a/public/categories/index.html b/public/categories/index.html deleted file mode 100644 index 976dee896..000000000 --- a/public/categories/index.html +++ /dev/null @@ -1,169 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - CGSpace Notes - - - - - - - - - - - - - - - - - - - - -
-
- -
-
- - - -
-
-

CGSpace Notes

-

Documenting day-to-day work on the CGSpace repository.

-
-
- - - -
-
-
- - - - - - - - - - - -
- - - - -
-
- - - - - - - - - diff --git a/public/categories/index.xml b/public/categories/index.xml deleted file mode 100644 index 2a273ec1b..000000000 --- a/public/categories/index.xml +++ /dev/null @@ -1,23 +0,0 @@ - - - - Categories on CGSpace Notes - https://alanorth.github.io/cgspace-notes/categories/ - Recent content in Categories on CGSpace Notes - Hugo -- gohugo.io - en-us - - - - - - Notes - https://alanorth.github.io/cgspace-notes/categories/notes/ - Mon, 18 Sep 2017 16:38:35 +0300 - - https://alanorth.github.io/cgspace-notes/categories/notes/ - - - - - \ No newline at end of file diff --git a/public/categories/notes/index.html b/public/categories/notes/index.html deleted file mode 100644 index f8fe26e16..000000000 --- a/public/categories/notes/index.html +++ /dev/null @@ -1,190 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - CGSpace Notes - - - - - - - - - - - - - - - - - - - - -
-
- -
-
- - - -
-
-

CGSpace Notes

-

Documenting day-to-day work on the CGSpace repository.

-
-
- - - -
-
-
- - - - - - - - - -
-
-

CGIAR Library Migration

- -
-

Rough notes for importing the CGIAR Library content. It was decided that this content would go to a new top-level community called CGIAR System Organization.

- -

- Read more → -
- - - - - - - - -
- - - - -
-
- - - - - - - - - diff --git a/public/categories/notes/index.xml b/public/categories/notes/index.xml deleted file mode 100644 index 88149ec10..000000000 --- a/public/categories/notes/index.xml +++ /dev/null @@ -1,26 +0,0 @@ - - - - Notes on CGSpace Notes - https://alanorth.github.io/cgspace-notes/categories/notes/ - Recent content in Notes on CGSpace Notes - Hugo -- gohugo.io - en-us - Mon, 18 Sep 2017 16:38:35 +0300 - - - - - - CGIAR Library Migration - https://alanorth.github.io/cgspace-notes/cgiar-library-migration/ - Mon, 18 Sep 2017 16:38:35 +0300 - - https://alanorth.github.io/cgspace-notes/cgiar-library-migration/ - <p>Rough notes for importing the CGIAR Library content. It was decided that this content would go to a new top-level community called <em>CGIAR System Organization</em>.</p> - -<p></p> - - - - \ No newline at end of file diff --git a/public/categories/notes/page/1/index.html b/public/categories/notes/page/1/index.html deleted file mode 100644 index 2b4998a28..000000000 --- a/public/categories/notes/page/1/index.html +++ /dev/null @@ -1 +0,0 @@ -https://alanorth.github.io/cgspace-notes/categories/notes/ \ No newline at end of file diff --git a/public/categories/page/1/index.html b/public/categories/page/1/index.html deleted file mode 100644 index 7369f88f0..000000000 --- a/public/categories/page/1/index.html +++ /dev/null @@ -1 +0,0 @@ -https://alanorth.github.io/cgspace-notes/categories/ \ No newline at end of file diff --git a/public/cgiar-library-migration/index.html b/public/cgiar-library-migration/index.html deleted file mode 100644 index 87c5d9a55..000000000 --- a/public/cgiar-library-migration/index.html +++ /dev/null @@ -1,396 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - CGIAR Library Migration | CGSpace Notes - - - - - - - - - - - - - - - - - -
-
- -
-
- - - -
-
-

CGSpace Notes

-

Documenting day-to-day work on the CGSpace repository.

-
-
- - - -
-
-
- - - - -
-
-

CGIAR Library Migration

- -
-

Rough notes for importing the CGIAR Library content. It was decided that this content would go to a new top-level community called CGIAR System Organization.

- -

- -

Pre-migration Technical TODOs

- -

Things that need to happen before the migration:

- -
  • Set up nginx redirects for URLs like the old library.cgiar.org links (see the sketch below)
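  • Something along these lines in the library.cgiar.org server block would do it; this is a hypothetical sketch, since the real URL mappings were worked out separately, and the hostnames and patterns here are assumptions:

server {
    server_name library.cgiar.org;
    # listen and ssl directives omitted for brevity

    # send old handle URLs to the same handle on CGSpace
    location ~ ^/handle/(10947/[0-9]+)$ {
        return 301 https://cgspace.cgiar.org/handle/$1;
    }

    # everything else goes to the CGSpace home page
    location / {
        return 301 https://cgspace.cgiar.org/;
    }
}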
  • Extract the SSL certificate and key for library.cgiar.org from the old Tomcat keystore and build a chained PEM:
$ keytool -list -keystore tomcat.keystore
-$ keytool -importkeystore -srckeystore tomcat.keystore -destkeystore library.cgiar.org.p12 -deststoretype PKCS12 -srcalias tomcat
-$ openssl pkcs12 -in library.cgiar.org.p12 -nokeys -out library.cgiar.org.crt.pem
-$ openssl pkcs12 -in library.cgiar.org.p12 -nodes -nocerts -out library.cgiar.org.key.pem
-$ wget https://certs.godaddy.com/repository/gdroot-g2.crt https://certs.godaddy.com/repository/gdig2.crt.pem
-$ cat library.cgiar.org.crt.pem gdig2.crt.pem > library.cgiar.org-chained.pem
-
- -

Migration Process

- -

Export all top-level communities and collections from DSpace Test:

- -
$ export PATH=$PATH:/home/dspacetest.cgiar.org/bin
-$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/2515 10947-2515/10947-2515.zip
-$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/2516 10947-2516/10947-2516.zip
-$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/2517 10947-2517/10947-2517.zip
-$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/2518 10947-2518/10947-2518.zip
-$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/2519 10947-2519/10947-2519.zip
-$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/2708 10947-2708/10947-2708.zip
-$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/2526 10947-2526/10947-2526.zip
-$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/2871 10947-2871/10947-2871.zip
-$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/2527 10947-2527/10947-2527.zip
-$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10568/93759 10568-93759/10568-93759.zip
-$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10568/93760 10568-93760/10568-93760.zip
-$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/1 10947-1/10947-1.zip
-
- -

Import to CGSpace (also see notes from 2017-05-10):

- -
  • Add these settings to dspace.cfg to disable the METSRIGHTS and DSPACE-ROLES ingest crosswalks during the import:
mets.dspaceAIP.ingest.crosswalk.METSRIGHTS = NIL
-mets.dspaceAIP.ingest.crosswalk.DSPACE-ROLES = NIL
-
- -
  • Then import the hierarchies on CGSpace:
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx3072m -XX:-UseGCOverheadLimit -XX:+TieredCompilation -XX:TieredStopAtLevel=1"
-$ export PATH=$PATH:/home/cgspace.cgiar.org/bin
-$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2515/10947-2515.zip
-$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2516/10947-2516.zip
-$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2517/10947-2517.zip
-$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2518/10947-2518.zip
-$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2519/10947-2519.zip
-$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2708/10947-2708.zip
-$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2526/10947-2526.zip
-$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2871/10947-2871.zip
-$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-4467/10947-4467.zip
-$ dspace packager -s -u -t AIP -o ignoreHandle=false -e aorth@mjanja.ch -p 10568/83389 10947-2527/10947-2527.zip
-$ for item in 10947-2527/ITEM@10947-*; do dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done
-$ dspace packager -s -t AIP -o ignoreHandle=false -e aorth@mjanja.ch -p 10568/83389 10947-1/10947-1.zip
-$ for collection in 10947-1/COLLECTION@10947-*; do dspace packager -s -o ignoreHandle=false -t AIP -e aorth@mjanja.ch -p 10947/1 $collection; done
-$ for item in 10947-1/ITEM@10947-*; do dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done
-
- -

This submits AIP hierarchies recursively (-r) and suppresses errors when an item’s parent collection hasn’t been created yet, for example when the item is mapped. The large historic archive (10947/1) is created in several steps because it requires a lot of memory and often crashes.

- -

Create new subcommunities and collections for content we reorganized into new hierarchies from the original:

- -
    -
  • -
- -
$ dspace packager -r -t AIP -o ignoreHandle=false -e aorth@mjanja.ch -p 10568/83536 10568-93760/COLLECTION@10947-4651.zip
-$ for item in 10568-93760/ITEM@10947-465*; do dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done
-
- -
    -
  • -
- -
$ for item in 10568-93759/ITEM@10947-46*; do dspace packager -r -t AIP -o ignoreHandle=false -o ignoreParent=true -e aorth@mjanja.ch -p 10568/83538 $item; done
-
- -

Get the handles for the last few items from CGIAR Library that were created since we did the migration to DSpace Test in May:

- -
dspace=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id in (select item_id from metadatavalue where metadata_field_id=11 and date(text_value) > '2017-05-01T00:00:00Z');
-
- -
  • Export them from the CGIAR Library:
# for handle in 10947/4658 10947/4659 10947/4660 10947/4661 10947/4665 10947/4664 10947/4666 10947/4669; do /usr/local/dspace/bin/dspace packager -d -a -t AIP -e m.marus@cgiar.org -i $handle ${handle}.zip; done
-
- -
  • Import on CGSpace:
$ for item in 10947-latest/*.zip; do dspace packager -r -u -t AIP -e aorth@mjanja.ch $item; done
-
- -

Post Migration

- -
  • Add the 10947 prefix alongside 10568 in the server_admins, replication_admins, and backup_admins sections of the Handle server’s config.dct:
"server_admins" = (
-"300:0.NA/10568"
-"300:0.NA/10947"
-)
-
-"replication_admins" = (
-"300:0.NA/10568"
-"300:0.NA/10947"
-)
-
-"backup_admins" = (
-"300:0.NA/10568"
-"300:0.NA/10947"
-)
-
- -

I had regenerated the sitebndl.zip file on the CGIAR Library server and sent it to the Handle.net admins, but they said that there were mismatches between the public and private keys, which I suspect is due to make-handle-config not being very flexible. After discussing our scenario with the Handle.net admins they said we actually don’t need to send an updated sitebndl.zip for this type of change, and the above config.dct edits are all that is required. I guess they just did something on their end by setting the authoritative IP address for the 10947 prefix to be the same as ours…

- -
  • Request a Let’s Encrypt certificate for library.cgiar.org with certbot in standalone mode (nginx has to be stopped briefly so certbot can bind to the port):
$ sudo systemctl stop nginx
-$ /opt/certbot-auto certonly --standalone -d library.cgiar.org
-$ sudo systemctl start nginx
-
- -

Troubleshooting

- -

Foreign Key Error in dspace cleanup

- -

The cleanup script is sometimes used during import processes to clean the database and assetstore after failed AIP imports. If you see the following error with dspace cleanup -v:

- -
Error: ERROR: update or delete on table "bitstream" violates foreign key constraint "bundle_primary_bitstream_id_fkey" on table "bundle"                                                                                                                       
-  Detail: Key (bitstream_id)=(119841) is still referenced from table "bundle".
-
- -

The solution is to set the primary_bitstream_id to NULL in PostgreSQL:

- -
dspace=# update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (119841);
-
- -

PSQLException During AIP Ingest

- -

After a few rounds of ingesting—possibly with failures—you might end up with inconsistent IDs in the database. In this case, during AIP ingest of a single collection in submit mode (-s):

- -
org.dspace.content.packager.PackageValidationException: Exception while ingesting 10947-2527/10947-2527.zip, Reason: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "handle_pkey"                                    
-  Detail: Key (handle_id)=(86227) already exists.
-
- -

The normal solution is to run the update-sequences.sql script (with Tomcat shut down) but it doesn’t seem to work in this case. Finding the maximum handle_id and manually updating the sequence seems to work:

- -
dspace=# select * from handle where handle_id=(select max(handle_id) from handle);
-dspace=# select setval('handle_seq',86873);
-
- - - - - -
- - - -
- - - - -
-
- - - - - - - - - diff --git a/public/css/cookieconsent.min.css b/public/css/cookieconsent.min.css deleted file mode 100644 index 03c69fe82..000000000 --- a/public/css/cookieconsent.min.css +++ /dev/null @@ -1,6 +0,0 @@ -.cc-window{opacity:1;transition:opacity 1s ease}.cc-window.cc-invisible{opacity:0}.cc-animate.cc-revoke{transition:transform 1s ease}.cc-animate.cc-revoke.cc-top{transform:translateY(-2em)}.cc-animate.cc-revoke.cc-bottom{transform:translateY(2em)}.cc-animate.cc-revoke.cc-active.cc-bottom,.cc-animate.cc-revoke.cc-active.cc-top,.cc-revoke:hover{transform:translateY(0)}.cc-grower{max-height:0;overflow:hidden;transition:max-height 1s} -.cc-link,.cc-revoke:hover{text-decoration:underline}.cc-revoke,.cc-window{position:fixed;overflow:hidden;box-sizing:border-box;font-family:Helvetica,Calibri,Arial,sans-serif;font-size:16px;line-height:1.5em;display:-ms-flexbox;display:flex;-ms-flex-wrap:nowrap;flex-wrap:nowrap;z-index:9999}.cc-window.cc-static{position:static}.cc-window.cc-floating{padding:2em;max-width:24em;-ms-flex-direction:column;flex-direction:column}.cc-window.cc-banner{padding:1em 1.8em;width:100%;-ms-flex-direction:row;flex-direction:row}.cc-revoke{padding:.5em}.cc-header{font-size:18px;font-weight:700}.cc-btn,.cc-close,.cc-link,.cc-revoke{cursor:pointer}.cc-link{opacity:.8;display:inline-block;padding:.2em}.cc-link:hover{opacity:1}.cc-link:active,.cc-link:visited{color:initial}.cc-btn{display:block;padding:.4em .8em;font-size:.9em;font-weight:700;border-width:2px;border-style:solid;text-align:center;white-space:nowrap}.cc-banner .cc-btn:last-child{min-width:140px}.cc-highlight .cc-btn:first-child{background-color:transparent;border-color:transparent}.cc-highlight .cc-btn:first-child:focus,.cc-highlight .cc-btn:first-child:hover{background-color:transparent;text-decoration:underline}.cc-close{display:block;position:absolute;top:.5em;right:.5em;font-size:1.6em;opacity:.9;line-height:.75}.cc-close:focus,.cc-close:hover{opacity:1} -.cc-revoke.cc-top{top:0;left:3em;border-bottom-left-radius:.5em;border-bottom-right-radius:.5em}.cc-revoke.cc-bottom{bottom:0;left:3em;border-top-left-radius:.5em;border-top-right-radius:.5em}.cc-revoke.cc-left{left:3em;right:unset}.cc-revoke.cc-right{right:3em;left:unset}.cc-top{top:1em}.cc-left{left:1em}.cc-right{right:1em}.cc-bottom{bottom:1em}.cc-floating>.cc-link{margin-bottom:1em}.cc-floating .cc-message{display:block;margin-bottom:1em}.cc-window.cc-floating .cc-compliance{-ms-flex:1;flex:1}.cc-window.cc-banner{-ms-flex-align:center;align-items:center}.cc-banner.cc-top{left:0;right:0;top:0}.cc-banner.cc-bottom{left:0;right:0;bottom:0}.cc-banner .cc-message{-ms-flex:1;flex:1}.cc-compliance{display:-ms-flexbox;display:flex;-ms-flex-align:center;align-items:center;-ms-flex-line-pack:justify;align-content:space-between}.cc-compliance>.cc-btn{-ms-flex:1;flex:1}.cc-btn+.cc-btn{margin-left:.5em} -@media print{.cc-revoke,.cc-window{display:none}}@media screen and (max-width:900px){.cc-btn{white-space:normal}}@media screen and (max-width:414px) and (orientation:portrait),screen and (max-width:736px) and (orientation:landscape){.cc-window.cc-top{top:0}.cc-window.cc-bottom{bottom:0}.cc-window.cc-banner,.cc-window.cc-left,.cc-window.cc-right{left:0;right:0}.cc-window.cc-banner{-ms-flex-direction:column;flex-direction:column}.cc-window.cc-banner .cc-compliance{-ms-flex:1;flex:1}.cc-window.cc-floating{max-width:none}.cc-window .cc-message{margin-bottom:1em}.cc-window.cc-banner{-ms-flex-align:unset;align-items:unset}} 
-.cc-floating.cc-theme-classic{padding:1.2em;border-radius:5px}.cc-floating.cc-type-info.cc-theme-classic .cc-compliance{text-align:center;display:inline;-ms-flex:none;flex:none}.cc-theme-classic .cc-btn{border-radius:5px}.cc-theme-classic .cc-btn:last-child{min-width:140px}.cc-floating.cc-type-info.cc-theme-classic .cc-btn{display:inline-block} -.cc-theme-edgeless.cc-window{padding:0}.cc-floating.cc-theme-edgeless .cc-message{margin:2em 2em 1.5em}.cc-banner.cc-theme-edgeless .cc-btn{margin:0;padding:.8em 1.8em;height:100%}.cc-banner.cc-theme-edgeless .cc-message{margin-left:1em}.cc-floating.cc-theme-edgeless .cc-btn+.cc-btn{margin-left:0} \ No newline at end of file diff --git a/public/css/style.css b/public/css/style.css deleted file mode 100644 index f55df552f..000000000 --- a/public/css/style.css +++ /dev/null @@ -1,9 +0,0 @@ -@charset "UTF-8";/*! - * Font Awesome 4.7.0 by @davegandy - http://fontawesome.io - @fontawesome - * License - http://fontawesome.io/license (Font: SIL OFL 1.1, CSS: MIT License) - */@font-face{font-family:FontAwesome;src:url(../fonts/fontawesome-webfont.eot?v=4.7.0);src:url(../fonts/fontawesome-webfont.eot?#iefix&v=4.7.0) format("embedded-opentype"),url(../fonts/fontawesome-webfont.woff2?v=4.7.0) format("woff2"),url(../fonts/fontawesome-webfont.woff?v=4.7.0) format("woff"),url(../fonts/fontawesome-webfont.ttf?v=4.7.0) format("truetype"),url(../fonts/fontawesome-webfont.svg?v=4.7.0#fontawesomeregular) format("svg");font-weight:400;font-style:normal}.fa{display:inline-block;font:normal normal normal 14px/1 FontAwesome;font-size:inherit;text-rendering:auto;-webkit-font-smoothing:antialiased;-moz-osx-font-smoothing:grayscale}.fa-lg{font-size:1.333333em;line-height:.75em;vertical-align:-15%}.fa-2x{font-size:2em}.fa-3x{font-size:3em}.fa-4x{font-size:4em}.fa-5x{font-size:5em}.fa-fw{width:1.285714em;text-align:center}.fa-tag:before{content:""}.fa-folder:before{content:""}.fa-facebook:before{content:""}.fa-google-plus:before{content:""}.fa-linkedin:before{content:""}.fa-rss:before{content:""}.fa-rss-square:before{content:""}.fa-twitter:before{content:""}.sr-only{position:absolute;width:1px;height:1px;padding:0;margin:-1px;overflow:hidden;clip:rect(0,0,0,0);border:0}.sr-only-focusable:active,.sr-only-focusable:focus{position:static;width:auto;height:auto;margin:0;overflow:visible;clip:auto}/*! - * Bootstrap v4.0.0 (https://getbootstrap.com) - * Copyright 2011-2018 The Bootstrap Authors - * Copyright 2011-2018 Twitter, Inc. 
- * Licensed under MIT (https://github.com/twbs/bootstrap/blob/master/LICENSE) - */:root{--blue:#007bff;--indigo:#6610f2;--purple:#6f42c1;--pink:#e83e8c;--red:#dc3545;--orange:#fd7e14;--yellow:#ffc107;--green:#28a745;--teal:#20c997;--cyan:#17a2b8;--white:#fff;--gray:#6c757d;--gray-dark:#343a40;--primary:#007bff;--secondary:#6c757d;--success:#28a745;--info:#17a2b8;--warning:#ffc107;--danger:#dc3545;--light:#f8f9fa;--dark:#343a40;--breakpoint-xs:0;--breakpoint-sm:576px;--breakpoint-md:768px;--breakpoint-lg:992px;--breakpoint-xl:1200px;--font-family-sans-serif:-apple-system,BlinkMacSystemFont,"Segoe UI",Roboto,"Helvetica Neue",Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol";--font-family-monospace:SFMono-Regular,Menlo,Monaco,Consolas,"Liberation Mono","Courier New",monospace}*,::after,::before{box-sizing:border-box}html{font-family:sans-serif;line-height:1.15;-webkit-text-size-adjust:100%;-ms-text-size-adjust:100%;-ms-overflow-style:scrollbar;-webkit-tap-highlight-color:transparent}@-ms-viewport{width:device-width}article,aside,dialog,figcaption,figure,footer,header,hgroup,main,nav,section{display:block}body{margin:0;font-family:-apple-system,BlinkMacSystemFont,"Segoe UI",Roboto,"Helvetica Neue",Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol";font-size:1rem;font-weight:400;line-height:1.5;color:#212529;text-align:left;background-color:#fff}[tabindex="-1"]:focus{outline:0!important}hr{box-sizing:content-box;height:0;overflow:visible}h1,h2,h3,h4,h5,h6{margin-top:0;margin-bottom:.5rem}p{margin-top:0;margin-bottom:1rem}abbr[data-original-title],abbr[title]{text-decoration:underline;text-decoration:underline dotted;cursor:help;border-bottom:0}address{margin-bottom:1rem;font-style:normal;line-height:inherit}dl,ol,ul{margin-top:0;margin-bottom:1rem}ol ol,ol ul,ul ol,ul ul{margin-bottom:0}dt{font-weight:700}dd{margin-bottom:.5rem;margin-left:0}blockquote{margin:0 0 1rem}dfn{font-style:italic}b,strong{font-weight:bolder}small{font-size:80%}sub,sup{position:relative;font-size:75%;line-height:0;vertical-align:baseline}sub{bottom:-.25em}sup{top:-.5em}a{color:#007bff;text-decoration:none;background-color:transparent;-webkit-text-decoration-skip:objects}a:hover{color:#0056b3;text-decoration:underline}a:not([href]):not([tabindex]){color:inherit;text-decoration:none}a:not([href]):not([tabindex]):focus,a:not([href]):not([tabindex]):hover{color:inherit;text-decoration:none}a:not([href]):not([tabindex]):focus{outline:0}code,kbd,pre,samp{font-family:monospace,monospace;font-size:1em}pre{margin-top:0;margin-bottom:1rem;overflow:auto;-ms-overflow-style:scrollbar}figure{margin:0 0 1rem}img{vertical-align:middle;border-style:none}svg:not(:root){overflow:hidden}table{border-collapse:collapse}caption{padding-top:.75rem;padding-bottom:.75rem;color:#6c757d;text-align:left;caption-side:bottom}th{text-align:inherit}label{display:inline-block;margin-bottom:.5rem}button{border-radius:0}button:focus{outline:1px dotted;outline:5px auto -webkit-focus-ring-color}button,input,optgroup,select,textarea{margin:0;font-family:inherit;font-size:inherit;line-height:inherit}button,input{overflow:visible}button,select{text-transform:none}[type=reset],[type=submit],button,html 
[type=button]{-webkit-appearance:button}[type=button]::-moz-focus-inner,[type=reset]::-moz-focus-inner,[type=submit]::-moz-focus-inner,button::-moz-focus-inner{padding:0;border-style:none}input[type=checkbox],input[type=radio]{box-sizing:border-box;padding:0}input[type=date],input[type=datetime-local],input[type=month],input[type=time]{-webkit-appearance:listbox}textarea{overflow:auto;resize:vertical}fieldset{min-width:0;padding:0;margin:0;border:0}legend{display:block;width:100%;max-width:100%;padding:0;margin-bottom:.5rem;font-size:1.5rem;line-height:inherit;color:inherit;white-space:normal}progress{vertical-align:baseline}[type=number]::-webkit-inner-spin-button,[type=number]::-webkit-outer-spin-button{height:auto}[type=search]{outline-offset:-2px;-webkit-appearance:none}[type=search]::-webkit-search-cancel-button,[type=search]::-webkit-search-decoration{-webkit-appearance:none}::-webkit-file-upload-button{font:inherit;-webkit-appearance:button}output{display:inline-block}summary{display:list-item;cursor:pointer}template{display:none}[hidden]{display:none!important}.h1,.h2,.h3,.h4,.h5,.h6,h1,h2,h3,h4,h5,h6{margin-bottom:.5rem;font-family:inherit;font-weight:500;line-height:1.2;color:inherit}.h1,h1{font-size:2.5rem}.h2,h2{font-size:2rem}.h3,h3{font-size:1.75rem}.h4,h4{font-size:1.5rem}.h5,h5{font-size:1.25rem}.h6,h6{font-size:1rem}.lead{font-size:1.25rem;font-weight:300}.display-1{font-size:6rem;font-weight:300;line-height:1.2}.display-2{font-size:5.5rem;font-weight:300;line-height:1.2}.display-3{font-size:4.5rem;font-weight:300;line-height:1.2}.display-4{font-size:3.5rem;font-weight:300;line-height:1.2}hr{margin-top:1rem;margin-bottom:1rem;border:0;border-top:1px solid rgba(0,0,0,.1)}.small,small{font-size:80%;font-weight:400}.mark,mark{padding:.2em;background-color:#fcf8e3}.list-unstyled{padding-left:0;list-style:none}.list-inline{padding-left:0;list-style:none}.list-inline-item{display:inline-block}.list-inline-item:not(:last-child){margin-right:.5rem}.initialism{font-size:90%;text-transform:uppercase}.blockquote{margin-bottom:1rem;font-size:1.25rem}.blockquote-footer{display:block;font-size:80%;color:#6c757d}.blockquote-footer::before{content:"\2014 \00A0"}.img-fluid{max-width:100%;height:auto}.img-thumbnail{padding:.25rem;background-color:#fff;border:1px solid #dee2e6;border-radius:.25rem;max-width:100%;height:auto}.figure{display:inline-block}.figure-img{margin-bottom:.5rem;line-height:1}.figure-caption{font-size:90%;color:#6c757d}code,kbd,pre,samp{font-family:SFMono-Regular,Menlo,Monaco,Consolas,"Liberation Mono","Courier New",monospace}code{font-size:87.5%;color:#e83e8c;word-break:break-word}a>code{color:inherit}kbd{padding:.2rem .4rem;font-size:87.5%;color:#fff;background-color:#212529;border-radius:.2rem}kbd kbd{padding:0;font-size:100%;font-weight:700}pre{display:block;font-size:87.5%;color:#212529}pre code{font-size:inherit;color:inherit;word-break:normal}.pre-scrollable{max-height:340px;overflow-y:scroll}.container{width:100%;padding-right:15px;padding-left:15px;margin-right:auto;margin-left:auto}@media (min-width:576px){.container{max-width:540px}}@media (min-width:768px){.container{max-width:720px}}@media (min-width:992px){.container{max-width:960px}}@media 
(min-width:1200px){.container{max-width:1140px}}.container-fluid{width:100%;padding-right:15px;padding-left:15px;margin-right:auto;margin-left:auto}.row{display:flex;flex-wrap:wrap;margin-right:-15px;margin-left:-15px}.no-gutters{margin-right:0;margin-left:0}.no-gutters>.col,.no-gutters>[class*=col-]{padding-right:0;padding-left:0}.col,.col-1,.col-10,.col-11,.col-12,.col-2,.col-3,.col-4,.col-5,.col-6,.col-7,.col-8,.col-9,.col-auto,.col-lg,.col-lg-1,.col-lg-10,.col-lg-11,.col-lg-12,.col-lg-2,.col-lg-3,.col-lg-4,.col-lg-5,.col-lg-6,.col-lg-7,.col-lg-8,.col-lg-9,.col-lg-auto,.col-md,.col-md-1,.col-md-10,.col-md-11,.col-md-12,.col-md-2,.col-md-3,.col-md-4,.col-md-5,.col-md-6,.col-md-7,.col-md-8,.col-md-9,.col-md-auto,.col-sm,.col-sm-1,.col-sm-10,.col-sm-11,.col-sm-12,.col-sm-2,.col-sm-3,.col-sm-4,.col-sm-5,.col-sm-6,.col-sm-7,.col-sm-8,.col-sm-9,.col-sm-auto,.col-xl,.col-xl-1,.col-xl-10,.col-xl-11,.col-xl-12,.col-xl-2,.col-xl-3,.col-xl-4,.col-xl-5,.col-xl-6,.col-xl-7,.col-xl-8,.col-xl-9,.col-xl-auto{position:relative;width:100%;min-height:1px;padding-right:15px;padding-left:15px}.col{flex-basis:0;flex-grow:1;max-width:100%}.col-auto{flex:0 0 auto;width:auto;max-width:none}.col-1{flex:0 0 8.333333%;max-width:8.333333%}.col-2{flex:0 0 16.666667%;max-width:16.666667%}.col-3{flex:0 0 25%;max-width:25%}.col-4{flex:0 0 33.333333%;max-width:33.333333%}.col-5{flex:0 0 41.666667%;max-width:41.666667%}.col-6{flex:0 0 50%;max-width:50%}.col-7{flex:0 0 58.333333%;max-width:58.333333%}.col-8{flex:0 0 66.666667%;max-width:66.666667%}.col-9{flex:0 0 75%;max-width:75%}.col-10{flex:0 0 83.333333%;max-width:83.333333%}.col-11{flex:0 0 91.666667%;max-width:91.666667%}.col-12{flex:0 0 100%;max-width:100%}.order-first{order:-1}.order-last{order:13}.order-0{order:0}.order-1{order:1}.order-2{order:2}.order-3{order:3}.order-4{order:4}.order-5{order:5}.order-6{order:6}.order-7{order:7}.order-8{order:8}.order-9{order:9}.order-10{order:10}.order-11{order:11}.order-12{order:12}.offset-1{margin-left:8.333333%}.offset-2{margin-left:16.666667%}.offset-3{margin-left:25%}.offset-4{margin-left:33.333333%}.offset-5{margin-left:41.666667%}.offset-6{margin-left:50%}.offset-7{margin-left:58.333333%}.offset-8{margin-left:66.666667%}.offset-9{margin-left:75%}.offset-10{margin-left:83.333333%}.offset-11{margin-left:91.666667%}@media (min-width:576px){.col-sm{flex-basis:0;flex-grow:1;max-width:100%}.col-sm-auto{flex:0 0 auto;width:auto;max-width:none}.col-sm-1{flex:0 0 8.333333%;max-width:8.333333%}.col-sm-2{flex:0 0 16.666667%;max-width:16.666667%}.col-sm-3{flex:0 0 25%;max-width:25%}.col-sm-4{flex:0 0 33.333333%;max-width:33.333333%}.col-sm-5{flex:0 0 41.666667%;max-width:41.666667%}.col-sm-6{flex:0 0 50%;max-width:50%}.col-sm-7{flex:0 0 58.333333%;max-width:58.333333%}.col-sm-8{flex:0 0 66.666667%;max-width:66.666667%}.col-sm-9{flex:0 0 75%;max-width:75%}.col-sm-10{flex:0 0 83.333333%;max-width:83.333333%}.col-sm-11{flex:0 0 91.666667%;max-width:91.666667%}.col-sm-12{flex:0 0 
100%;max-width:100%}.order-sm-first{order:-1}.order-sm-last{order:13}.order-sm-0{order:0}.order-sm-1{order:1}.order-sm-2{order:2}.order-sm-3{order:3}.order-sm-4{order:4}.order-sm-5{order:5}.order-sm-6{order:6}.order-sm-7{order:7}.order-sm-8{order:8}.order-sm-9{order:9}.order-sm-10{order:10}.order-sm-11{order:11}.order-sm-12{order:12}.offset-sm-0{margin-left:0}.offset-sm-1{margin-left:8.333333%}.offset-sm-2{margin-left:16.666667%}.offset-sm-3{margin-left:25%}.offset-sm-4{margin-left:33.333333%}.offset-sm-5{margin-left:41.666667%}.offset-sm-6{margin-left:50%}.offset-sm-7{margin-left:58.333333%}.offset-sm-8{margin-left:66.666667%}.offset-sm-9{margin-left:75%}.offset-sm-10{margin-left:83.333333%}.offset-sm-11{margin-left:91.666667%}}@media (min-width:768px){.col-md{flex-basis:0;flex-grow:1;max-width:100%}.col-md-auto{flex:0 0 auto;width:auto;max-width:none}.col-md-1{flex:0 0 8.333333%;max-width:8.333333%}.col-md-2{flex:0 0 16.666667%;max-width:16.666667%}.col-md-3{flex:0 0 25%;max-width:25%}.col-md-4{flex:0 0 33.333333%;max-width:33.333333%}.col-md-5{flex:0 0 41.666667%;max-width:41.666667%}.col-md-6{flex:0 0 50%;max-width:50%}.col-md-7{flex:0 0 58.333333%;max-width:58.333333%}.col-md-8{flex:0 0 66.666667%;max-width:66.666667%}.col-md-9{flex:0 0 75%;max-width:75%}.col-md-10{flex:0 0 83.333333%;max-width:83.333333%}.col-md-11{flex:0 0 91.666667%;max-width:91.666667%}.col-md-12{flex:0 0 100%;max-width:100%}.order-md-first{order:-1}.order-md-last{order:13}.order-md-0{order:0}.order-md-1{order:1}.order-md-2{order:2}.order-md-3{order:3}.order-md-4{order:4}.order-md-5{order:5}.order-md-6{order:6}.order-md-7{order:7}.order-md-8{order:8}.order-md-9{order:9}.order-md-10{order:10}.order-md-11{order:11}.order-md-12{order:12}.offset-md-0{margin-left:0}.offset-md-1{margin-left:8.333333%}.offset-md-2{margin-left:16.666667%}.offset-md-3{margin-left:25%}.offset-md-4{margin-left:33.333333%}.offset-md-5{margin-left:41.666667%}.offset-md-6{margin-left:50%}.offset-md-7{margin-left:58.333333%}.offset-md-8{margin-left:66.666667%}.offset-md-9{margin-left:75%}.offset-md-10{margin-left:83.333333%}.offset-md-11{margin-left:91.666667%}}@media (min-width:992px){.col-lg{flex-basis:0;flex-grow:1;max-width:100%}.col-lg-auto{flex:0 0 auto;width:auto;max-width:none}.col-lg-1{flex:0 0 8.333333%;max-width:8.333333%}.col-lg-2{flex:0 0 16.666667%;max-width:16.666667%}.col-lg-3{flex:0 0 25%;max-width:25%}.col-lg-4{flex:0 0 33.333333%;max-width:33.333333%}.col-lg-5{flex:0 0 41.666667%;max-width:41.666667%}.col-lg-6{flex:0 0 50%;max-width:50%}.col-lg-7{flex:0 0 58.333333%;max-width:58.333333%}.col-lg-8{flex:0 0 66.666667%;max-width:66.666667%}.col-lg-9{flex:0 0 75%;max-width:75%}.col-lg-10{flex:0 0 83.333333%;max-width:83.333333%}.col-lg-11{flex:0 0 91.666667%;max-width:91.666667%}.col-lg-12{flex:0 0 100%;max-width:100%}.order-lg-first{order:-1}.order-lg-last{order:13}.order-lg-0{order:0}.order-lg-1{order:1}.order-lg-2{order:2}.order-lg-3{order:3}.order-lg-4{order:4}.order-lg-5{order:5}.order-lg-6{order:6}.order-lg-7{order:7}.order-lg-8{order:8}.order-lg-9{order:9}.order-lg-10{order:10}.order-lg-11{order:11}.order-lg-12{order:12}.offset-lg-0{margin-left:0}.offset-lg-1{margin-left:8.333333%}.offset-lg-2{margin-left:16.666667%}.offset-lg-3{margin-left:25%}.offset-lg-4{margin-left:33.333333%}.offset-lg-5{margin-left:41.666667%}.offset-lg-6{margin-left:50%}.offset-lg-7{margin-left:58.333333%}.offset-lg-8{margin-left:66.666667%}.offset-lg-9{margin-left:75%}.offset-lg-10{margin-left:83.333333%}.offset-lg-11{margin-left:91.666667%}}@media 
(min-width:1200px){.col-xl{flex-basis:0;flex-grow:1;max-width:100%}.col-xl-auto{flex:0 0 auto;width:auto;max-width:none}.col-xl-1{flex:0 0 8.333333%;max-width:8.333333%}.col-xl-2{flex:0 0 16.666667%;max-width:16.666667%}.col-xl-3{flex:0 0 25%;max-width:25%}.col-xl-4{flex:0 0 33.333333%;max-width:33.333333%}.col-xl-5{flex:0 0 41.666667%;max-width:41.666667%}.col-xl-6{flex:0 0 50%;max-width:50%}.col-xl-7{flex:0 0 58.333333%;max-width:58.333333%}.col-xl-8{flex:0 0 66.666667%;max-width:66.666667%}.col-xl-9{flex:0 0 75%;max-width:75%}.col-xl-10{flex:0 0 83.333333%;max-width:83.333333%}.col-xl-11{flex:0 0 91.666667%;max-width:91.666667%}.col-xl-12{flex:0 0 100%;max-width:100%}.order-xl-first{order:-1}.order-xl-last{order:13}.order-xl-0{order:0}.order-xl-1{order:1}.order-xl-2{order:2}.order-xl-3{order:3}.order-xl-4{order:4}.order-xl-5{order:5}.order-xl-6{order:6}.order-xl-7{order:7}.order-xl-8{order:8}.order-xl-9{order:9}.order-xl-10{order:10}.order-xl-11{order:11}.order-xl-12{order:12}.offset-xl-0{margin-left:0}.offset-xl-1{margin-left:8.333333%}.offset-xl-2{margin-left:16.666667%}.offset-xl-3{margin-left:25%}.offset-xl-4{margin-left:33.333333%}.offset-xl-5{margin-left:41.666667%}.offset-xl-6{margin-left:50%}.offset-xl-7{margin-left:58.333333%}.offset-xl-8{margin-left:66.666667%}.offset-xl-9{margin-left:75%}.offset-xl-10{margin-left:83.333333%}.offset-xl-11{margin-left:91.666667%}}.form-control{display:block;width:100%;padding:.375rem .75rem;font-size:1rem;line-height:1.5;color:#495057;background-color:#fff;background-clip:padding-box;border:1px solid #ced4da;border-radius:.25rem;transition:border-color .15s ease-in-out,box-shadow .15s ease-in-out}.form-control::-ms-expand{background-color:transparent;border:0}.form-control:focus{color:#495057;background-color:#fff;border-color:#80bdff;outline:0;box-shadow:0 0 0 .2rem rgba(0,123,255,.25)}.form-control::placeholder{color:#6c757d;opacity:1}.form-control:disabled,.form-control[readonly]{background-color:#e9ecef;opacity:1}select.form-control:not([size]):not([multiple]){height:calc(2.25rem + 2px)}select.form-control:focus::-ms-value{color:#495057;background-color:#fff}.form-control-file,.form-control-range{display:block;width:100%}.col-form-label{padding-top:calc(.375rem + 1px);padding-bottom:calc(.375rem + 1px);margin-bottom:0;font-size:inherit;line-height:1.5}.col-form-label-lg{padding-top:calc(.5rem + 1px);padding-bottom:calc(.5rem + 1px);font-size:1.25rem;line-height:1.5}.col-form-label-sm{padding-top:calc(.25rem + 1px);padding-bottom:calc(.25rem + 1px);font-size:.875rem;line-height:1.5}.form-control-plaintext{display:block;width:100%;padding-top:.375rem;padding-bottom:.375rem;margin-bottom:0;line-height:1.5;background-color:transparent;border:solid transparent;border-width:1px 0}.form-control-plaintext.form-control-lg,.form-control-plaintext.form-control-sm{padding-right:0;padding-left:0}.form-control-sm{padding:.25rem .5rem;font-size:.875rem;line-height:1.5;border-radius:.2rem}select.form-control-sm:not([size]):not([multiple]){height:calc(1.8125rem + 2px)}.form-control-lg{padding:.5rem 1rem;font-size:1.25rem;line-height:1.5;border-radius:.3rem}select.form-control-lg:not([size]):not([multiple]){height:calc(2.875rem + 
2px)}.form-group{margin-bottom:1rem}.form-text{display:block;margin-top:.25rem}.form-row{display:flex;flex-wrap:wrap;margin-right:-5px;margin-left:-5px}.form-row>.col,.form-row>[class*=col-]{padding-right:5px;padding-left:5px}.form-check{position:relative;display:block;padding-left:1.25rem}.form-check-input{position:absolute;margin-top:.3rem;margin-left:-1.25rem}.form-check-input:disabled~.form-check-label{color:#6c757d}.form-check-label{margin-bottom:0}.form-check-inline{display:inline-flex;align-items:center;padding-left:0;margin-right:.75rem}.form-check-inline .form-check-input{position:static;margin-top:0;margin-right:.3125rem;margin-left:0}.valid-feedback{display:none;width:100%;margin-top:.25rem;font-size:80%;color:#28a745}.valid-tooltip{position:absolute;top:100%;z-index:5;display:none;max-width:100%;padding:.5rem;margin-top:.1rem;font-size:.875rem;line-height:1;color:#fff;background-color:rgba(40,167,69,.8);border-radius:.2rem}.custom-select.is-valid,.form-control.is-valid,.was-validated .custom-select:valid,.was-validated .form-control:valid{border-color:#28a745}.custom-select.is-valid:focus,.form-control.is-valid:focus,.was-validated .custom-select:valid:focus,.was-validated .form-control:valid:focus{border-color:#28a745;box-shadow:0 0 0 .2rem rgba(40,167,69,.25)}.custom-select.is-valid~.valid-feedback,.custom-select.is-valid~.valid-tooltip,.form-control.is-valid~.valid-feedback,.form-control.is-valid~.valid-tooltip,.was-validated .custom-select:valid~.valid-feedback,.was-validated .custom-select:valid~.valid-tooltip,.was-validated .form-control:valid~.valid-feedback,.was-validated .form-control:valid~.valid-tooltip{display:block}.form-check-input.is-valid~.form-check-label,.was-validated .form-check-input:valid~.form-check-label{color:#28a745}.form-check-input.is-valid~.valid-feedback,.form-check-input.is-valid~.valid-tooltip,.was-validated .form-check-input:valid~.valid-feedback,.was-validated .form-check-input:valid~.valid-tooltip{display:block}.custom-control-input.is-valid~.custom-control-label,.was-validated .custom-control-input:valid~.custom-control-label{color:#28a745}.custom-control-input.is-valid~.custom-control-label::before,.was-validated .custom-control-input:valid~.custom-control-label::before{background-color:#71dd8a}.custom-control-input.is-valid~.valid-feedback,.custom-control-input.is-valid~.valid-tooltip,.was-validated .custom-control-input:valid~.valid-feedback,.was-validated .custom-control-input:valid~.valid-tooltip{display:block}.custom-control-input.is-valid:checked~.custom-control-label::before,.was-validated .custom-control-input:valid:checked~.custom-control-label::before{background-color:#34ce57}.custom-control-input.is-valid:focus~.custom-control-label::before,.was-validated .custom-control-input:valid:focus~.custom-control-label::before{box-shadow:0 0 0 1px #fff,0 0 0 .2rem rgba(40,167,69,.25)}.custom-file-input.is-valid~.custom-file-label,.was-validated .custom-file-input:valid~.custom-file-label{border-color:#28a745}.custom-file-input.is-valid~.custom-file-label::before,.was-validated .custom-file-input:valid~.custom-file-label::before{border-color:inherit}.custom-file-input.is-valid~.valid-feedback,.custom-file-input.is-valid~.valid-tooltip,.was-validated .custom-file-input:valid~.valid-feedback,.was-validated .custom-file-input:valid~.valid-tooltip{display:block}.custom-file-input.is-valid:focus~.custom-file-label,.was-validated .custom-file-input:valid:focus~.custom-file-label{box-shadow:0 0 0 .2rem 
rgba(40,167,69,.25)}.invalid-feedback{display:none;width:100%;margin-top:.25rem;font-size:80%;color:#dc3545}.invalid-tooltip{position:absolute;top:100%;z-index:5;display:none;max-width:100%;padding:.5rem;margin-top:.1rem;font-size:.875rem;line-height:1;color:#fff;background-color:rgba(220,53,69,.8);border-radius:.2rem}.custom-select.is-invalid,.form-control.is-invalid,.was-validated .custom-select:invalid,.was-validated .form-control:invalid{border-color:#dc3545}.custom-select.is-invalid:focus,.form-control.is-invalid:focus,.was-validated .custom-select:invalid:focus,.was-validated .form-control:invalid:focus{border-color:#dc3545;box-shadow:0 0 0 .2rem rgba(220,53,69,.25)}.custom-select.is-invalid~.invalid-feedback,.custom-select.is-invalid~.invalid-tooltip,.form-control.is-invalid~.invalid-feedback,.form-control.is-invalid~.invalid-tooltip,.was-validated .custom-select:invalid~.invalid-feedback,.was-validated .custom-select:invalid~.invalid-tooltip,.was-validated .form-control:invalid~.invalid-feedback,.was-validated .form-control:invalid~.invalid-tooltip{display:block}.form-check-input.is-invalid~.form-check-label,.was-validated .form-check-input:invalid~.form-check-label{color:#dc3545}.form-check-input.is-invalid~.invalid-feedback,.form-check-input.is-invalid~.invalid-tooltip,.was-validated .form-check-input:invalid~.invalid-feedback,.was-validated .form-check-input:invalid~.invalid-tooltip{display:block}.custom-control-input.is-invalid~.custom-control-label,.was-validated .custom-control-input:invalid~.custom-control-label{color:#dc3545}.custom-control-input.is-invalid~.custom-control-label::before,.was-validated .custom-control-input:invalid~.custom-control-label::before{background-color:#efa2a9}.custom-control-input.is-invalid~.invalid-feedback,.custom-control-input.is-invalid~.invalid-tooltip,.was-validated .custom-control-input:invalid~.invalid-feedback,.was-validated .custom-control-input:invalid~.invalid-tooltip{display:block}.custom-control-input.is-invalid:checked~.custom-control-label::before,.was-validated .custom-control-input:invalid:checked~.custom-control-label::before{background-color:#e4606d}.custom-control-input.is-invalid:focus~.custom-control-label::before,.was-validated .custom-control-input:invalid:focus~.custom-control-label::before{box-shadow:0 0 0 1px #fff,0 0 0 .2rem rgba(220,53,69,.25)}.custom-file-input.is-invalid~.custom-file-label,.was-validated .custom-file-input:invalid~.custom-file-label{border-color:#dc3545}.custom-file-input.is-invalid~.custom-file-label::before,.was-validated .custom-file-input:invalid~.custom-file-label::before{border-color:inherit}.custom-file-input.is-invalid~.invalid-feedback,.custom-file-input.is-invalid~.invalid-tooltip,.was-validated .custom-file-input:invalid~.invalid-feedback,.was-validated .custom-file-input:invalid~.invalid-tooltip{display:block}.custom-file-input.is-invalid:focus~.custom-file-label,.was-validated .custom-file-input:invalid:focus~.custom-file-label{box-shadow:0 0 0 .2rem rgba(220,53,69,.25)}.form-inline{display:flex;flex-flow:row wrap;align-items:center}.form-inline .form-check{width:100%}@media (min-width:576px){.form-inline label{display:flex;align-items:center;justify-content:center;margin-bottom:0}.form-inline .form-group{display:flex;flex:0 0 auto;flex-flow:row wrap;align-items:center;margin-bottom:0}.form-inline .form-control{display:inline-block;width:auto;vertical-align:middle}.form-inline .form-control-plaintext{display:inline-block}.form-inline .input-group{width:auto}.form-inline 
.form-check{display:flex;align-items:center;justify-content:center;width:auto;padding-left:0}.form-inline .form-check-input{position:relative;margin-top:0;margin-right:.25rem;margin-left:0}.form-inline .custom-control{align-items:center;justify-content:center}.form-inline .custom-control-label{margin-bottom:0}}.btn{display:inline-block;font-weight:400;text-align:center;white-space:nowrap;vertical-align:middle;user-select:none;border:1px solid transparent;padding:.375rem .75rem;font-size:1rem;line-height:1.5;border-radius:.25rem;transition:color .15s ease-in-out,background-color .15s ease-in-out,border-color .15s ease-in-out,box-shadow .15s ease-in-out}.btn:focus,.btn:hover{text-decoration:none}.btn.focus,.btn:focus{outline:0;box-shadow:0 0 0 .2rem rgba(0,123,255,.25)}.btn.disabled,.btn:disabled{opacity:.65}.btn:not(:disabled):not(.disabled){cursor:pointer}.btn:not(:disabled):not(.disabled).active,.btn:not(:disabled):not(.disabled):active{background-image:none}a.btn.disabled,fieldset:disabled a.btn{pointer-events:none}.btn-primary{color:#fff;background-color:#007bff;border-color:#007bff}.btn-primary:hover{color:#fff;background-color:#0069d9;border-color:#0062cc}.btn-primary.focus,.btn-primary:focus{box-shadow:0 0 0 .2rem rgba(0,123,255,.5)}.btn-primary.disabled,.btn-primary:disabled{color:#fff;background-color:#007bff;border-color:#007bff}.btn-primary:not(:disabled):not(.disabled).active,.btn-primary:not(:disabled):not(.disabled):active,.show>.btn-primary.dropdown-toggle{color:#fff;background-color:#0062cc;border-color:#005cbf}.btn-primary:not(:disabled):not(.disabled).active:focus,.btn-primary:not(:disabled):not(.disabled):active:focus,.show>.btn-primary.dropdown-toggle:focus{box-shadow:0 0 0 .2rem rgba(0,123,255,.5)}.btn-secondary{color:#fff;background-color:#6c757d;border-color:#6c757d}.btn-secondary:hover{color:#fff;background-color:#5a6268;border-color:#545b62}.btn-secondary.focus,.btn-secondary:focus{box-shadow:0 0 0 .2rem rgba(108,117,125,.5)}.btn-secondary.disabled,.btn-secondary:disabled{color:#fff;background-color:#6c757d;border-color:#6c757d}.btn-secondary:not(:disabled):not(.disabled).active,.btn-secondary:not(:disabled):not(.disabled):active,.show>.btn-secondary.dropdown-toggle{color:#fff;background-color:#545b62;border-color:#4e555b}.btn-secondary:not(:disabled):not(.disabled).active:focus,.btn-secondary:not(:disabled):not(.disabled):active:focus,.show>.btn-secondary.dropdown-toggle:focus{box-shadow:0 0 0 .2rem rgba(108,117,125,.5)}.btn-success{color:#fff;background-color:#28a745;border-color:#28a745}.btn-success:hover{color:#fff;background-color:#218838;border-color:#1e7e34}.btn-success.focus,.btn-success:focus{box-shadow:0 0 0 .2rem rgba(40,167,69,.5)}.btn-success.disabled,.btn-success:disabled{color:#fff;background-color:#28a745;border-color:#28a745}.btn-success:not(:disabled):not(.disabled).active,.btn-success:not(:disabled):not(.disabled):active,.show>.btn-success.dropdown-toggle{color:#fff;background-color:#1e7e34;border-color:#1c7430}.btn-success:not(:disabled):not(.disabled).active:focus,.btn-success:not(:disabled):not(.disabled):active:focus,.show>.btn-success.dropdown-toggle:focus{box-shadow:0 0 0 .2rem rgba(40,167,69,.5)}.btn-info{color:#fff;background-color:#17a2b8;border-color:#17a2b8}.btn-info:hover{color:#fff;background-color:#138496;border-color:#117a8b}.btn-info.focus,.btn-info:focus{box-shadow:0 0 0 .2rem 
rgba(23,162,184,.5)}.btn-info.disabled,.btn-info:disabled{color:#fff;background-color:#17a2b8;border-color:#17a2b8}.btn-info:not(:disabled):not(.disabled).active,.btn-info:not(:disabled):not(.disabled):active,.show>.btn-info.dropdown-toggle{color:#fff;background-color:#117a8b;border-color:#10707f}.btn-info:not(:disabled):not(.disabled).active:focus,.btn-info:not(:disabled):not(.disabled):active:focus,.show>.btn-info.dropdown-toggle:focus{box-shadow:0 0 0 .2rem rgba(23,162,184,.5)}.btn-warning{color:#212529;background-color:#ffc107;border-color:#ffc107}.btn-warning:hover{color:#212529;background-color:#e0a800;border-color:#d39e00}.btn-warning.focus,.btn-warning:focus{box-shadow:0 0 0 .2rem rgba(255,193,7,.5)}.btn-warning.disabled,.btn-warning:disabled{color:#212529;background-color:#ffc107;border-color:#ffc107}.btn-warning:not(:disabled):not(.disabled).active,.btn-warning:not(:disabled):not(.disabled):active,.show>.btn-warning.dropdown-toggle{color:#212529;background-color:#d39e00;border-color:#c69500}.btn-warning:not(:disabled):not(.disabled).active:focus,.btn-warning:not(:disabled):not(.disabled):active:focus,.show>.btn-warning.dropdown-toggle:focus{box-shadow:0 0 0 .2rem rgba(255,193,7,.5)}.btn-danger{color:#fff;background-color:#dc3545;border-color:#dc3545}.btn-danger:hover{color:#fff;background-color:#c82333;border-color:#bd2130}.btn-danger.focus,.btn-danger:focus{box-shadow:0 0 0 .2rem rgba(220,53,69,.5)}.btn-danger.disabled,.btn-danger:disabled{color:#fff;background-color:#dc3545;border-color:#dc3545}.btn-danger:not(:disabled):not(.disabled).active,.btn-danger:not(:disabled):not(.disabled):active,.show>.btn-danger.dropdown-toggle{color:#fff;background-color:#bd2130;border-color:#b21f2d}.btn-danger:not(:disabled):not(.disabled).active:focus,.btn-danger:not(:disabled):not(.disabled):active:focus,.show>.btn-danger.dropdown-toggle:focus{box-shadow:0 0 0 .2rem rgba(220,53,69,.5)}.btn-light{color:#212529;background-color:#f8f9fa;border-color:#f8f9fa}.btn-light:hover{color:#212529;background-color:#e2e6ea;border-color:#dae0e5}.btn-light.focus,.btn-light:focus{box-shadow:0 0 0 .2rem rgba(248,249,250,.5)}.btn-light.disabled,.btn-light:disabled{color:#212529;background-color:#f8f9fa;border-color:#f8f9fa}.btn-light:not(:disabled):not(.disabled).active,.btn-light:not(:disabled):not(.disabled):active,.show>.btn-light.dropdown-toggle{color:#212529;background-color:#dae0e5;border-color:#d3d9df}.btn-light:not(:disabled):not(.disabled).active:focus,.btn-light:not(:disabled):not(.disabled):active:focus,.show>.btn-light.dropdown-toggle:focus{box-shadow:0 0 0 .2rem rgba(248,249,250,.5)}.btn-dark{color:#fff;background-color:#343a40;border-color:#343a40}.btn-dark:hover{color:#fff;background-color:#23272b;border-color:#1d2124}.btn-dark.focus,.btn-dark:focus{box-shadow:0 0 0 .2rem rgba(52,58,64,.5)}.btn-dark.disabled,.btn-dark:disabled{color:#fff;background-color:#343a40;border-color:#343a40}.btn-dark:not(:disabled):not(.disabled).active,.btn-dark:not(:disabled):not(.disabled):active,.show>.btn-dark.dropdown-toggle{color:#fff;background-color:#1d2124;border-color:#171a1d}.btn-dark:not(:disabled):not(.disabled).active:focus,.btn-dark:not(:disabled):not(.disabled):active:focus,.show>.btn-dark.dropdown-toggle:focus{box-shadow:0 0 0 .2rem rgba(52,58,64,.5)}.btn-outline-primary{color:#007bff;background-color:transparent;background-image:none;border-color:#007bff}.btn-outline-primary:hover{color:#fff;background-color:#007bff;border-color:#007bff}.btn-outline-primary.focus,.btn-outline-primary:focus{box-shadow:0 0 
0 .2rem rgba(0,123,255,.5)}.btn-outline-primary.disabled,.btn-outline-primary:disabled{color:#007bff;background-color:transparent}.btn-outline-primary:not(:disabled):not(.disabled).active,.btn-outline-primary:not(:disabled):not(.disabled):active,.show>.btn-outline-primary.dropdown-toggle{color:#fff;background-color:#007bff;border-color:#007bff}.btn-outline-primary:not(:disabled):not(.disabled).active:focus,.btn-outline-primary:not(:disabled):not(.disabled):active:focus,.show>.btn-outline-primary.dropdown-toggle:focus{box-shadow:0 0 0 .2rem rgba(0,123,255,.5)}.btn-outline-secondary{color:#6c757d;background-color:transparent;background-image:none;border-color:#6c757d}.btn-outline-secondary:hover{color:#fff;background-color:#6c757d;border-color:#6c757d}.btn-outline-secondary.focus,.btn-outline-secondary:focus{box-shadow:0 0 0 .2rem rgba(108,117,125,.5)}.btn-outline-secondary.disabled,.btn-outline-secondary:disabled{color:#6c757d;background-color:transparent}.btn-outline-secondary:not(:disabled):not(.disabled).active,.btn-outline-secondary:not(:disabled):not(.disabled):active,.show>.btn-outline-secondary.dropdown-toggle{color:#fff;background-color:#6c757d;border-color:#6c757d}.btn-outline-secondary:not(:disabled):not(.disabled).active:focus,.btn-outline-secondary:not(:disabled):not(.disabled):active:focus,.show>.btn-outline-secondary.dropdown-toggle:focus{box-shadow:0 0 0 .2rem rgba(108,117,125,.5)}.btn-outline-success{color:#28a745;background-color:transparent;background-image:none;border-color:#28a745}.btn-outline-success:hover{color:#fff;background-color:#28a745;border-color:#28a745}.btn-outline-success.focus,.btn-outline-success:focus{box-shadow:0 0 0 .2rem rgba(40,167,69,.5)}.btn-outline-success.disabled,.btn-outline-success:disabled{color:#28a745;background-color:transparent}.btn-outline-success:not(:disabled):not(.disabled).active,.btn-outline-success:not(:disabled):not(.disabled):active,.show>.btn-outline-success.dropdown-toggle{color:#fff;background-color:#28a745;border-color:#28a745}.btn-outline-success:not(:disabled):not(.disabled).active:focus,.btn-outline-success:not(:disabled):not(.disabled):active:focus,.show>.btn-outline-success.dropdown-toggle:focus{box-shadow:0 0 0 .2rem rgba(40,167,69,.5)}.btn-outline-info{color:#17a2b8;background-color:transparent;background-image:none;border-color:#17a2b8}.btn-outline-info:hover{color:#fff;background-color:#17a2b8;border-color:#17a2b8}.btn-outline-info.focus,.btn-outline-info:focus{box-shadow:0 0 0 .2rem rgba(23,162,184,.5)}.btn-outline-info.disabled,.btn-outline-info:disabled{color:#17a2b8;background-color:transparent}.btn-outline-info:not(:disabled):not(.disabled).active,.btn-outline-info:not(:disabled):not(.disabled):active,.show>.btn-outline-info.dropdown-toggle{color:#fff;background-color:#17a2b8;border-color:#17a2b8}.btn-outline-info:not(:disabled):not(.disabled).active:focus,.btn-outline-info:not(:disabled):not(.disabled):active:focus,.show>.btn-outline-info.dropdown-toggle:focus{box-shadow:0 0 0 .2rem rgba(23,162,184,.5)}.btn-outline-warning{color:#ffc107;background-color:transparent;background-image:none;border-color:#ffc107}.btn-outline-warning:hover{color:#212529;background-color:#ffc107;border-color:#ffc107}.btn-outline-warning.focus,.btn-outline-warning:focus{box-shadow:0 0 0 .2rem 
rgba(255,193,7,.5)}.btn-outline-warning.disabled,.btn-outline-warning:disabled{color:#ffc107;background-color:transparent}.btn-outline-warning:not(:disabled):not(.disabled).active,.btn-outline-warning:not(:disabled):not(.disabled):active,.show>.btn-outline-warning.dropdown-toggle{color:#212529;background-color:#ffc107;border-color:#ffc107}.btn-outline-warning:not(:disabled):not(.disabled).active:focus,.btn-outline-warning:not(:disabled):not(.disabled):active:focus,.show>.btn-outline-warning.dropdown-toggle:focus{box-shadow:0 0 0 .2rem rgba(255,193,7,.5)}.btn-outline-danger{color:#dc3545;background-color:transparent;background-image:none;border-color:#dc3545}.btn-outline-danger:hover{color:#fff;background-color:#dc3545;border-color:#dc3545}.btn-outline-danger.focus,.btn-outline-danger:focus{box-shadow:0 0 0 .2rem rgba(220,53,69,.5)}.btn-outline-danger.disabled,.btn-outline-danger:disabled{color:#dc3545;background-color:transparent}.btn-outline-danger:not(:disabled):not(.disabled).active,.btn-outline-danger:not(:disabled):not(.disabled):active,.show>.btn-outline-danger.dropdown-toggle{color:#fff;background-color:#dc3545;border-color:#dc3545}.btn-outline-danger:not(:disabled):not(.disabled).active:focus,.btn-outline-danger:not(:disabled):not(.disabled):active:focus,.show>.btn-outline-danger.dropdown-toggle:focus{box-shadow:0 0 0 .2rem rgba(220,53,69,.5)}.btn-outline-light{color:#f8f9fa;background-color:transparent;background-image:none;border-color:#f8f9fa}.btn-outline-light:hover{color:#212529;background-color:#f8f9fa;border-color:#f8f9fa}.btn-outline-light.focus,.btn-outline-light:focus{box-shadow:0 0 0 .2rem rgba(248,249,250,.5)}.btn-outline-light.disabled,.btn-outline-light:disabled{color:#f8f9fa;background-color:transparent}.btn-outline-light:not(:disabled):not(.disabled).active,.btn-outline-light:not(:disabled):not(.disabled):active,.show>.btn-outline-light.dropdown-toggle{color:#212529;background-color:#f8f9fa;border-color:#f8f9fa}.btn-outline-light:not(:disabled):not(.disabled).active:focus,.btn-outline-light:not(:disabled):not(.disabled):active:focus,.show>.btn-outline-light.dropdown-toggle:focus{box-shadow:0 0 0 .2rem rgba(248,249,250,.5)}.btn-outline-dark{color:#343a40;background-color:transparent;background-image:none;border-color:#343a40}.btn-outline-dark:hover{color:#fff;background-color:#343a40;border-color:#343a40}.btn-outline-dark.focus,.btn-outline-dark:focus{box-shadow:0 0 0 .2rem rgba(52,58,64,.5)}.btn-outline-dark.disabled,.btn-outline-dark:disabled{color:#343a40;background-color:transparent}.btn-outline-dark:not(:disabled):not(.disabled).active,.btn-outline-dark:not(:disabled):not(.disabled):active,.show>.btn-outline-dark.dropdown-toggle{color:#fff;background-color:#343a40;border-color:#343a40}.btn-outline-dark:not(:disabled):not(.disabled).active:focus,.btn-outline-dark:not(:disabled):not(.disabled):active:focus,.show>.btn-outline-dark.dropdown-toggle:focus{box-shadow:0 0 0 .2rem rgba(52,58,64,.5)}.btn-link{font-weight:400;color:#007bff;background-color:transparent}.btn-link:hover{color:#0056b3;text-decoration:underline;background-color:transparent;border-color:transparent}.btn-link.focus,.btn-link:focus{text-decoration:underline;border-color:transparent;box-shadow:none}.btn-link.disabled,.btn-link:disabled{color:#6c757d}.btn-lg{padding:.5rem 1rem;font-size:1.25rem;line-height:1.5;border-radius:.3rem}.btn-sm{padding:.25rem 
.5rem;font-size:.875rem;line-height:1.5;border-radius:.2rem}.btn-block{display:block;width:100%}.btn-block+.btn-block{margin-top:.5rem}input[type=button].btn-block,input[type=reset].btn-block,input[type=submit].btn-block{width:100%}.nav{display:flex;flex-wrap:wrap;padding-left:0;margin-bottom:0;list-style:none}.nav-link{display:block;padding:.5rem 1rem}.nav-link:focus,.nav-link:hover{text-decoration:none}.nav-link.disabled{color:#6c757d}.nav-tabs{border-bottom:1px solid #dee2e6}.nav-tabs .nav-item{margin-bottom:-1px}.nav-tabs .nav-link{border:1px solid transparent;border-top-left-radius:.25rem;border-top-right-radius:.25rem}.nav-tabs .nav-link:focus,.nav-tabs .nav-link:hover{border-color:#e9ecef #e9ecef #dee2e6}.nav-tabs .nav-link.disabled{color:#6c757d;background-color:transparent;border-color:transparent}.nav-tabs .nav-item.show .nav-link,.nav-tabs .nav-link.active{color:#495057;background-color:#fff;border-color:#dee2e6 #dee2e6 #fff}.nav-tabs .dropdown-menu{margin-top:-1px;border-top-left-radius:0;border-top-right-radius:0}.nav-pills .nav-link{border-radius:.25rem}.nav-pills .nav-link.active,.nav-pills .show>.nav-link{color:#fff;background-color:#007bff}.nav-fill .nav-item{flex:1 1 auto;text-align:center}.nav-justified .nav-item{flex-basis:0;flex-grow:1;text-align:center}.tab-content>.tab-pane{display:none}.tab-content>.active{display:block}.navbar{position:relative;display:flex;flex-wrap:wrap;align-items:center;justify-content:space-between;padding:.5rem 1rem}.navbar>.container,.navbar>.container-fluid{display:flex;flex-wrap:wrap;align-items:center;justify-content:space-between}.navbar-brand{display:inline-block;padding-top:.3125rem;padding-bottom:.3125rem;margin-right:1rem;font-size:1.25rem;line-height:inherit;white-space:nowrap}.navbar-brand:focus,.navbar-brand:hover{text-decoration:none}.navbar-nav{display:flex;flex-direction:column;padding-left:0;margin-bottom:0;list-style:none}.navbar-nav .nav-link{padding-right:0;padding-left:0}.navbar-nav .dropdown-menu{position:static;float:none}.navbar-text{display:inline-block;padding-top:.5rem;padding-bottom:.5rem}.navbar-collapse{flex-basis:100%;flex-grow:1;align-items:center}.navbar-toggler{padding:.25rem .75rem;font-size:1.25rem;line-height:1;background-color:transparent;border:1px solid transparent;border-radius:.25rem}.navbar-toggler:focus,.navbar-toggler:hover{text-decoration:none}.navbar-toggler:not(:disabled):not(.disabled){cursor:pointer}.navbar-toggler-icon{display:inline-block;width:1.5em;height:1.5em;vertical-align:middle;content:"";background:no-repeat center center;background-size:100% 100%}@media (max-width:575.98px){.navbar-expand-sm>.container,.navbar-expand-sm>.container-fluid{padding-right:0;padding-left:0}}@media (min-width:576px){.navbar-expand-sm{flex-flow:row nowrap;justify-content:flex-start}.navbar-expand-sm .navbar-nav{flex-direction:row}.navbar-expand-sm .navbar-nav .dropdown-menu{position:absolute}.navbar-expand-sm .navbar-nav .dropdown-menu-right{right:0;left:auto}.navbar-expand-sm .navbar-nav .nav-link{padding-right:.5rem;padding-left:.5rem}.navbar-expand-sm>.container,.navbar-expand-sm>.container-fluid{flex-wrap:nowrap}.navbar-expand-sm .navbar-collapse{display:flex!important;flex-basis:auto}.navbar-expand-sm .navbar-toggler{display:none}.navbar-expand-sm .dropup .dropdown-menu{top:auto;bottom:100%}}@media (max-width:767.98px){.navbar-expand-md>.container,.navbar-expand-md>.container-fluid{padding-right:0;padding-left:0}}@media (min-width:768px){.navbar-expand-md{flex-flow:row 
nowrap;justify-content:flex-start}.navbar-expand-md .navbar-nav{flex-direction:row}.navbar-expand-md .navbar-nav .dropdown-menu{position:absolute}.navbar-expand-md .navbar-nav .dropdown-menu-right{right:0;left:auto}.navbar-expand-md .navbar-nav .nav-link{padding-right:.5rem;padding-left:.5rem}.navbar-expand-md>.container,.navbar-expand-md>.container-fluid{flex-wrap:nowrap}.navbar-expand-md .navbar-collapse{display:flex!important;flex-basis:auto}.navbar-expand-md .navbar-toggler{display:none}.navbar-expand-md .dropup .dropdown-menu{top:auto;bottom:100%}}@media (max-width:991.98px){.navbar-expand-lg>.container,.navbar-expand-lg>.container-fluid{padding-right:0;padding-left:0}}@media (min-width:992px){.navbar-expand-lg{flex-flow:row nowrap;justify-content:flex-start}.navbar-expand-lg .navbar-nav{flex-direction:row}.navbar-expand-lg .navbar-nav .dropdown-menu{position:absolute}.navbar-expand-lg .navbar-nav .dropdown-menu-right{right:0;left:auto}.navbar-expand-lg .navbar-nav .nav-link{padding-right:.5rem;padding-left:.5rem}.navbar-expand-lg>.container,.navbar-expand-lg>.container-fluid{flex-wrap:nowrap}.navbar-expand-lg .navbar-collapse{display:flex!important;flex-basis:auto}.navbar-expand-lg .navbar-toggler{display:none}.navbar-expand-lg .dropup .dropdown-menu{top:auto;bottom:100%}}@media (max-width:1199.98px){.navbar-expand-xl>.container,.navbar-expand-xl>.container-fluid{padding-right:0;padding-left:0}}@media (min-width:1200px){.navbar-expand-xl{flex-flow:row nowrap;justify-content:flex-start}.navbar-expand-xl .navbar-nav{flex-direction:row}.navbar-expand-xl .navbar-nav .dropdown-menu{position:absolute}.navbar-expand-xl .navbar-nav .dropdown-menu-right{right:0;left:auto}.navbar-expand-xl .navbar-nav .nav-link{padding-right:.5rem;padding-left:.5rem}.navbar-expand-xl>.container,.navbar-expand-xl>.container-fluid{flex-wrap:nowrap}.navbar-expand-xl .navbar-collapse{display:flex!important;flex-basis:auto}.navbar-expand-xl .navbar-toggler{display:none}.navbar-expand-xl .dropup .dropdown-menu{top:auto;bottom:100%}}.navbar-expand{flex-flow:row nowrap;justify-content:flex-start}.navbar-expand>.container,.navbar-expand>.container-fluid{padding-right:0;padding-left:0}.navbar-expand .navbar-nav{flex-direction:row}.navbar-expand .navbar-nav .dropdown-menu{position:absolute}.navbar-expand .navbar-nav .dropdown-menu-right{right:0;left:auto}.navbar-expand .navbar-nav .nav-link{padding-right:.5rem;padding-left:.5rem}.navbar-expand>.container,.navbar-expand>.container-fluid{flex-wrap:nowrap}.navbar-expand .navbar-collapse{display:flex!important;flex-basis:auto}.navbar-expand .navbar-toggler{display:none}.navbar-expand .dropup .dropdown-menu{top:auto;bottom:100%}.navbar-light .navbar-brand{color:rgba(0,0,0,.9)}.navbar-light .navbar-brand:focus,.navbar-light .navbar-brand:hover{color:rgba(0,0,0,.9)}.navbar-light .navbar-nav .nav-link{color:rgba(0,0,0,.5)}.navbar-light .navbar-nav .nav-link:focus,.navbar-light .navbar-nav .nav-link:hover{color:rgba(0,0,0,.7)}.navbar-light .navbar-nav .nav-link.disabled{color:rgba(0,0,0,.3)}.navbar-light .navbar-nav .active>.nav-link,.navbar-light .navbar-nav .nav-link.active,.navbar-light .navbar-nav .nav-link.show,.navbar-light .navbar-nav .show>.nav-link{color:rgba(0,0,0,.9)}.navbar-light .navbar-toggler{color:rgba(0,0,0,.5);border-color:rgba(0,0,0,.1)}.navbar-light .navbar-toggler-icon{background-image:url("data:image/svg+xml;charset=utf8,%3Csvg viewBox='0 0 30 30' xmlns='http://www.w3.org/2000/svg'%3E%3Cpath stroke='rgba(0, 0, 0, 0.5)' stroke-width='2' stroke-linecap='round' 
stroke-miterlimit='10' d='M4 7h22M4 15h22M4 23h22'/%3E%3C/svg%3E")}.navbar-light .navbar-text{color:rgba(0,0,0,.5)}.navbar-light .navbar-text a{color:rgba(0,0,0,.9)}.navbar-light .navbar-text a:focus,.navbar-light .navbar-text a:hover{color:rgba(0,0,0,.9)}.navbar-dark .navbar-brand{color:#fff}.navbar-dark .navbar-brand:focus,.navbar-dark .navbar-brand:hover{color:#fff}.navbar-dark .navbar-nav .nav-link{color:rgba(255,255,255,.5)}.navbar-dark .navbar-nav .nav-link:focus,.navbar-dark .navbar-nav .nav-link:hover{color:rgba(255,255,255,.75)}.navbar-dark .navbar-nav .nav-link.disabled{color:rgba(255,255,255,.25)}.navbar-dark .navbar-nav .active>.nav-link,.navbar-dark .navbar-nav .nav-link.active,.navbar-dark .navbar-nav .nav-link.show,.navbar-dark .navbar-nav .show>.nav-link{color:#fff}.navbar-dark .navbar-toggler{color:rgba(255,255,255,.5);border-color:rgba(255,255,255,.1)}.navbar-dark .navbar-toggler-icon{background-image:url("data:image/svg+xml;charset=utf8,%3Csvg viewBox='0 0 30 30' xmlns='http://www.w3.org/2000/svg'%3E%3Cpath stroke='rgba(255, 255, 255, 0.5)' stroke-width='2' stroke-linecap='round' stroke-miterlimit='10' d='M4 7h22M4 15h22M4 23h22'/%3E%3C/svg%3E")}.navbar-dark .navbar-text{color:rgba(255,255,255,.5)}.navbar-dark .navbar-text a{color:#fff}.navbar-dark .navbar-text a:focus,.navbar-dark .navbar-text a:hover{color:#fff}.pagination{display:flex;padding-left:0;list-style:none;border-radius:.25rem}.page-link{position:relative;display:block;padding:.5rem .75rem;margin-left:-1px;line-height:1.25;color:#007bff;background-color:#fff;border:1px solid #dee2e6}.page-link:hover{color:#0056b3;text-decoration:none;background-color:#e9ecef;border-color:#dee2e6}.page-link:focus{z-index:2;outline:0;box-shadow:0 0 0 .2rem rgba(0,123,255,.25)}.page-link:not(:disabled):not(.disabled){cursor:pointer}.page-item:first-child .page-link{margin-left:0;border-top-left-radius:.25rem;border-bottom-left-radius:.25rem}.page-item:last-child .page-link{border-top-right-radius:.25rem;border-bottom-right-radius:.25rem}.page-item.active .page-link{z-index:1;color:#fff;background-color:#007bff;border-color:#007bff}.page-item.disabled .page-link{color:#6c757d;pointer-events:none;cursor:auto;background-color:#fff;border-color:#dee2e6}.pagination-lg .page-link{padding:.75rem 1.5rem;font-size:1.25rem;line-height:1.5}.pagination-lg .page-item:first-child .page-link{border-top-left-radius:.3rem;border-bottom-left-radius:.3rem}.pagination-lg .page-item:last-child .page-link{border-top-right-radius:.3rem;border-bottom-right-radius:.3rem}.pagination-sm .page-link{padding:.25rem .5rem;font-size:.875rem;line-height:1.5}.pagination-sm .page-item:first-child .page-link{border-top-left-radius:.2rem;border-bottom-left-radius:.2rem}.pagination-sm .page-item:last-child .page-link{border-top-right-radius:.2rem;border-bottom-right-radius:.2rem}.media{display:flex;align-items:flex-start}.media-body{flex:1}.list-group{display:flex;flex-direction:column;padding-left:0;margin-bottom:0}.list-group-item-action{width:100%;color:#495057;text-align:inherit}.list-group-item-action:focus,.list-group-item-action:hover{color:#495057;text-decoration:none;background-color:#f8f9fa}.list-group-item-action:active{color:#212529;background-color:#e9ecef}.list-group-item{position:relative;display:block;padding:.75rem 1.25rem;margin-bottom:-1px;background-color:#fff;border:1px solid 
rgba(0,0,0,.125)}.list-group-item:first-child{border-top-left-radius:.25rem;border-top-right-radius:.25rem}.list-group-item:last-child{margin-bottom:0;border-bottom-right-radius:.25rem;border-bottom-left-radius:.25rem}.list-group-item:focus,.list-group-item:hover{z-index:1;text-decoration:none}.list-group-item.disabled,.list-group-item:disabled{color:#6c757d;background-color:#fff}.list-group-item.active{z-index:2;color:#fff;background-color:#007bff;border-color:#007bff}.list-group-flush .list-group-item{border-right:0;border-left:0;border-radius:0}.list-group-flush:first-child .list-group-item:first-child{border-top:0}.list-group-flush:last-child .list-group-item:last-child{border-bottom:0}.list-group-item-primary{color:#004085;background-color:#b8daff}.list-group-item-primary.list-group-item-action:focus,.list-group-item-primary.list-group-item-action:hover{color:#004085;background-color:#9fcdff}.list-group-item-primary.list-group-item-action.active{color:#fff;background-color:#004085;border-color:#004085}.list-group-item-secondary{color:#383d41;background-color:#d6d8db}.list-group-item-secondary.list-group-item-action:focus,.list-group-item-secondary.list-group-item-action:hover{color:#383d41;background-color:#c8cbcf}.list-group-item-secondary.list-group-item-action.active{color:#fff;background-color:#383d41;border-color:#383d41}.list-group-item-success{color:#155724;background-color:#c3e6cb}.list-group-item-success.list-group-item-action:focus,.list-group-item-success.list-group-item-action:hover{color:#155724;background-color:#b1dfbb}.list-group-item-success.list-group-item-action.active{color:#fff;background-color:#155724;border-color:#155724}.list-group-item-info{color:#0c5460;background-color:#bee5eb}.list-group-item-info.list-group-item-action:focus,.list-group-item-info.list-group-item-action:hover{color:#0c5460;background-color:#abdde5}.list-group-item-info.list-group-item-action.active{color:#fff;background-color:#0c5460;border-color:#0c5460}.list-group-item-warning{color:#856404;background-color:#ffeeba}.list-group-item-warning.list-group-item-action:focus,.list-group-item-warning.list-group-item-action:hover{color:#856404;background-color:#ffe8a1}.list-group-item-warning.list-group-item-action.active{color:#fff;background-color:#856404;border-color:#856404}.list-group-item-danger{color:#721c24;background-color:#f5c6cb}.list-group-item-danger.list-group-item-action:focus,.list-group-item-danger.list-group-item-action:hover{color:#721c24;background-color:#f1b0b7}.list-group-item-danger.list-group-item-action.active{color:#fff;background-color:#721c24;border-color:#721c24}.list-group-item-light{color:#818182;background-color:#fdfdfe}.list-group-item-light.list-group-item-action:focus,.list-group-item-light.list-group-item-action:hover{color:#818182;background-color:#ececf6}.list-group-item-light.list-group-item-action.active{color:#fff;background-color:#818182;border-color:#818182}.list-group-item-dark{color:#1b1e21;background-color:#c6c8ca}.list-group-item-dark.list-group-item-action:focus,.list-group-item-dark.list-group-item-action:hover{color:#1b1e21;background-color:#b9bbbe}.list-group-item-dark.list-group-item-action.active{color:#fff;background-color:#1b1e21;border-color:#1b1e21}.align-baseline{vertical-align:baseline!important}.align-top{vertical-align:top!important}.align-middle{vertical-align:middle!important}.align-bottom{vertical-align:bottom!important}.align-text-bottom{vertical-align:text-bottom!important}.align-text-top{vertical-align:text-top!important}.bg-primary{
background-color:#007bff!important}a.bg-primary:focus,a.bg-primary:hover,button.bg-primary:focus,button.bg-primary:hover{background-color:#0062cc!important}.bg-secondary{background-color:#6c757d!important}a.bg-secondary:focus,a.bg-secondary:hover,button.bg-secondary:focus,button.bg-secondary:hover{background-color:#545b62!important}.bg-success{background-color:#28a745!important}a.bg-success:focus,a.bg-success:hover,button.bg-success:focus,button.bg-success:hover{background-color:#1e7e34!important}.bg-info{background-color:#17a2b8!important}a.bg-info:focus,a.bg-info:hover,button.bg-info:focus,button.bg-info:hover{background-color:#117a8b!important}.bg-warning{background-color:#ffc107!important}a.bg-warning:focus,a.bg-warning:hover,button.bg-warning:focus,button.bg-warning:hover{background-color:#d39e00!important}.bg-danger{background-color:#dc3545!important}a.bg-danger:focus,a.bg-danger:hover,button.bg-danger:focus,button.bg-danger:hover{background-color:#bd2130!important}.bg-light{background-color:#f8f9fa!important}a.bg-light:focus,a.bg-light:hover,button.bg-light:focus,button.bg-light:hover{background-color:#dae0e5!important}.bg-dark{background-color:#343a40!important}a.bg-dark:focus,a.bg-dark:hover,button.bg-dark:focus,button.bg-dark:hover{background-color:#1d2124!important}.bg-white{background-color:#fff!important}.bg-transparent{background-color:transparent!important}.border{border:1px solid #dee2e6!important}.border-top{border-top:1px solid #dee2e6!important}.border-right{border-right:1px solid #dee2e6!important}.border-bottom{border-bottom:1px solid #dee2e6!important}.border-left{border-left:1px solid #dee2e6!important}.border-0{border:0!important}.border-top-0{border-top:0!important}.border-right-0{border-right:0!important}.border-bottom-0{border-bottom:0!important}.border-left-0{border-left:0!important}.border-primary{border-color:#007bff!important}.border-secondary{border-color:#6c757d!important}.border-success{border-color:#28a745!important}.border-info{border-color:#17a2b8!important}.border-warning{border-color:#ffc107!important}.border-danger{border-color:#dc3545!important}.border-light{border-color:#f8f9fa!important}.border-dark{border-color:#343a40!important}.border-white{border-color:#fff!important}.rounded{border-radius:.25rem!important}.rounded-top{border-top-left-radius:.25rem!important;border-top-right-radius:.25rem!important}.rounded-right{border-top-right-radius:.25rem!important;border-bottom-right-radius:.25rem!important}.rounded-bottom{border-bottom-right-radius:.25rem!important;border-bottom-left-radius:.25rem!important}.rounded-left{border-top-left-radius:.25rem!important;border-bottom-left-radius:.25rem!important}.rounded-circle{border-radius:50%!important}.rounded-0{border-radius:0!important}.clearfix::after{display:block;clear:both;content:""}.d-none{display:none!important}.d-inline{display:inline!important}.d-inline-block{display:inline-block!important}.d-block{display:block!important}.d-table{display:table!important}.d-table-row{display:table-row!important}.d-table-cell{display:table-cell!important}.d-flex{display:flex!important}.d-inline-flex{display:inline-flex!important}@media 
(min-width:576px){.d-sm-none{display:none!important}.d-sm-inline{display:inline!important}.d-sm-inline-block{display:inline-block!important}.d-sm-block{display:block!important}.d-sm-table{display:table!important}.d-sm-table-row{display:table-row!important}.d-sm-table-cell{display:table-cell!important}.d-sm-flex{display:flex!important}.d-sm-inline-flex{display:inline-flex!important}}@media (min-width:768px){.d-md-none{display:none!important}.d-md-inline{display:inline!important}.d-md-inline-block{display:inline-block!important}.d-md-block{display:block!important}.d-md-table{display:table!important}.d-md-table-row{display:table-row!important}.d-md-table-cell{display:table-cell!important}.d-md-flex{display:flex!important}.d-md-inline-flex{display:inline-flex!important}}@media (min-width:992px){.d-lg-none{display:none!important}.d-lg-inline{display:inline!important}.d-lg-inline-block{display:inline-block!important}.d-lg-block{display:block!important}.d-lg-table{display:table!important}.d-lg-table-row{display:table-row!important}.d-lg-table-cell{display:table-cell!important}.d-lg-flex{display:flex!important}.d-lg-inline-flex{display:inline-flex!important}}@media (min-width:1200px){.d-xl-none{display:none!important}.d-xl-inline{display:inline!important}.d-xl-inline-block{display:inline-block!important}.d-xl-block{display:block!important}.d-xl-table{display:table!important}.d-xl-table-row{display:table-row!important}.d-xl-table-cell{display:table-cell!important}.d-xl-flex{display:flex!important}.d-xl-inline-flex{display:inline-flex!important}}@media print{.d-print-none{display:none!important}.d-print-inline{display:inline!important}.d-print-inline-block{display:inline-block!important}.d-print-block{display:block!important}.d-print-table{display:table!important}.d-print-table-row{display:table-row!important}.d-print-table-cell{display:table-cell!important}.d-print-flex{display:flex!important}.d-print-inline-flex{display:inline-flex!important}}.embed-responsive{position:relative;display:block;width:100%;padding:0;overflow:hidden}.embed-responsive::before{display:block;content:""}.embed-responsive .embed-responsive-item,.embed-responsive embed,.embed-responsive iframe,.embed-responsive object,.embed-responsive 
video{position:absolute;top:0;bottom:0;left:0;width:100%;height:100%;border:0}.embed-responsive-21by9::before{padding-top:42.857143%}.embed-responsive-16by9::before{padding-top:56.25%}.embed-responsive-4by3::before{padding-top:75%}.embed-responsive-1by1::before{padding-top:100%}.flex-row{flex-direction:row!important}.flex-column{flex-direction:column!important}.flex-row-reverse{flex-direction:row-reverse!important}.flex-column-reverse{flex-direction:column-reverse!important}.flex-wrap{flex-wrap:wrap!important}.flex-nowrap{flex-wrap:nowrap!important}.flex-wrap-reverse{flex-wrap:wrap-reverse!important}.justify-content-start{justify-content:flex-start!important}.justify-content-end{justify-content:flex-end!important}.justify-content-center{justify-content:center!important}.justify-content-between{justify-content:space-between!important}.justify-content-around{justify-content:space-around!important}.align-items-start{align-items:flex-start!important}.align-items-end{align-items:flex-end!important}.align-items-center{align-items:center!important}.align-items-baseline{align-items:baseline!important}.align-items-stretch{align-items:stretch!important}.align-content-start{align-content:flex-start!important}.align-content-end{align-content:flex-end!important}.align-content-center{align-content:center!important}.align-content-between{align-content:space-between!important}.align-content-around{align-content:space-around!important}.align-content-stretch{align-content:stretch!important}.align-self-auto{align-self:auto!important}.align-self-start{align-self:flex-start!important}.align-self-end{align-self:flex-end!important}.align-self-center{align-self:center!important}.align-self-baseline{align-self:baseline!important}.align-self-stretch{align-self:stretch!important}@media (min-width:576px){.flex-sm-row{flex-direction:row!important}.flex-sm-column{flex-direction:column!important}.flex-sm-row-reverse{flex-direction:row-reverse!important}.flex-sm-column-reverse{flex-direction:column-reverse!important}.flex-sm-wrap{flex-wrap:wrap!important}.flex-sm-nowrap{flex-wrap:nowrap!important}.flex-sm-wrap-reverse{flex-wrap:wrap-reverse!important}.justify-content-sm-start{justify-content:flex-start!important}.justify-content-sm-end{justify-content:flex-end!important}.justify-content-sm-center{justify-content:center!important}.justify-content-sm-between{justify-content:space-between!important}.justify-content-sm-around{justify-content:space-around!important}.align-items-sm-start{align-items:flex-start!important}.align-items-sm-end{align-items:flex-end!important}.align-items-sm-center{align-items:center!important}.align-items-sm-baseline{align-items:baseline!important}.align-items-sm-stretch{align-items:stretch!important}.align-content-sm-start{align-content:flex-start!important}.align-content-sm-end{align-content:flex-end!important}.align-content-sm-center{align-content:center!important}.align-content-sm-between{align-content:space-between!important}.align-content-sm-around{align-content:space-around!important}.align-content-sm-stretch{align-content:stretch!important}.align-self-sm-auto{align-self:auto!important}.align-self-sm-start{align-self:flex-start!important}.align-self-sm-end{align-self:flex-end!important}.align-self-sm-center{align-self:center!important}.align-self-sm-baseline{align-self:baseline!important}.align-self-sm-stretch{align-self:stretch!important}}@media 
(min-width:768px){.flex-md-row{flex-direction:row!important}.flex-md-column{flex-direction:column!important}.flex-md-row-reverse{flex-direction:row-reverse!important}.flex-md-column-reverse{flex-direction:column-reverse!important}.flex-md-wrap{flex-wrap:wrap!important}.flex-md-nowrap{flex-wrap:nowrap!important}.flex-md-wrap-reverse{flex-wrap:wrap-reverse!important}.justify-content-md-start{justify-content:flex-start!important}.justify-content-md-end{justify-content:flex-end!important}.justify-content-md-center{justify-content:center!important}.justify-content-md-between{justify-content:space-between!important}.justify-content-md-around{justify-content:space-around!important}.align-items-md-start{align-items:flex-start!important}.align-items-md-end{align-items:flex-end!important}.align-items-md-center{align-items:center!important}.align-items-md-baseline{align-items:baseline!important}.align-items-md-stretch{align-items:stretch!important}.align-content-md-start{align-content:flex-start!important}.align-content-md-end{align-content:flex-end!important}.align-content-md-center{align-content:center!important}.align-content-md-between{align-content:space-between!important}.align-content-md-around{align-content:space-around!important}.align-content-md-stretch{align-content:stretch!important}.align-self-md-auto{align-self:auto!important}.align-self-md-start{align-self:flex-start!important}.align-self-md-end{align-self:flex-end!important}.align-self-md-center{align-self:center!important}.align-self-md-baseline{align-self:baseline!important}.align-self-md-stretch{align-self:stretch!important}}@media (min-width:992px){.flex-lg-row{flex-direction:row!important}.flex-lg-column{flex-direction:column!important}.flex-lg-row-reverse{flex-direction:row-reverse!important}.flex-lg-column-reverse{flex-direction:column-reverse!important}.flex-lg-wrap{flex-wrap:wrap!important}.flex-lg-nowrap{flex-wrap:nowrap!important}.flex-lg-wrap-reverse{flex-wrap:wrap-reverse!important}.justify-content-lg-start{justify-content:flex-start!important}.justify-content-lg-end{justify-content:flex-end!important}.justify-content-lg-center{justify-content:center!important}.justify-content-lg-between{justify-content:space-between!important}.justify-content-lg-around{justify-content:space-around!important}.align-items-lg-start{align-items:flex-start!important}.align-items-lg-end{align-items:flex-end!important}.align-items-lg-center{align-items:center!important}.align-items-lg-baseline{align-items:baseline!important}.align-items-lg-stretch{align-items:stretch!important}.align-content-lg-start{align-content:flex-start!important}.align-content-lg-end{align-content:flex-end!important}.align-content-lg-center{align-content:center!important}.align-content-lg-between{align-content:space-between!important}.align-content-lg-around{align-content:space-around!important}.align-content-lg-stretch{align-content:stretch!important}.align-self-lg-auto{align-self:auto!important}.align-self-lg-start{align-self:flex-start!important}.align-self-lg-end{align-self:flex-end!important}.align-self-lg-center{align-self:center!important}.align-self-lg-baseline{align-self:baseline!important}.align-self-lg-stretch{align-self:stretch!important}}@media 
(min-width:1200px){.flex-xl-row{flex-direction:row!important}.flex-xl-column{flex-direction:column!important}.flex-xl-row-reverse{flex-direction:row-reverse!important}.flex-xl-column-reverse{flex-direction:column-reverse!important}.flex-xl-wrap{flex-wrap:wrap!important}.flex-xl-nowrap{flex-wrap:nowrap!important}.flex-xl-wrap-reverse{flex-wrap:wrap-reverse!important}.justify-content-xl-start{justify-content:flex-start!important}.justify-content-xl-end{justify-content:flex-end!important}.justify-content-xl-center{justify-content:center!important}.justify-content-xl-between{justify-content:space-between!important}.justify-content-xl-around{justify-content:space-around!important}.align-items-xl-start{align-items:flex-start!important}.align-items-xl-end{align-items:flex-end!important}.align-items-xl-center{align-items:center!important}.align-items-xl-baseline{align-items:baseline!important}.align-items-xl-stretch{align-items:stretch!important}.align-content-xl-start{align-content:flex-start!important}.align-content-xl-end{align-content:flex-end!important}.align-content-xl-center{align-content:center!important}.align-content-xl-between{align-content:space-between!important}.align-content-xl-around{align-content:space-around!important}.align-content-xl-stretch{align-content:stretch!important}.align-self-xl-auto{align-self:auto!important}.align-self-xl-start{align-self:flex-start!important}.align-self-xl-end{align-self:flex-end!important}.align-self-xl-center{align-self:center!important}.align-self-xl-baseline{align-self:baseline!important}.align-self-xl-stretch{align-self:stretch!important}}.float-left{float:left!important}.float-right{float:right!important}.float-none{float:none!important}@media (min-width:576px){.float-sm-left{float:left!important}.float-sm-right{float:right!important}.float-sm-none{float:none!important}}@media (min-width:768px){.float-md-left{float:left!important}.float-md-right{float:right!important}.float-md-none{float:none!important}}@media (min-width:992px){.float-lg-left{float:left!important}.float-lg-right{float:right!important}.float-lg-none{float:none!important}}@media (min-width:1200px){.float-xl-left{float:left!important}.float-xl-right{float:right!important}.float-xl-none{float:none!important}}.position-static{position:static!important}.position-relative{position:relative!important}.position-absolute{position:absolute!important}.position-fixed{position:fixed!important}.position-sticky{position:sticky!important}.fixed-top{position:fixed;top:0;right:0;left:0;z-index:1030}.fixed-bottom{position:fixed;right:0;bottom:0;left:0;z-index:1030}@supports 
(position:sticky){.sticky-top{position:sticky;top:0;z-index:1020}}.sr-only{position:absolute;width:1px;height:1px;padding:0;overflow:hidden;clip:rect(0,0,0,0);white-space:nowrap;clip-path:inset(50%);border:0}.sr-only-focusable:active,.sr-only-focusable:focus{position:static;width:auto;height:auto;overflow:visible;clip:auto;white-space:normal;clip-path:none}.w-25{width:25%!important}.w-50{width:50%!important}.w-75{width:75%!important}.w-100{width:100%!important}.h-25{height:25%!important}.h-50{height:50%!important}.h-75{height:75%!important}.h-100{height:100%!important}.mw-100{max-width:100%!important}.mh-100{max-height:100%!important}.m-0{margin:0!important}.mt-0,.my-0{margin-top:0!important}.mr-0,.mx-0{margin-right:0!important}.mb-0,.my-0{margin-bottom:0!important}.ml-0,.mx-0{margin-left:0!important}.m-1{margin:.25rem!important}.mt-1,.my-1{margin-top:.25rem!important}.mr-1,.mx-1{margin-right:.25rem!important}.mb-1,.my-1{margin-bottom:.25rem!important}.ml-1,.mx-1{margin-left:.25rem!important}.m-2{margin:.5rem!important}.mt-2,.my-2{margin-top:.5rem!important}.mr-2,.mx-2{margin-right:.5rem!important}.mb-2,.my-2{margin-bottom:.5rem!important}.ml-2,.mx-2{margin-left:.5rem!important}.m-3{margin:1rem!important}.mt-3,.my-3{margin-top:1rem!important}.mr-3,.mx-3{margin-right:1rem!important}.mb-3,.my-3{margin-bottom:1rem!important}.ml-3,.mx-3{margin-left:1rem!important}.m-4{margin:1.5rem!important}.mt-4,.my-4{margin-top:1.5rem!important}.mr-4,.mx-4{margin-right:1.5rem!important}.mb-4,.my-4{margin-bottom:1.5rem!important}.ml-4,.mx-4{margin-left:1.5rem!important}.m-5{margin:3rem!important}.mt-5,.my-5{margin-top:3rem!important}.mr-5,.mx-5{margin-right:3rem!important}.mb-5,.my-5{margin-bottom:3rem!important}.ml-5,.mx-5{margin-left:3rem!important}.p-0{padding:0!important}.pt-0,.py-0{padding-top:0!important}.pr-0,.px-0{padding-right:0!important}.pb-0,.py-0{padding-bottom:0!important}.pl-0,.px-0{padding-left:0!important}.p-1{padding:.25rem!important}.pt-1,.py-1{padding-top:.25rem!important}.pr-1,.px-1{padding-right:.25rem!important}.pb-1,.py-1{padding-bottom:.25rem!important}.pl-1,.px-1{padding-left:.25rem!important}.p-2{padding:.5rem!important}.pt-2,.py-2{padding-top:.5rem!important}.pr-2,.px-2{padding-right:.5rem!important}.pb-2,.py-2{padding-bottom:.5rem!important}.pl-2,.px-2{padding-left:.5rem!important}.p-3{padding:1rem!important}.pt-3,.py-3{padding-top:1rem!important}.pr-3,.px-3{padding-right:1rem!important}.pb-3,.py-3{padding-bottom:1rem!important}.pl-3,.px-3{padding-left:1rem!important}.p-4{padding:1.5rem!important}.pt-4,.py-4{padding-top:1.5rem!important}.pr-4,.px-4{padding-right:1.5rem!important}.pb-4,.py-4{padding-bottom:1.5rem!important}.pl-4,.px-4{padding-left:1.5rem!important}.p-5{padding:3rem!important}.pt-5,.py-5{padding-top:3rem!important}.pr-5,.px-5{padding-right:3rem!important}.pb-5,.py-5{padding-bottom:3rem!important}.pl-5,.px-5{padding-left:3rem!important}.m-auto{margin:auto!important}.mt-auto,.my-auto{margin-top:auto!important}.mr-auto,.mx-auto{margin-right:auto!important}.mb-auto,.my-auto{margin-bottom:auto!important}.ml-auto,.mx-auto{margin-left:auto!important}@media 
(min-width:576px){.m-sm-0{margin:0!important}.mt-sm-0,.my-sm-0{margin-top:0!important}.mr-sm-0,.mx-sm-0{margin-right:0!important}.mb-sm-0,.my-sm-0{margin-bottom:0!important}.ml-sm-0,.mx-sm-0{margin-left:0!important}.m-sm-1{margin:.25rem!important}.mt-sm-1,.my-sm-1{margin-top:.25rem!important}.mr-sm-1,.mx-sm-1{margin-right:.25rem!important}.mb-sm-1,.my-sm-1{margin-bottom:.25rem!important}.ml-sm-1,.mx-sm-1{margin-left:.25rem!important}.m-sm-2{margin:.5rem!important}.mt-sm-2,.my-sm-2{margin-top:.5rem!important}.mr-sm-2,.mx-sm-2{margin-right:.5rem!important}.mb-sm-2,.my-sm-2{margin-bottom:.5rem!important}.ml-sm-2,.mx-sm-2{margin-left:.5rem!important}.m-sm-3{margin:1rem!important}.mt-sm-3,.my-sm-3{margin-top:1rem!important}.mr-sm-3,.mx-sm-3{margin-right:1rem!important}.mb-sm-3,.my-sm-3{margin-bottom:1rem!important}.ml-sm-3,.mx-sm-3{margin-left:1rem!important}.m-sm-4{margin:1.5rem!important}.mt-sm-4,.my-sm-4{margin-top:1.5rem!important}.mr-sm-4,.mx-sm-4{margin-right:1.5rem!important}.mb-sm-4,.my-sm-4{margin-bottom:1.5rem!important}.ml-sm-4,.mx-sm-4{margin-left:1.5rem!important}.m-sm-5{margin:3rem!important}.mt-sm-5,.my-sm-5{margin-top:3rem!important}.mr-sm-5,.mx-sm-5{margin-right:3rem!important}.mb-sm-5,.my-sm-5{margin-bottom:3rem!important}.ml-sm-5,.mx-sm-5{margin-left:3rem!important}.p-sm-0{padding:0!important}.pt-sm-0,.py-sm-0{padding-top:0!important}.pr-sm-0,.px-sm-0{padding-right:0!important}.pb-sm-0,.py-sm-0{padding-bottom:0!important}.pl-sm-0,.px-sm-0{padding-left:0!important}.p-sm-1{padding:.25rem!important}.pt-sm-1,.py-sm-1{padding-top:.25rem!important}.pr-sm-1,.px-sm-1{padding-right:.25rem!important}.pb-sm-1,.py-sm-1{padding-bottom:.25rem!important}.pl-sm-1,.px-sm-1{padding-left:.25rem!important}.p-sm-2{padding:.5rem!important}.pt-sm-2,.py-sm-2{padding-top:.5rem!important}.pr-sm-2,.px-sm-2{padding-right:.5rem!important}.pb-sm-2,.py-sm-2{padding-bottom:.5rem!important}.pl-sm-2,.px-sm-2{padding-left:.5rem!important}.p-sm-3{padding:1rem!important}.pt-sm-3,.py-sm-3{padding-top:1rem!important}.pr-sm-3,.px-sm-3{padding-right:1rem!important}.pb-sm-3,.py-sm-3{padding-bottom:1rem!important}.pl-sm-3,.px-sm-3{padding-left:1rem!important}.p-sm-4{padding:1.5rem!important}.pt-sm-4,.py-sm-4{padding-top:1.5rem!important}.pr-sm-4,.px-sm-4{padding-right:1.5rem!important}.pb-sm-4,.py-sm-4{padding-bottom:1.5rem!important}.pl-sm-4,.px-sm-4{padding-left:1.5rem!important}.p-sm-5{padding:3rem!important}.pt-sm-5,.py-sm-5{padding-top:3rem!important}.pr-sm-5,.px-sm-5{padding-right:3rem!important}.pb-sm-5,.py-sm-5{padding-bottom:3rem!important}.pl-sm-5,.px-sm-5{padding-left:3rem!important}.m-sm-auto{margin:auto!important}.mt-sm-auto,.my-sm-auto{margin-top:auto!important}.mr-sm-auto,.mx-sm-auto{margin-right:auto!important}.mb-sm-auto,.my-sm-auto{margin-bottom:auto!important}.ml-sm-auto,.mx-sm-auto{margin-left:auto!important}}@media 
(min-width:768px){.m-md-0{margin:0!important}.mt-md-0,.my-md-0{margin-top:0!important}.mr-md-0,.mx-md-0{margin-right:0!important}.mb-md-0,.my-md-0{margin-bottom:0!important}.ml-md-0,.mx-md-0{margin-left:0!important}.m-md-1{margin:.25rem!important}.mt-md-1,.my-md-1{margin-top:.25rem!important}.mr-md-1,.mx-md-1{margin-right:.25rem!important}.mb-md-1,.my-md-1{margin-bottom:.25rem!important}.ml-md-1,.mx-md-1{margin-left:.25rem!important}.m-md-2{margin:.5rem!important}.mt-md-2,.my-md-2{margin-top:.5rem!important}.mr-md-2,.mx-md-2{margin-right:.5rem!important}.mb-md-2,.my-md-2{margin-bottom:.5rem!important}.ml-md-2,.mx-md-2{margin-left:.5rem!important}.m-md-3{margin:1rem!important}.mt-md-3,.my-md-3{margin-top:1rem!important}.mr-md-3,.mx-md-3{margin-right:1rem!important}.mb-md-3,.my-md-3{margin-bottom:1rem!important}.ml-md-3,.mx-md-3{margin-left:1rem!important}.m-md-4{margin:1.5rem!important}.mt-md-4,.my-md-4{margin-top:1.5rem!important}.mr-md-4,.mx-md-4{margin-right:1.5rem!important}.mb-md-4,.my-md-4{margin-bottom:1.5rem!important}.ml-md-4,.mx-md-4{margin-left:1.5rem!important}.m-md-5{margin:3rem!important}.mt-md-5,.my-md-5{margin-top:3rem!important}.mr-md-5,.mx-md-5{margin-right:3rem!important}.mb-md-5,.my-md-5{margin-bottom:3rem!important}.ml-md-5,.mx-md-5{margin-left:3rem!important}.p-md-0{padding:0!important}.pt-md-0,.py-md-0{padding-top:0!important}.pr-md-0,.px-md-0{padding-right:0!important}.pb-md-0,.py-md-0{padding-bottom:0!important}.pl-md-0,.px-md-0{padding-left:0!important}.p-md-1{padding:.25rem!important}.pt-md-1,.py-md-1{padding-top:.25rem!important}.pr-md-1,.px-md-1{padding-right:.25rem!important}.pb-md-1,.py-md-1{padding-bottom:.25rem!important}.pl-md-1,.px-md-1{padding-left:.25rem!important}.p-md-2{padding:.5rem!important}.pt-md-2,.py-md-2{padding-top:.5rem!important}.pr-md-2,.px-md-2{padding-right:.5rem!important}.pb-md-2,.py-md-2{padding-bottom:.5rem!important}.pl-md-2,.px-md-2{padding-left:.5rem!important}.p-md-3{padding:1rem!important}.pt-md-3,.py-md-3{padding-top:1rem!important}.pr-md-3,.px-md-3{padding-right:1rem!important}.pb-md-3,.py-md-3{padding-bottom:1rem!important}.pl-md-3,.px-md-3{padding-left:1rem!important}.p-md-4{padding:1.5rem!important}.pt-md-4,.py-md-4{padding-top:1.5rem!important}.pr-md-4,.px-md-4{padding-right:1.5rem!important}.pb-md-4,.py-md-4{padding-bottom:1.5rem!important}.pl-md-4,.px-md-4{padding-left:1.5rem!important}.p-md-5{padding:3rem!important}.pt-md-5,.py-md-5{padding-top:3rem!important}.pr-md-5,.px-md-5{padding-right:3rem!important}.pb-md-5,.py-md-5{padding-bottom:3rem!important}.pl-md-5,.px-md-5{padding-left:3rem!important}.m-md-auto{margin:auto!important}.mt-md-auto,.my-md-auto{margin-top:auto!important}.mr-md-auto,.mx-md-auto{margin-right:auto!important}.mb-md-auto,.my-md-auto{margin-bottom:auto!important}.ml-md-auto,.mx-md-auto{margin-left:auto!important}}@media 
(min-width:992px){.m-lg-0{margin:0!important}.mt-lg-0,.my-lg-0{margin-top:0!important}.mr-lg-0,.mx-lg-0{margin-right:0!important}.mb-lg-0,.my-lg-0{margin-bottom:0!important}.ml-lg-0,.mx-lg-0{margin-left:0!important}.m-lg-1{margin:.25rem!important}.mt-lg-1,.my-lg-1{margin-top:.25rem!important}.mr-lg-1,.mx-lg-1{margin-right:.25rem!important}.mb-lg-1,.my-lg-1{margin-bottom:.25rem!important}.ml-lg-1,.mx-lg-1{margin-left:.25rem!important}.m-lg-2{margin:.5rem!important}.mt-lg-2,.my-lg-2{margin-top:.5rem!important}.mr-lg-2,.mx-lg-2{margin-right:.5rem!important}.mb-lg-2,.my-lg-2{margin-bottom:.5rem!important}.ml-lg-2,.mx-lg-2{margin-left:.5rem!important}.m-lg-3{margin:1rem!important}.mt-lg-3,.my-lg-3{margin-top:1rem!important}.mr-lg-3,.mx-lg-3{margin-right:1rem!important}.mb-lg-3,.my-lg-3{margin-bottom:1rem!important}.ml-lg-3,.mx-lg-3{margin-left:1rem!important}.m-lg-4{margin:1.5rem!important}.mt-lg-4,.my-lg-4{margin-top:1.5rem!important}.mr-lg-4,.mx-lg-4{margin-right:1.5rem!important}.mb-lg-4,.my-lg-4{margin-bottom:1.5rem!important}.ml-lg-4,.mx-lg-4{margin-left:1.5rem!important}.m-lg-5{margin:3rem!important}.mt-lg-5,.my-lg-5{margin-top:3rem!important}.mr-lg-5,.mx-lg-5{margin-right:3rem!important}.mb-lg-5,.my-lg-5{margin-bottom:3rem!important}.ml-lg-5,.mx-lg-5{margin-left:3rem!important}.p-lg-0{padding:0!important}.pt-lg-0,.py-lg-0{padding-top:0!important}.pr-lg-0,.px-lg-0{padding-right:0!important}.pb-lg-0,.py-lg-0{padding-bottom:0!important}.pl-lg-0,.px-lg-0{padding-left:0!important}.p-lg-1{padding:.25rem!important}.pt-lg-1,.py-lg-1{padding-top:.25rem!important}.pr-lg-1,.px-lg-1{padding-right:.25rem!important}.pb-lg-1,.py-lg-1{padding-bottom:.25rem!important}.pl-lg-1,.px-lg-1{padding-left:.25rem!important}.p-lg-2{padding:.5rem!important}.pt-lg-2,.py-lg-2{padding-top:.5rem!important}.pr-lg-2,.px-lg-2{padding-right:.5rem!important}.pb-lg-2,.py-lg-2{padding-bottom:.5rem!important}.pl-lg-2,.px-lg-2{padding-left:.5rem!important}.p-lg-3{padding:1rem!important}.pt-lg-3,.py-lg-3{padding-top:1rem!important}.pr-lg-3,.px-lg-3{padding-right:1rem!important}.pb-lg-3,.py-lg-3{padding-bottom:1rem!important}.pl-lg-3,.px-lg-3{padding-left:1rem!important}.p-lg-4{padding:1.5rem!important}.pt-lg-4,.py-lg-4{padding-top:1.5rem!important}.pr-lg-4,.px-lg-4{padding-right:1.5rem!important}.pb-lg-4,.py-lg-4{padding-bottom:1.5rem!important}.pl-lg-4,.px-lg-4{padding-left:1.5rem!important}.p-lg-5{padding:3rem!important}.pt-lg-5,.py-lg-5{padding-top:3rem!important}.pr-lg-5,.px-lg-5{padding-right:3rem!important}.pb-lg-5,.py-lg-5{padding-bottom:3rem!important}.pl-lg-5,.px-lg-5{padding-left:3rem!important}.m-lg-auto{margin:auto!important}.mt-lg-auto,.my-lg-auto{margin-top:auto!important}.mr-lg-auto,.mx-lg-auto{margin-right:auto!important}.mb-lg-auto,.my-lg-auto{margin-bottom:auto!important}.ml-lg-auto,.mx-lg-auto{margin-left:auto!important}}@media 
(min-width:1200px){.m-xl-0{margin:0!important}.mt-xl-0,.my-xl-0{margin-top:0!important}.mr-xl-0,.mx-xl-0{margin-right:0!important}.mb-xl-0,.my-xl-0{margin-bottom:0!important}.ml-xl-0,.mx-xl-0{margin-left:0!important}.m-xl-1{margin:.25rem!important}.mt-xl-1,.my-xl-1{margin-top:.25rem!important}.mr-xl-1,.mx-xl-1{margin-right:.25rem!important}.mb-xl-1,.my-xl-1{margin-bottom:.25rem!important}.ml-xl-1,.mx-xl-1{margin-left:.25rem!important}.m-xl-2{margin:.5rem!important}.mt-xl-2,.my-xl-2{margin-top:.5rem!important}.mr-xl-2,.mx-xl-2{margin-right:.5rem!important}.mb-xl-2,.my-xl-2{margin-bottom:.5rem!important}.ml-xl-2,.mx-xl-2{margin-left:.5rem!important}.m-xl-3{margin:1rem!important}.mt-xl-3,.my-xl-3{margin-top:1rem!important}.mr-xl-3,.mx-xl-3{margin-right:1rem!important}.mb-xl-3,.my-xl-3{margin-bottom:1rem!important}.ml-xl-3,.mx-xl-3{margin-left:1rem!important}.m-xl-4{margin:1.5rem!important}.mt-xl-4,.my-xl-4{margin-top:1.5rem!important}.mr-xl-4,.mx-xl-4{margin-right:1.5rem!important}.mb-xl-4,.my-xl-4{margin-bottom:1.5rem!important}.ml-xl-4,.mx-xl-4{margin-left:1.5rem!important}.m-xl-5{margin:3rem!important}.mt-xl-5,.my-xl-5{margin-top:3rem!important}.mr-xl-5,.mx-xl-5{margin-right:3rem!important}.mb-xl-5,.my-xl-5{margin-bottom:3rem!important}.ml-xl-5,.mx-xl-5{margin-left:3rem!important}.p-xl-0{padding:0!important}.pt-xl-0,.py-xl-0{padding-top:0!important}.pr-xl-0,.px-xl-0{padding-right:0!important}.pb-xl-0,.py-xl-0{padding-bottom:0!important}.pl-xl-0,.px-xl-0{padding-left:0!important}.p-xl-1{padding:.25rem!important}.pt-xl-1,.py-xl-1{padding-top:.25rem!important}.pr-xl-1,.px-xl-1{padding-right:.25rem!important}.pb-xl-1,.py-xl-1{padding-bottom:.25rem!important}.pl-xl-1,.px-xl-1{padding-left:.25rem!important}.p-xl-2{padding:.5rem!important}.pt-xl-2,.py-xl-2{padding-top:.5rem!important}.pr-xl-2,.px-xl-2{padding-right:.5rem!important}.pb-xl-2,.py-xl-2{padding-bottom:.5rem!important}.pl-xl-2,.px-xl-2{padding-left:.5rem!important}.p-xl-3{padding:1rem!important}.pt-xl-3,.py-xl-3{padding-top:1rem!important}.pr-xl-3,.px-xl-3{padding-right:1rem!important}.pb-xl-3,.py-xl-3{padding-bottom:1rem!important}.pl-xl-3,.px-xl-3{padding-left:1rem!important}.p-xl-4{padding:1.5rem!important}.pt-xl-4,.py-xl-4{padding-top:1.5rem!important}.pr-xl-4,.px-xl-4{padding-right:1.5rem!important}.pb-xl-4,.py-xl-4{padding-bottom:1.5rem!important}.pl-xl-4,.px-xl-4{padding-left:1.5rem!important}.p-xl-5{padding:3rem!important}.pt-xl-5,.py-xl-5{padding-top:3rem!important}.pr-xl-5,.px-xl-5{padding-right:3rem!important}.pb-xl-5,.py-xl-5{padding-bottom:3rem!important}.pl-xl-5,.px-xl-5{padding-left:3rem!important}.m-xl-auto{margin:auto!important}.mt-xl-auto,.my-xl-auto{margin-top:auto!important}.mr-xl-auto,.mx-xl-auto{margin-right:auto!important}.mb-xl-auto,.my-xl-auto{margin-bottom:auto!important}.ml-xl-auto,.mx-xl-auto{margin-left:auto!important}}.text-justify{text-align:justify!important}.text-nowrap{white-space:nowrap!important}.text-truncate{overflow:hidden;text-overflow:ellipsis;white-space:nowrap}.text-left{text-align:left!important}.text-right{text-align:right!important}.text-center{text-align:center!important}@media (min-width:576px){.text-sm-left{text-align:left!important}.text-sm-right{text-align:right!important}.text-sm-center{text-align:center!important}}@media (min-width:768px){.text-md-left{text-align:left!important}.text-md-right{text-align:right!important}.text-md-center{text-align:center!important}}@media 
(min-width:992px){.text-lg-left{text-align:left!important}.text-lg-right{text-align:right!important}.text-lg-center{text-align:center!important}}@media (min-width:1200px){.text-xl-left{text-align:left!important}.text-xl-right{text-align:right!important}.text-xl-center{text-align:center!important}}.text-lowercase{text-transform:lowercase!important}.text-uppercase{text-transform:uppercase!important}.text-capitalize{text-transform:capitalize!important}.font-weight-light{font-weight:300!important}.font-weight-normal{font-weight:400!important}.font-weight-bold{font-weight:700!important}.font-italic{font-style:italic!important}.text-white{color:#fff!important}.text-primary{color:#007bff!important}a.text-primary:focus,a.text-primary:hover{color:#0062cc!important}.text-secondary{color:#6c757d!important}a.text-secondary:focus,a.text-secondary:hover{color:#545b62!important}.text-success{color:#28a745!important}a.text-success:focus,a.text-success:hover{color:#1e7e34!important}.text-info{color:#17a2b8!important}a.text-info:focus,a.text-info:hover{color:#117a8b!important}.text-warning{color:#ffc107!important}a.text-warning:focus,a.text-warning:hover{color:#d39e00!important}.text-danger{color:#dc3545!important}a.text-danger:focus,a.text-danger:hover{color:#bd2130!important}.text-light{color:#f8f9fa!important}a.text-light:focus,a.text-light:hover{color:#dae0e5!important}.text-dark{color:#343a40!important}a.text-dark:focus,a.text-dark:hover{color:#1d2124!important}.text-muted{color:#6c757d!important}.text-hide{font:0/0 a;color:transparent;text-shadow:none;background-color:transparent;border:0}.visible{visibility:visible!important}.invisible{visibility:hidden!important}@media print{*,::after,::before{text-shadow:none!important;box-shadow:none!important}a:not(.btn){text-decoration:underline}abbr[title]::after{content:" (" attr(title) ")"}pre{white-space:pre-wrap!important}blockquote,pre{border:1px solid #999;page-break-inside:avoid}thead{display:table-header-group}img,tr{page-break-inside:avoid}h2,h3,p{orphans:3;widows:3}h2,h3{page-break-after:avoid}@page{size:a3}body{min-width:992px!important}.container{min-width:992px!important}.navbar{display:none}.badge{border:1px solid #000}.table{border-collapse:collapse!important}.table td,.table th{background-color:#fff!important}.table-bordered td,.table-bordered th{border:1px solid #ddd!important}}@media (min-width:48em){html{font-size:18px}}body{color:#555}.h1,.h2,.h3,.h4,.h5,.h6,h1,h2,h3,h4,h5,h6{font-weight:400;color:#333}.h1 a,.h1 a:focus,.h1 a:hover,.h2 a,.h2 a:focus,.h2 a:hover,.h3 a,.h3 a:focus,.h3 a:hover,.h4 a,.h4 a:focus,.h4 a:hover,.h5 a,.h5 a:focus,.h5 a:hover,.h6 a,.h6 a:focus,.h6 a:hover,h1 a,h1 a:focus,h1 a:hover,h2 a,h2 a:focus,h2 a:hover,h3 a,h3 a:focus,h3 a:hover,h4 a,h4 a:focus,h4 a:hover,h5 a,h5 a:focus,h5 a:hover,h6 a,h6 a:focus,h6 a:hover{color:inherit;text-decoration:none}.container{max-width:60rem}.blog-masthead{margin-bottom:3rem;background-color:#428bca;-webkit-box-shadow:inset 0 -.1rem .25rem rgba(0,0,0,.1);box-shadow:inset 0 -.1rem .25rem rgba(0,0,0,.1)}.nav-link{position:relative;padding:1rem;font-weight:500;color:#cdddeb}.nav-link:focus,.nav-link:hover{color:#fff;background-color:transparent}.nav-link.active{color:#fff}.nav-link.active:after{position:absolute;bottom:0;left:50%;width:0;height:0;margin-left:-.3rem;vertical-align:middle;content:"";border-right:.3rem solid transparent;border-bottom:.3rem solid;border-left:.3rem solid transparent}.blog-header{padding-bottom:1.25rem;margin-bottom:2rem;border-bottom:.05rem solid 
#eee}.blog-title{margin-bottom:0;font-size:2rem;font-weight:400}.blog-description{font-size:1.1rem;color:#999}@media (min-width:40em){.blog-title{font-size:3.5rem}}.sidebar-module{padding:1rem}.sidebar-module-inset{padding:1rem;background-color:#f5f5f5;border-radius:.25rem}.sidebar-module-inset ol:last-child,.sidebar-module-inset p:last-child,.sidebar-module-inset ul:last-child{margin-bottom:0}.blog-pagination{margin-bottom:4rem}.blog-pagination>.btn{border-radius:2rem}.blog-post{margin-bottom:4rem}.blog-post-title{margin-bottom:.25rem;font-size:2.5rem}.blog-post-meta{margin-bottom:1.25rem;color:#999}article img{max-width:100%;height:auto;margin:13px auto}.sharing-icons .nav-item+.nav-item{margin-left:1rem}section+#disqus_thread{margin-top:1rem}article blockquote{margin-bottom:1rem;font-size:1.25rem}article div.highlight{padding:5px 5px 0 5px}.blog-footer{padding:2.5rem 0;color:#999;text-align:center;background-color:#f9f9f9;border-top:.05rem solid #e5e5e5}.blog-footer p:last-child{margin-bottom:0} \ No newline at end of file diff --git a/public/fonts/FontAwesome.otf b/public/fonts/FontAwesome.otf deleted file mode 100644 index 401ec0f36..000000000 Binary files a/public/fonts/FontAwesome.otf and /dev/null differ diff --git a/public/fonts/fontawesome-webfont.eot b/public/fonts/fontawesome-webfont.eot deleted file mode 100644 index e9f60ca95..000000000 Binary files a/public/fonts/fontawesome-webfont.eot and /dev/null differ diff --git a/public/fonts/fontawesome-webfont.svg b/public/fonts/fontawesome-webfont.svg deleted file mode 100644 index 855c845e5..000000000 --- a/public/fonts/fontawesome-webfont.svg +++ /dev/null @@ -1,2671 +0,0 @@ - - - - -Created by FontForge 20120731 at Mon Oct 24 17:37:40 2016 - By ,,, -Copyright Dave Gandy 2016. All rights reserved. 
[fontawesome-webfont.svg continues with roughly 2,670 deleted lines of glyph definitions; the SVG markup did not survive extraction and is omitted.]
diff --git a/public/fonts/fontawesome-webfont.ttf b/public/fonts/fontawesome-webfont.ttf deleted file mode 100644 index 35acda2fa..000000000 Binary files a/public/fonts/fontawesome-webfont.ttf and /dev/null differ
diff --git a/public/fonts/fontawesome-webfont.woff b/public/fonts/fontawesome-webfont.woff deleted file mode 100644 index 400014a4b..000000000 Binary files a/public/fonts/fontawesome-webfont.woff and /dev/null differ
diff --git a/public/fonts/fontawesome-webfont.woff2 b/public/fonts/fontawesome-webfont.woff2 deleted file mode 100644 index 4d13fc604..000000000 Binary files a/public/fonts/fontawesome-webfont.woff2 and /dev/null differ
diff --git a/public/index.html b/public/index.html deleted file mode 100644 index 29f7efe33..000000000 --- a/public/index.html +++ /dev/null @@ -1,545 +0,0 @@
[The deleted top of public/index.html (doctype and head, including the title "CGSpace Notes") did not survive extraction and is omitted.]
-
- -
-
- - - -
-
-

CGSpace Notes

-

Documenting day-to-day work on the CGSpace repository.

-
-
- - - -
-
-
- - - - - - - - - -
-
-

February, 2018

- -
-

2018-02-01

- -
    -
  • Peter gave feedback on the dc.rights proof of concept that I had sent him last week
  • -
  • We don’t need to distinguish between internal and external works, so that makes it just a simple list
  • -
  • Yesterday I figured out how to monitor DSpace sessions using JMX
  • -
  • I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu’s munin-plugins-java package and used the stuff I discovered about JMX in 2018-01
  • -
- -
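A minimal sketch of the Tomcat side of that JMX monitoring, assuming the MBeans are exposed locally over RMI (the port and file path below are illustrative, not the actual CGSpace values):

# e.g. in Tomcat's bin/setenv.sh (path illustrative)
CATALINA_OPTS="$CATALINA_OPTS -Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=5400 \
  -Dcom.sun.management.jmxremote.local.only=true \
  -Dcom.sun.management.jmxremote.ssl=false \
  -Dcom.sun.management.jmxremote.authenticate=false"

A munin plugin in the style of jmx_tomcat_dbpools can then read an attribute such as activeSessions from an MBean like Catalina:type=Manager,context=/xmlui,host=localhost.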

- Read more → -
- - - - - - -
-
-

January, 2018

- -
-

2018-01-02

- -
    -
  • Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time
  • -
  • I didn’t get any load alerts from Linode and the REST and XMLUI logs don’t show anything out of the ordinary
  • -
  • The nginx logs show HTTP 200s until 02/Jan/2018:11:27:17 +0000 when Uptime Robot got an HTTP 500
  • -
  • In dspace.log around that time I see many errors like “Client closed the connection before file download was complete”
  • -
  • And just before that I see this:
  • -
- -
Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000].
-
- -
    -
  • Ah hah! So the pool was actually empty!
  • -
  • I need to increase that, let’s try to bump it up from 50 to 75 (see the connection pool sketch below)
  • -
  • After that one client got an HTTP 499 but then the rest were HTTP 200, so I don’t know what the hell Uptime Robot saw
  • -
  • I notice this error quite a few times in dspace.log:
  • -
- -
2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
-org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered " "]" "] "" at line 1, column 32.
-
- -
    -
  • And there are many of these errors every day for the past month:
  • -
- -
$ grep -c "Error while searching for sidebar facets" dspace.log.*
-dspace.log.2017-11-21:4
-dspace.log.2017-11-22:1
-dspace.log.2017-11-23:4
-dspace.log.2017-11-24:11
-dspace.log.2017-11-25:0
-dspace.log.2017-11-26:1
-dspace.log.2017-11-27:7
-dspace.log.2017-11-28:21
-dspace.log.2017-11-29:31
-dspace.log.2017-11-30:15
-dspace.log.2017-12-01:15
-dspace.log.2017-12-02:20
-dspace.log.2017-12-03:38
-dspace.log.2017-12-04:65
-dspace.log.2017-12-05:43
-dspace.log.2017-12-06:72
-dspace.log.2017-12-07:27
-dspace.log.2017-12-08:15
-dspace.log.2017-12-09:29
-dspace.log.2017-12-10:35
-dspace.log.2017-12-11:20
-dspace.log.2017-12-12:44
-dspace.log.2017-12-13:36
-dspace.log.2017-12-14:59
-dspace.log.2017-12-15:104
-dspace.log.2017-12-16:53
-dspace.log.2017-12-17:66
-dspace.log.2017-12-18:83
-dspace.log.2017-12-19:101
-dspace.log.2017-12-20:74
-dspace.log.2017-12-21:55
-dspace.log.2017-12-22:66
-dspace.log.2017-12-23:50
-dspace.log.2017-12-24:85
-dspace.log.2017-12-25:62
-dspace.log.2017-12-26:49
-dspace.log.2017-12-27:30
-dspace.log.2017-12-28:54
-dspace.log.2017-12-29:68
-dspace.log.2017-12-30:89
-dspace.log.2017-12-31:53
-dspace.log.2018-01-01:45
-dspace.log.2018-01-02:34
-
- -
    -
  • Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let’s Encrypt if it’s just a handful of domains
  • -
- -
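As mentioned above, a hedged sketch of what bumping the pool from 50 to 75 might look like, assuming the pool size is the db.maxconnections property in dspace.cfg on a stock DSpace 5.x install (the property name and paths are assumptions, not a record of the exact change):

$ vim [dspace]/config/dspace.cfg    # change db.maxconnections = 50 to 75
$ sudo systemctl restart tomcat7

After a change like this it's worth re-checking pg_stat_activity to confirm the idle connection count actually stabilizes.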

- Read more → -
- - - - - - -
-
-

December, 2017

- -
-

2017-12-01

- -
    -
  • Uptime Robot noticed that CGSpace went down
  • -
  • The logs say “Timeout waiting for idle object”
  • -
  • PostgreSQL activity says there are 115 connections currently
  • -
  • The list of connections to XMLUI and REST API for today:
  • -
- -
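The list itself is only in the full post; a hedged sketch of the sort of tally that produces it, assuming the default nginx log locations:

# grep -h '01/Dec/2017' /var/log/nginx/access.log /var/log/nginx/rest.log | awk '{print $1}' | sort | uniq -c | sort -rn | head

This prints the top client IPs hitting the XMLUI and REST API for the day.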

- Read more → -
- - - - - - -
-
-

November, 2017

- -
-

2017-11-01

- -
    -
  • The CORE developers responded to say they are looking into their bot not respecting our robots.txt
  • -
- -

2017-11-02

- -
    -
  • Today there have been no hits by CORE and no alerts from Linode (coincidence?)
  • -
- -
# grep -c "CORE" /var/log/nginx/access.log
-0
-
- -
    -
  • Generate list of authors on CGSpace for Peter to go through and correct:
  • -
- -
dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv;
-COPY 54701
-
- -

- Read more → -
- - - - - - -
-
-

October, 2017

- -
-

2017-10-01

- - - -
http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
-
- -
    -
  • There appears to be a pattern, but I’ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine (a query sketch for finding them follows below)
  • -
  • Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections
  • -
- -
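A hedged sketch of the SQL side of that investigation, just finding metadata values that contain two handles joined by ||, without assuming which metadata field they live in (an illustration, not a query from the original notes):

dspace=# SELECT resource_id, text_value FROM metadatavalue WHERE resource_type_id = 2 AND text_value LIKE '%hdl.handle.net%||%hdl.handle.net%';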

- Read more → -
- - - - - - -
-
-

CGIAR Library Migration

- -
-

Rough notes for importing the CGIAR Library content. It was decided that this content would go to a new top-level community called CGIAR System Organization.

- -

- Read more → -
- - - - - - -
-
-

September, 2017

- -
-

2017-09-06

- -
    -
  • Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours
  • -
- -

2017-09-07

- -
    -
  • Ask Sisay to clean up the WLE approvers a bit, as Marianne’s user account is in both the approvers step and the group
  • -
- -

- Read more → -
- - - - - - -
-
-

August, 2017

- -
-

2017-08-01

- -
    -
  • Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours
  • -
  • I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)
  • -
  • The good thing is that, according to dspace.log.2017-08-01, they are all using the same Tomcat session
  • -
  • This means our Tomcat Crawler Session Valve is working
  • -
  • But many of the bots are browsing dynamic URLs like: - -
      -
    • /handle/10568/3353/discover
    • -
    • /handle/10568/16510/browse
    • -
  • -
  • The robots.txt only blocks the top-level /discover and /browse URLs… we will need to find a way to forbid them from accessing these!
  • -
  • Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): https://jira.duraspace.org/browse/DS-2962
  • -
  • It turns out that we’re already adding the X-Robots-Tag "none" HTTP header, but this only forbids the search engine from indexing the page, not crawling it!
  • -
  • Also, the bot has to successfully browse the page first so it can receive the HTTP header…
  • -
  • We might actually have to block these requests with HTTP 403 depending on the user agent (see the quick curl check after this list)
  • -
  • Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415
  • -
  • This was due to newline characters in the dc.description.abstract column, which caused OpenRefine to choke when exporting the CSV
  • -
  • I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using g/^$/d
  • -
  • Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet
  • -
- -
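The quick check mentioned above: what a crawler-like client actually gets on one of those dynamic URLs, both the status code and whether the X-Robots-Tag header comes back (the handle and user agent string are just examples):

$ curl -s -D - -o /dev/null -A 'Googlebot/2.1' 'https://cgspace.cgiar.org/handle/10568/3353/discover' | grep -Ei '^(http|x-robots-tag)'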

- Read more → -
- - - - - - -
-
-

July, 2017

- -
-

2017-07-01

- -
    -
  • Run system updates and reboot DSpace Test
  • -
- -

2017-07-04

- -
    -
  • Merge changes for WLE Phase II theme rename (#329)
  • -
  • Looking at extracting the metadata registries from ICARDA’s MEL DSpace database so we can compare fields with CGSpace
  • -
  • We can use PostgreSQL’s extended output format (-x) plus sed to format the output into quasi XML:
  • -
- -
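The exact command is only in the full post; a rough sketch of the -x plus sed idea, with an illustrative column list and sed expressions (not the ones actually used):

$ psql -x -U dspace dspace -c 'SELECT element, qualifier, scope_note FROM metadatafieldregistry' \
    | sed -e 's/^-\[ RECORD.*/<field>/' -e 's/^\([a-z_]*\) *| \(.*\)/  <\1>\2<\/\1>/'

Each record separator becomes a <field> marker and each column|value line becomes a crude XML element, which is enough for eyeballing the two registries side by side.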

- Read more → -
- - - - - - -
-
-

June, 2017

- -
- 2017-06-01: After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes. The cg.identifier.wletheme field will be used for both Phase I and Phase II Research Themes. Then we’ll create a new sub-community for Phase II and create collections for the research themes there. The current “Research Themes” community will be renamed to “WLE Phase I Research Themes”. Tagged all items in the current Phase I collections with their appropriate themes. Create pull request to add Phase II research themes to the submission form: #328 Add cg. - Read more → -
- - - - - - - - - - -
- - - - -
-
diff --git a/public/index.xml b/public/index.xml deleted file mode 100644 index c3ecd6369..000000000 --- a/public/index.xml +++ /dev/null @@ -1,713 +0,0 @@
[deleted: the generated RSS feed, containing HTML-escaped summaries of the same posts rendered on the HTML pages (February 2018 back through November 2015)]
diff --git a/public/js/cookieconsent.min.js b/public/js/cookieconsent.min.js deleted file mode 100644 index 8e44bdde9..000000000 --- a/public/js/cookieconsent.min.js +++ /dev/null @@ -1 +0,0 @@
[deleted: minified cookieconsent JavaScript bundle]
diff --git a/public/page/1/index.html b/public/page/1/index.html deleted file mode 100644 index a5add3664..000000000 --- a/public/page/1/index.html +++ /dev/null @@ -1 +0,0 @@ -https://alanorth.github.io/cgspace-notes/ \ No newline at end of file
diff --git a/public/page/2/index.html b/public/page/2/index.html deleted file mode 100644 index 5692e06d6..000000000 --- a/public/page/2/index.html +++ /dev/null @@ -1,502 +0,0 @@
[deleted: generated HTML head of page 2; its post summaries are rendered below]
-
- -
-
- - - -
-
-

CGSpace Notes

-

Documenting day-to-day work on the CGSpace repository.

-
-
- - - -
-
-
- - - - - - - - - -
-
-

May, 2017

- -
- 2017-05-01: ICARDA apparently started working on CG Core on their MEL repository. They have done a few cg.* fields, but not very consistently, and even copy some of CGSpace’s items: https://mel.cgiar.org/xmlui/handle/20.500.11766/6911?show=full https://cgspace.cgiar.org/handle/10568/73683 2017-05-02: Atmire got back about the Workflow Statistics issue; apparently it’s a bug in the CUA module, so they will send us a pull request. 2017-05-04: Sync DSpace Test with the database and assetstore from CGSpace. Re-deploy DSpace Test with Atmire’s CUA patch for workflow statistics, run system updates, and restart the server. Now I can see the workflow statistics and am able to select users, but everything returns 0 items. Megan says there are still some mapped items that are not appearing since last week, so I forced a full index-discovery -b. Need to remember to check tomorrow if the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSpace Test): https://cgspace. - Read more → -
- - - - - - -
-
-

April, 2017

- -
-

2017-04-02

- -
    -
  • Merge one change to CCAFS flagships that I had forgotten to remove last month (“MANAGING CLIMATE RISK”): https://github.com/ilri/DSpace/pull/317
  • -
  • Quick proof-of-concept hack to add dc.rights to the input form, including some inline instructions/hints:
  • -
- -

dc.rights in the submission form

- -
    -
  • Remove redundant/duplicate text in the DSpace submission license
  • -
  • Testing the CMYK patch on a collection with 650 items:
  • -
- -
$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Thumbnail" -v >& /tmp/filter-media-cmyk.txt
-
- -

- Read more → -
- - - - - - -
-
-

March, 2017

- -
-

2017-03-01

- -
    -
  • Run the 279 CIAT author corrections on CGSpace
  • -
- -

2017-03-02

- -
    -
  • Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace
  • -
  • CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles
  • -
  • They might come in at the top level in one “CGIAR System” community, or with several communities
  • -
  • I need to spend a bit of time looking at the multiple handle support in DSpace and see if new content can be minted in both handles, or just one?
  • -
  • Need to send Peter and Michael some notes about this in a few days
  • -
  • Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI
  • -
  • Filed an issue on DSpace issue tracker for the filter-media bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: DS-3516
  • -
  • Discovered that the ImageMagick filter-media plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK
  • -
  • Interestingly, it seems DSpace 4.x’s thumbnails were sRGB, but forcing regeneration using DSpace 5.x’s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see 10568/51999):
  • -
- -
$ identify ~/Desktop/alc_contrastes_desafios.jpg
-/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
-
- -
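For reference, a minimal sketch of forcing sRGB when generating a thumbnail from a CMYK PDF with ImageMagick directly; this is an illustration of the colorspace fix, not the actual change made to DSpace's filter-media configuration:

$ convert 'alc_contrastes_desafios.pdf[0]' -colorspace sRGB -thumbnail 300x400 alc_contrastes_desafios.jpg
$ identify alc_contrastes_desafios.jpg

The identify output should then report sRGB instead of CMYK.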

- Read more → -
- - - - - - -
-
-

February, 2017

- -
-

2017-02-07

- -
    -
  • An item was mapped twice erroneously again, so I had to remove one of the mappings manually:
  • -
- -
dspace=# select * from collection2item where item_id = '80278';
-  id   | collection_id | item_id
--------+---------------+---------
- 92551 |           313 |   80278
- 92550 |           313 |   80278
- 90774 |          1051 |   80278
-(3 rows)
-dspace=# delete from collection2item where id = 92551 and item_id = 80278;
-DELETE 1
-
- -
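A hedged follow-up sketch: rather than checking items one by one, the same table can be queried for any item mapped to the same collection more than once (plain SQL, not a query from the original notes):

dspace=# SELECT item_id, collection_id, COUNT(*) FROM collection2item GROUP BY item_id, collection_id HAVING COUNT(*) > 1;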
    -
  • Create issue on GitHub to track the addition of CCAFS Phase II project tags (#301)
  • -
  • Looks like we’ll be using cg.identifier.ccafsprojectpii as the field name
  • -
- -

- Read more → -
- - - - - - -
-
-

January, 2017

- -
-

2017-01-02

- -
    -
  • I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error (a sketch of invoking it manually follows this list)
  • -
  • I tested on DSpace Test as well and it doesn’t work there either
  • -
  • I asked on the dspace-tech mailing list because it seems to be broken, and actually now I’m not sure if we’ve ever had the sharding task run successfully over all these years
  • -
- -
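For reference, a hedged sketch of running the yearly statistics sharding by hand rather than from cron, assuming it is the stats-util shard option that the scheduled task wraps:

$ [dspace]/bin/dspace stats-util -s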

- Read more → -
- - - - - - -
-
-

December, 2016

- -
-

2016-12-02

- -
    -
  • CGSpace was down for five hours in the morning while I was sleeping
  • -
  • While looking in the logs for errors, I see tons of warnings about Atmire MQM:
  • -
- -
2016-12-02 03:00:32,352 WARN  com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
-2016-12-02 03:00:32,353 WARN  com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail="dc.title", transactionID="TX157907838689377964651674089851855413607")
-2016-12-02 03:00:32,353 WARN  com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, Object Type=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail="THUMBNAIL", transactionID="TX157907838689377964651674089851855413607")
-2016-12-02 03:00:32,353 WARN  com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail="-1", transactionID="TX157907838689377964651674089851855413607")
-2016-12-02 03:00:32,353 WARN  com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
-
- -
    -
  • I see thousands of them in the logs for the last few months, so it’s not related to the DSpace 5.5 upgrade
  • -
  • I’ve raised a ticket with Atmire to ask
  • -
  • Another worrying error from dspace.log is:
  • -
- -

- Read more → -
- - - - - - -
-
-

November, 2016

- -
-

2016-11-01

- -
    -
  • Add dc.type to the output options for Atmire’s Listings and Reports module (#286)
  • -
- -

Listings and Reports with output type

- -

- Read more → -
- - - - - - -
-
-

October, 2016

- -
-

2016-10-03

- -
    -
  • Testing adding ORCIDs to a CSV file for a single item to see if the author orders get messed up
  • -
  • Need to test the following scenarios to see how author order is affected: - -
      -
    • ORCIDs only
    • -
    • ORCIDs plus normal authors
    • -
  • -
  • I exported a random item’s metadata as CSV, deleted all columns except id and collection, and made a new column called ORCID:dc.contributor.author with the following random ORCIDs from the ORCID registry (a sketch of the resulting CSV follows below):
  • -
- -
0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
-
- -
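For illustration, the resulting CSV might look something like this (the id and collection values are hypothetical):

id,collection,ORCID:dc.contributor.author
74738,10568/16498,0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X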

- Read more → -
- - - - - - -
-
-

September, 2016

- -
-

2016-09-01

- -
    -
  • Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors
  • -
  • Discuss how the migration of CGIAR’s Active Directory to a flat structure will break our LDAP groups in DSpace
  • -
  • We had been using DC=ILRI to determine whether a user was ILRI or not
  • -
  • It looks like we might be able to use OUs now, instead of DCs:
  • -
- -
$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=org" -D "admigration1@cgiarad.org" -W "(sAMAccountName=admigration1)"
-
- -

- Read more → -
- - - - - - -
-
-

August, 2016

- -
-

2016-08-01

- -
    -
  • Add updated distribution license from Sisay (#259)
  • -
  • Play with upgrading Mirage 2 dependencies in bower.json because most are several versions out of date
  • -
  • Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more
  • -
  • bower stuff is a dead end, waste of time, too many issues
  • -
  • Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of fonts)
  • -
  • Start working on DSpace 5.1 → 5.5 port:
  • -
- -
$ git checkout -b 55new 5_x-prod
-$ git reset --hard ilri/5_x-prod
-$ git rebase -i dspace-5.5
-
- -

- Read more → -
- - - - - - - - - - -
- - - - -
-
- - - - - - - - - diff --git a/public/page/3/index.html b/public/page/3/index.html deleted file mode 100644 index 64a2b1f04..000000000 --- a/public/page/3/index.html +++ /dev/null @@ -1,449 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - CGSpace Notes - - - - - - - - - - - - - - - - - - - - -
-
- -
-
- - - -
-
-

CGSpace Notes

-

Documenting day-to-day work on the CGSpace repository.

-
-
- - - -
-
-
- - - - - - - - - -
-
-

July, 2016

- -
-

2016-07-01

- -
    -
  • Add dc.description.sponsorship to Discovery sidebar facets and make investors clickable in item view (#232)
  • -
  • I think this query should find and replace all authors that have “,” at the end of their names:
  • -
- -
dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
-UPDATE 95
-dspacetest=# select text_value from  metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
- text_value
-------------
-(0 rows)
-
- -
    -
  • In this case the select query was showing 95 results before the update
  • -
- -

- Read more → -
- - - - - - -
-
-

June, 2016

- -
-

2016-06-01

- - - -

- Read more → -
- - - - - - -
-
-

May, 2016

- -
-

2016-05-01

- -
    -
  • Since yesterday there have been 10,000 REST errors and the site has been unstable again
  • -
  • I have blocked access to the API now
  • -
  • There are 3,000 IPs accessing the REST API in a 24-hour period!
  • -
- -
# awk '{print $1}' /var/log/nginx/rest.log  | uniq | wc -l
-3168
-
- -
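Note that uniq only collapses adjacent duplicate lines, so on an unsorted access log the figure above can overstate the number of distinct IPs; a stricter count would be:

# awk '{print $1}' /var/log/nginx/rest.log | sort -u | wc -l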

- Read more → -
- - - - - - -
-
-

April, 2016

- -
-

2016-04-04

- -
    -
  • Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit
  • -
  • We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc
  • -
  • After running DSpace for over five years I’ve never needed to look in any other log file than dspace.log, let alone one from last year!
  • -
  • This will save us a few gigs of backup space we’re paying for on S3
  • -
  • Also, I noticed the checker log has some errors we should pay attention to:
  • -
- -

- Read more → -
- - - - - - -
-
-

March, 2016

- -
-

2016-03-02

- -
    -
  • Looking at issues with author authorities on CGSpace
  • -
  • For some reason we still have the index-lucene-update cron job active on CGSpace, but I’m pretty sure we don’t need it as of the latest few versions of Atmire’s Listings and Reports module
  • -
  • Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server
  • -
- -

- Read more → -
- - - - - - -
-
-

February, 2016

- -
-

2016-02-05

- -
    -
  • Looking at some DAGRIS data for Abenet Yabowork
  • -
  • Lots of issues with spaces, newlines, etc causing the import to fail
  • -
  • I noticed we have a very interesting list of countries on CGSpace:
  • -
- -

CGSpace country list

- -
    -
  • Not only are there 49,000 countries, we have some blanks (25)…
  • -
  • Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE”
  • -
- -

- Read more → -
- - - - - - -
-
-

January, 2016

- -
-

2016-01-13

- -
    -
  • Move ILRI collection 10568/12503 from 10568/27869 to 10568/27629 using the move_collections.sh script I wrote last year.
  • -
  • I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated (a cache-clearing sketch follows this list).
  • -
  • Update GitHub wiki for documentation of maintenance tasks.
  • -
- -
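A rough sketch of the cache clearing itself, assuming the default XMLUI Cocoon cache location under [dspace]/var/cocoon (the path may differ per deployment):

# rm -rf [dspace]/var/cocoon/*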

- Read more → -
- - - - - - -
-
-

December, 2015

- -
-

2015-12-02

- -
    -
  • Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less space:
  • -
- -
# cd /home/dspacetest.cgiar.org/log
-# ls -lh dspace.log.2015-11-18*
--rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
--rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
--rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
-
- -

- Read more → -
- - - - - - -
-
-

November, 2015

- -
-

2015-11-22

- -
    -
  • CGSpace went down
  • -
  • Looks like DSpace exhausted its PostgreSQL connection pool
  • -
  • Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:
  • -
- -
$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
-78
-
- -

- Read more → -
- - - - - - - - - - -
- - - - -
-
- - - - - - - - - diff --git a/public/post/index.html b/public/post/index.html deleted file mode 100644 index 2351b147f..000000000 --- a/public/post/index.html +++ /dev/null @@ -1,545 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - CGSpace Notes - - - - - - - - - - - - - - - - - - - - -
-
- -
-
- - - -
-
-

CGSpace Notes

-

Documenting day-to-day work on the CGSpace repository.

-
-
- - - -
-
-
- - - - - - - - - -
-
-

February, 2018

- -
-

2018-02-01

- -
    -
  • Peter gave feedback on the dc.rights proof of concept that I had sent him last week
  • -
  • We don’t need to distinguish between internal and external works, so that makes it just a simple list
  • -
  • Yesterday I figured out how to monitor DSpace sessions using JMX
  • -
  • I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu’s munin-plugins-java package and used the stuff I discovered about JMX in 2018-01
  • -
- -

- Read more → -
- - - - - - -
-
-

January, 2018

- -
-

2018-01-02

- -
    -
  • Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time
  • -
  • I didn’t get any load alerts from Linode and the REST and XMLUI logs don’t show anything out of the ordinary
  • -
  • The nginx logs show HTTP 200s until 02/Jan/2018:11:27:17 +0000 when Uptime Robot got an HTTP 500
  • -
  • In dspace.log around that time I see many errors like “Client closed the connection before file download was complete”
  • -
  • And just before that I see this:
  • -
- -
Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000].
-
- -
    -
  • Ah hah! So the pool was actually empty!
  • -
  • I need to increase that, let’s try to bump it up from 50 to 75
  • -
  • After that one client got an HTTP 499 but then the rest were HTTP 200, so I don’t know what the hell Uptime Robot saw
  • -
  • I notice this error quite a few times in dspace.log:
  • -
- -
2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
-org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered " "]" "] "" at line 1, column 32.
-
- -
    -
  • And there are many of these errors every day for the past month:
  • -
- -
$ grep -c "Error while searching for sidebar facets" dspace.log.*
-dspace.log.2017-11-21:4
-dspace.log.2017-11-22:1
-dspace.log.2017-11-23:4
-dspace.log.2017-11-24:11
-dspace.log.2017-11-25:0
-dspace.log.2017-11-26:1
-dspace.log.2017-11-27:7
-dspace.log.2017-11-28:21
-dspace.log.2017-11-29:31
-dspace.log.2017-11-30:15
-dspace.log.2017-12-01:15
-dspace.log.2017-12-02:20
-dspace.log.2017-12-03:38
-dspace.log.2017-12-04:65
-dspace.log.2017-12-05:43
-dspace.log.2017-12-06:72
-dspace.log.2017-12-07:27
-dspace.log.2017-12-08:15
-dspace.log.2017-12-09:29
-dspace.log.2017-12-10:35
-dspace.log.2017-12-11:20
-dspace.log.2017-12-12:44
-dspace.log.2017-12-13:36
-dspace.log.2017-12-14:59
-dspace.log.2017-12-15:104
-dspace.log.2017-12-16:53
-dspace.log.2017-12-17:66
-dspace.log.2017-12-18:83
-dspace.log.2017-12-19:101
-dspace.log.2017-12-20:74
-dspace.log.2017-12-21:55
-dspace.log.2017-12-22:66
-dspace.log.2017-12-23:50
-dspace.log.2017-12-24:85
-dspace.log.2017-12-25:62
-dspace.log.2017-12-26:49
-dspace.log.2017-12-27:30
-dspace.log.2017-12-28:54
-dspace.log.2017-12-29:68
-dspace.log.2017-12-30:89
-dspace.log.2017-12-31:53
-dspace.log.2018-01-01:45
-dspace.log.2018-01-02:34
-
- -
    -
  • Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let’s Encrypt if it’s just a handful of domains
  • -
- -

- Read more → -
- - - - - - -
-
-

December, 2017

- -
-

2017-12-01

- -
    -
  • Uptime Robot noticed that CGSpace went down
  • -
  • The logs say “Timeout waiting for idle object”
  • -
  • PostgreSQL activity says there are 115 connections currently
  • -
  • The list of connections to XMLUI and REST API for today:
  • -
- -

- Read more → -
- - - - - - -
-
-

November, 2017

- -
-

2017-11-01

- -
    -
  • The CORE developers responded to say they are looking into their bot not respecting our robots.txt
  • -
- -

2017-11-02

- -
    -
  • Today there have been no hits by CORE and no alerts from Linode (coincidence?)
  • -
- -
# grep -c "CORE" /var/log/nginx/access.log
-0
-
- -
    -
  • Generate list of authors on CGSpace for Peter to go through and correct:
  • -
- -
dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv;
-COPY 54701
-
- -

- Read more → -
- - - - - - -
-
-

October, 2017

- -
-

2017-10-01

- - - -
http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
-
- -
    -
  • There appears to be a pattern but I’ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine (a candidate SQL query follows this list)
  • -
  • Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections
  • -
- -
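A candidate SQL query for finding the affected items, assuming the duplicated handles live in dc.identifier.uri as usual (a sketch, not yet run):

dspace=# select resource_id, text_value from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'uri') and resource_type_id = 2 and text_value like '%||%';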

- Read more → -
- - - - - - -
-
-

CGIAR Library Migration

- -
-

Rough notes for importing the CGIAR Library content. It was decided that this content would go to a new top-level community called CGIAR System Organization.

- -

- Read more → -
- - - - - - -
-
-

September, 2017

- -
-

2017-09-06

- -
    -
  • Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours
  • -
- -

2017-09-07

- -
    -
  • Ask Sisay to clean up the WLE approvers a bit, as Marianne’s user account is in both the approvers step and the group
  • -
- -

- Read more → -
- - - - - - -
-
-

August, 2017

- -
-

2017-08-01

- -
    -
  • Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours
  • -
  • I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)
  • -
  • The good thing is that, according to dspace.log.2017-08-01, they are all using the same Tomcat session
  • -
  • This means our Tomcat Crawler Session Valve is working
  • -
  • But many of the bots are browsing dynamic URLs like: - -
      -
    • /handle/10568/3353/discover
    • -
    • /handle/10568/16510/browse
    • -
  • -
  • The robots.txt only blocks the top-level /discover and /browse URLs… we will need to find a way to forbid them from accessing these!
  • -
  • Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): https://jira.duraspace.org/browse/DS-2962
  • -
  • It turns out that we’re already adding the X-Robots-Tag "none" HTTP header, but this only forbids the search engine from indexing the page, not crawling it!
  • -
  • Also, the bot has to successfully browse the page first so it can receive the HTTP header…
  • -
  • We might actually have to block these requests with HTTP 403 depending on the user agent (a possible nginx sketch follows this list)
  • -
  • Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415
  • -
  • This was due to newline characters in the dc.description.abstract column, which caused OpenRefine to choke when exporting the CSV
  • -
  • I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using g/^$/d
  • -
  • Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet
  • -
- -
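A possible nginx sketch for that, to be placed inside the location block that already proxies these dynamic URLs to Tomcat (the user agent list is only an example, untested):

if ($http_user_agent ~* (Baiduspider|Googlebot|bingbot|Yandex)) {
    return 403;
}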

- Read more → -
- - - - - - -
-
-

July, 2017

- -
-

2017-07-01

- -
    -
  • Run system updates and reboot DSpace Test
  • -
- -

2017-07-04

- -
    -
  • Merge changes for WLE Phase II theme rename (#329)
  • -
  • Looking at extracting the metadata registries from ICARDA’s MEL DSpace database so we can compare fields with CGSpace
  • -
  • We can use PostgreSQL’s extended output format (-x) plus sed to format the output into quasi XML:
  • -
- -

- Read more → -
- - - - - - -
-
-

June, 2017

- -
- 2017-06-01 After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes The cg.identifier.wletheme field will be used for both Phase I and Phase II Research Themes Then we’ll create a new sub-community for Phase II and create collections for the research themes there The current “Research Themes” community will be renamed to “WLE Phase I Research Themes” Tagged all items in the current Phase I collections with their appropriate themes Create pull request to add Phase II research themes to the submission form: #328 Add cg. - Read more → -
- - - - - - - - - - -
- - - - -
-
- - - - - - - - - diff --git a/public/post/index.xml b/public/post/index.xml deleted file mode 100644 index 9d22c33c6..000000000 --- a/public/post/index.xml +++ /dev/null @@ -1,713 +0,0 @@ - - - - Posts on CGSpace Notes - https://alanorth.github.io/cgspace-notes/post/ - Recent content in Posts on CGSpace Notes - Hugo -- gohugo.io - en-us - Thu, 01 Feb 2018 16:28:54 +0200 - - - - - - February, 2018 - https://alanorth.github.io/cgspace-notes/2018-02/ - Thu, 01 Feb 2018 16:28:54 +0200 - - https://alanorth.github.io/cgspace-notes/2018-02/ - <h2 id="2018-02-01">2018-02-01</h2> - -<ul> -<li>Peter gave feedback on the <code>dc.rights</code> proof of concept that I had sent him last week</li> -<li>We don&rsquo;t need to distinguish between internal and external works, so that makes it just a simple list</li> -<li>Yesterday I figured out how to monitor DSpace sessions using JMX</li> -<li>I copied the logic in the <code>jmx_tomcat_dbpools</code> provided by Ubuntu&rsquo;s <code>munin-plugins-java</code> package and used the stuff I discovered about JMX <a href="https://alanorth.github.io/cgspace-notes/cgspace-notes/2018-01/">in 2018-01</a></li> -</ul> - -<p></p> - - - - January, 2018 - https://alanorth.github.io/cgspace-notes/2018-01/ - Tue, 02 Jan 2018 08:35:54 -0800 - - https://alanorth.github.io/cgspace-notes/2018-01/ - <h2 id="2018-01-02">2018-01-02</h2> - -<ul> -<li>Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time</li> -<li>I didn&rsquo;t get any load alerts from Linode and the REST and XMLUI logs don&rsquo;t show anything out of the ordinary</li> -<li>The nginx logs show HTTP 200s until <code>02/Jan/2018:11:27:17 +0000</code> when Uptime Robot got an HTTP 500</li> -<li>In dspace.log around that time I see many errors like &ldquo;Client closed the connection before file download was complete&rdquo;</li> -<li>And just before that I see this:</li> -</ul> - -<pre><code>Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000]. -</code></pre> - -<ul> -<li>Ah hah! So the pool was actually empty!</li> -<li>I need to increase that, let&rsquo;s try to bump it up from 50 to 75</li> -<li>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don&rsquo;t know what the hell Uptime Robot saw</li> -<li>I notice this error quite a few times in dspace.log:</li> -</ul> - -<pre><code>2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets -org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered &quot; &quot;]&quot; &quot;] &quot;&quot; at line 1, column 32. 
-</code></pre> - -<ul> -<li>And there are many of these errors every day for the past month:</li> -</ul> - -<pre><code>$ grep -c &quot;Error while searching for sidebar facets&quot; dspace.log.* -dspace.log.2017-11-21:4 -dspace.log.2017-11-22:1 -dspace.log.2017-11-23:4 -dspace.log.2017-11-24:11 -dspace.log.2017-11-25:0 -dspace.log.2017-11-26:1 -dspace.log.2017-11-27:7 -dspace.log.2017-11-28:21 -dspace.log.2017-11-29:31 -dspace.log.2017-11-30:15 -dspace.log.2017-12-01:15 -dspace.log.2017-12-02:20 -dspace.log.2017-12-03:38 -dspace.log.2017-12-04:65 -dspace.log.2017-12-05:43 -dspace.log.2017-12-06:72 -dspace.log.2017-12-07:27 -dspace.log.2017-12-08:15 -dspace.log.2017-12-09:29 -dspace.log.2017-12-10:35 -dspace.log.2017-12-11:20 -dspace.log.2017-12-12:44 -dspace.log.2017-12-13:36 -dspace.log.2017-12-14:59 -dspace.log.2017-12-15:104 -dspace.log.2017-12-16:53 -dspace.log.2017-12-17:66 -dspace.log.2017-12-18:83 -dspace.log.2017-12-19:101 -dspace.log.2017-12-20:74 -dspace.log.2017-12-21:55 -dspace.log.2017-12-22:66 -dspace.log.2017-12-23:50 -dspace.log.2017-12-24:85 -dspace.log.2017-12-25:62 -dspace.log.2017-12-26:49 -dspace.log.2017-12-27:30 -dspace.log.2017-12-28:54 -dspace.log.2017-12-29:68 -dspace.log.2017-12-30:89 -dspace.log.2017-12-31:53 -dspace.log.2018-01-01:45 -dspace.log.2018-01-02:34 -</code></pre> - -<ul> -<li>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let&rsquo;s Encrypt if it&rsquo;s just a handful of domains</li> -</ul> - -<p></p> - - - - December, 2017 - https://alanorth.github.io/cgspace-notes/2017-12/ - Fri, 01 Dec 2017 13:53:54 +0300 - - https://alanorth.github.io/cgspace-notes/2017-12/ - <h2 id="2017-12-01">2017-12-01</h2> - -<ul> -<li>Uptime Robot noticed that CGSpace went down</li> -<li>The logs say &ldquo;Timeout waiting for idle object&rdquo;</li> -<li>PostgreSQL activity says there are 115 connections currently</li> -<li>The list of connections to XMLUI and REST API for today:</li> -</ul> - -<p></p> - - - - November, 2017 - https://alanorth.github.io/cgspace-notes/2017-11/ - Thu, 02 Nov 2017 09:37:54 +0200 - - https://alanorth.github.io/cgspace-notes/2017-11/ - <h2 id="2017-11-01">2017-11-01</h2> - -<ul> -<li>The CORE developers responded to say they are looking into their bot not respecting our robots.txt</li> -</ul> - -<h2 id="2017-11-02">2017-11-02</h2> - -<ul> -<li>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</li> -</ul> - -<pre><code># grep -c &quot;CORE&quot; /var/log/nginx/access.log -0 -</code></pre> - -<ul> -<li>Generate list of authors on CGSpace for Peter to go through and correct:</li> -</ul> - -<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv; -COPY 54701 -</code></pre> - -<p></p> - - - - October, 2017 - https://alanorth.github.io/cgspace-notes/2017-10/ - Sun, 01 Oct 2017 08:07:54 +0300 - - https://alanorth.github.io/cgspace-notes/2017-10/ - <h2 id="2017-10-01">2017-10-01</h2> - -<ul> -<li>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</li> -</ul> - -<pre><code>http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336 -</code></pre> - -<ul> -<li>There appears to be a 
pattern but I&rsquo;ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</li> -<li>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</li> -</ul> - -<p></p> - - - - CGIAR Library Migration - https://alanorth.github.io/cgspace-notes/cgiar-library-migration/ - Mon, 18 Sep 2017 16:38:35 +0300 - - https://alanorth.github.io/cgspace-notes/cgiar-library-migration/ - <p>Rough notes for importing the CGIAR Library content. It was decided that this content would go to a new top-level community called <em>CGIAR System Organization</em>.</p> - -<p></p> - - - - September, 2017 - https://alanorth.github.io/cgspace-notes/2017-09/ - Thu, 07 Sep 2017 16:54:52 +0700 - - https://alanorth.github.io/cgspace-notes/2017-09/ - <h2 id="2017-09-06">2017-09-06</h2> - -<ul> -<li>Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours</li> -</ul> - -<h2 id="2017-09-07">2017-09-07</h2> - -<ul> -<li>Ask Sisay to clean up the WLE approvers a bit, as Marianne&rsquo;s user account is both in the approvers step as well as the group</li> -</ul> - -<p></p> - - - - August, 2017 - https://alanorth.github.io/cgspace-notes/2017-08/ - Tue, 01 Aug 2017 11:51:52 +0300 - - https://alanorth.github.io/cgspace-notes/2017-08/ - <h2 id="2017-08-01">2017-08-01</h2> - -<ul> -<li>Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours</li> -<li>I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)</li> -<li>The good thing is that, according to <code>dspace.log.2017-08-01</code>, they are all using the same Tomcat session</li> -<li>This means our Tomcat Crawler Session Valve is working</li> -<li>But many of the bots are browsing dynamic URLs like: - -<ul> -<li>/handle/10568/3353/discover</li> -<li>/handle/10568/16510/browse</li> -</ul></li> -<li>The <code>robots.txt</code> only blocks the top-level <code>/discover</code> and <code>/browse</code> URLs&hellip; we will need to find a way to forbid them from accessing these!</li> -<li>Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li> -<li>It turns out that we&rsquo;re already adding the <code>X-Robots-Tag &quot;none&quot;</code> HTTP header, but this only forbids the search engine from <em>indexing</em> the page, not crawling it!</li> -<li>Also, the bot has to successfully browse the page first so it can receive the HTTP header&hellip;</li> -<li>We might actually have to <em>block</em> these requests with HTTP 403 depending on the user agent</li> -<li>Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415</li> -<li>This was due to newline characters in the <code>dc.description.abstract</code> column, which caused OpenRefine to choke when exporting the CSV</li> -<li>I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using <code>g/^$/d</code></li> -<li>Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet</li> -</ul> - -<p></p> - - - - July, 2017 - https://alanorth.github.io/cgspace-notes/2017-07/ - Sat, 01 Jul 2017 18:03:52 +0300 - - https://alanorth.github.io/cgspace-notes/2017-07/ - <h2 
id="2017-07-01">2017-07-01</h2> - -<ul> -<li>Run system updates and reboot DSpace Test</li> -</ul> - -<h2 id="2017-07-04">2017-07-04</h2> - -<ul> -<li>Merge changes for WLE Phase II theme rename (<a href="https://github.com/ilri/DSpace/pull/329">#329</a>)</li> -<li>Looking at extracting the metadata registries from ICARDA&rsquo;s MEL DSpace database so we can compare fields with CGSpace</li> -<li>We can use PostgreSQL&rsquo;s extended output format (<code>-x</code>) plus <code>sed</code> to format the output into quasi XML:</li> -</ul> - -<p></p> - - - - June, 2017 - https://alanorth.github.io/cgspace-notes/2017-06/ - Thu, 01 Jun 2017 10:14:52 +0300 - - https://alanorth.github.io/cgspace-notes/2017-06/ - 2017-06-01 After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes The cg.identifier.wletheme field will be used for both Phase I and Phase II Research Themes Then we&rsquo;ll create a new sub-community for Phase II and create collections for the research themes there The current &ldquo;Research Themes&rdquo; community will be renamed to &ldquo;WLE Phase I Research Themes&rdquo; Tagged all items in the current Phase I collections with their appropriate themes Create pull request to add Phase II research themes to the submission form: #328 Add cg. - - - - May, 2017 - https://alanorth.github.io/cgspace-notes/2017-05/ - Mon, 01 May 2017 16:21:52 +0200 - - https://alanorth.github.io/cgspace-notes/2017-05/ - 2017-05-01 ICARDA apparently started working on CG Core on their MEL repository They have done a few cg.* fields, but not very consistent and even copy some of CGSpace items: https://mel.cgiar.org/xmlui/handle/20.500.11766/6911?show=full https://cgspace.cgiar.org/handle/10568/73683 2017-05-02 Atmire got back about the Workflow Statistics issue, and apparently it&rsquo;s a bug in the CUA module so they will send us a pull request 2017-05-04 Sync DSpace Test with database and assetstore from CGSpace Re-deploy DSpace Test with Atmire&rsquo;s CUA patch for workflow statistics, run system updates, and restart the server Now I can see the workflow statistics and am able to select users, but everything returns 0 items Megan says there are still some mapped items are not appearing since last week, so I forced a full index-discovery -b Need to remember to check if the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSPace Test) tomorrow: https://cgspace. 
- - - - April, 2017 - https://alanorth.github.io/cgspace-notes/2017-04/ - Sun, 02 Apr 2017 17:08:52 +0200 - - https://alanorth.github.io/cgspace-notes/2017-04/ - <h2 id="2017-04-02">2017-04-02</h2> - -<ul> -<li>Merge one change to CCAFS flagships that I had forgotten to remove last month (&ldquo;MANAGING CLIMATE RISK&rdquo;): <a href="https://github.com/ilri/DSpace/pull/317">https://github.com/ilri/DSpace/pull/317</a></li> -<li>Quick proof-of-concept hack to add <code>dc.rights</code> to the input form, including some inline instructions/hints:</li> -</ul> - -<p><img src="https://alanorth.github.io/cgspace-notes/cgspace-notes/2017/04/dc-rights.png" alt="dc.rights in the submission form" /></p> - -<ul> -<li>Remove redundant/duplicate text in the DSpace submission license</li> -<li>Testing the CMYK patch on a collection with 650 items:</li> -</ul> - -<pre><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p &quot;ImageMagick PDF Thumbnail&quot; -v &gt;&amp; /tmp/filter-media-cmyk.txt -</code></pre> - -<p></p> - - - - March, 2017 - https://alanorth.github.io/cgspace-notes/2017-03/ - Wed, 01 Mar 2017 17:08:52 +0200 - - https://alanorth.github.io/cgspace-notes/2017-03/ - <h2 id="2017-03-01">2017-03-01</h2> - -<ul> -<li>Run the 279 CIAT author corrections on CGSpace</li> -</ul> - -<h2 id="2017-03-02">2017-03-02</h2> - -<ul> -<li>Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace</li> -<li>CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles</li> -<li>They might come in at the top level in one &ldquo;CGIAR System&rdquo; community, or with several communities</li> -<li>I need to spend a bit of time looking at the multiple handle support in DSpace and see if new content can be minted in both handles, or just one?</li> -<li>Need to send Peter and Michael some notes about this in a few days</li> -<li>Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI</li> -<li>Filed an issue on DSpace issue tracker for the <code>filter-media</code> bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: <a href="https://jira.duraspace.org/browse/DS-3516">DS-3516</a></li> -<li>Discovered that the ImageMagic <code>filter-media</code> plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK</li> -<li>Interestingly, it seems DSpace 4.x&rsquo;s thumbnails were sRGB, but forcing regeneration using DSpace 5.x&rsquo;s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999"><sup>10568</sup>&frasl;<sub>51999</sub></a>):</li> -</ul> - -<pre><code>$ identify ~/Desktop/alc_contrastes_desafios.jpg -/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000 -</code></pre> - -<p></p> - - - - February, 2017 - https://alanorth.github.io/cgspace-notes/2017-02/ - Tue, 07 Feb 2017 07:04:52 -0800 - - https://alanorth.github.io/cgspace-notes/2017-02/ - <h2 id="2017-02-07">2017-02-07</h2> - -<ul> -<li>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</li> -</ul> - -<pre><code>dspace=# select * from collection2item where item_id = '80278'; - id | collection_id | item_id --------+---------------+--------- - 92551 | 313 | 80278 - 92550 | 313 | 80278 - 90774 | 1051 | 80278 -(3 rows) -dspace=# delete from collection2item where id = 92551 and item_id = 80278; -DELETE 1 
-</code></pre> - -<ul> -<li>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</li> -<li>Looks like we&rsquo;ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</li> -</ul> - -<p></p> - - - - January, 2017 - https://alanorth.github.io/cgspace-notes/2017-01/ - Mon, 02 Jan 2017 10:43:00 +0300 - - https://alanorth.github.io/cgspace-notes/2017-01/ - <h2 id="2017-01-02">2017-01-02</h2> - -<ul> -<li>I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error</li> -<li>I tested on DSpace Test as well and it doesn&rsquo;t work there either</li> -<li>I asked on the dspace-tech mailing list because it seems to be broken, and actually now I&rsquo;m not sure if we&rsquo;ve ever had the sharding task run successfully over all these years</li> -</ul> - -<p></p> - - - - December, 2016 - https://alanorth.github.io/cgspace-notes/2016-12/ - Fri, 02 Dec 2016 10:43:00 +0300 - - https://alanorth.github.io/cgspace-notes/2016-12/ - <h2 id="2016-12-02">2016-12-02</h2> - -<ul> -<li>CGSpace was down for five hours in the morning while I was sleeping</li> -<li>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</li> -</ul> - -<pre><code>2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;) -2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail=&quot;dc.title&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;) -2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, Object Type=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail=&quot;THUMBNAIL&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;) -2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail=&quot;-1&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;) -2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;) -</code></pre> - -<ul> -<li>I see thousands of them in the logs for the last few 
months, so it&rsquo;s not related to the DSpace 5.5 upgrade</li> -<li>I&rsquo;ve raised a ticket with Atmire to ask</li> -<li>Another worrying error from dspace.log is:</li> -</ul> - -<p></p> - - - - November, 2016 - https://alanorth.github.io/cgspace-notes/2016-11/ - Tue, 01 Nov 2016 09:21:00 +0300 - - https://alanorth.github.io/cgspace-notes/2016-11/ - <h2 id="2016-11-01">2016-11-01</h2> - -<ul> -<li>Add <code>dc.type</code> to the output options for Atmire&rsquo;s Listings and Reports module (<a href="https://github.com/ilri/DSpace/pull/286">#286</a>)</li> -</ul> - -<p><img src="https://alanorth.github.io/cgspace-notes/cgspace-notes/2016/11/listings-and-reports.png" alt="Listings and Reports with output type" /></p> - -<p></p> - - - - October, 2016 - https://alanorth.github.io/cgspace-notes/2016-10/ - Mon, 03 Oct 2016 15:53:00 +0300 - - https://alanorth.github.io/cgspace-notes/2016-10/ - <h2 id="2016-10-03">2016-10-03</h2> - -<ul> -<li>Testing adding <a href="https://wiki.duraspace.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing">ORCIDs to a CSV</a> file for a single item to see if the author orders get messed up</li> -<li>Need to test the following scenarios to see how author order is affected: - -<ul> -<li>ORCIDs only</li> -<li>ORCIDs plus normal authors</li> -</ul></li> -<li>I exported a random item&rsquo;s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new coloum called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</li> -</ul> - -<pre><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X -</code></pre> - -<p></p> - - - - September, 2016 - https://alanorth.github.io/cgspace-notes/2016-09/ - Thu, 01 Sep 2016 15:53:00 +0300 - - https://alanorth.github.io/cgspace-notes/2016-09/ - <h2 id="2016-09-01">2016-09-01</h2> - -<ul> -<li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li> -<li>Discuss how the migration of CGIAR&rsquo;s Active Directory to a flat structure will break our LDAP groups in DSpace</li> -<li>We had been using <code>DC=ILRI</code> to determine whether a user was ILRI or not</li> -<li>It looks like we might be able to use OUs now, instead of DCs:</li> -</ul> - -<pre><code>$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b &quot;dc=cgiarad,dc=org&quot; -D &quot;admigration1@cgiarad.org&quot; -W &quot;(sAMAccountName=admigration1)&quot; -</code></pre> - -<p></p> - - - - August, 2016 - https://alanorth.github.io/cgspace-notes/2016-08/ - Mon, 01 Aug 2016 15:53:00 +0300 - - https://alanorth.github.io/cgspace-notes/2016-08/ - <h2 id="2016-08-01">2016-08-01</h2> - -<ul> -<li>Add updated distribution license from Sisay (<a href="https://github.com/ilri/DSpace/issues/259">#259</a>)</li> -<li>Play with upgrading Mirage 2 dependencies in <code>bower.json</code> because most are several versions of out date</li> -<li>Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more</li> -<li>bower stuff is a dead end, waste of time, too many issues</li> -<li>Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of <code>fonts</code>)</li> -<li>Start working on DSpace 5.1 → 5.5 port:</li> -</ul> - -<pre><code>$ git checkout -b 55new 5_x-prod -$ git reset --hard ilri/5_x-prod -$ git rebase -i dspace-5.5 -</code></pre> - -<p></p> - - - - July, 2016 - 
https://alanorth.github.io/cgspace-notes/2016-07/ - Fri, 01 Jul 2016 10:53:00 +0300 - - https://alanorth.github.io/cgspace-notes/2016-07/ - <h2 id="2016-07-01">2016-07-01</h2> - -<ul> -<li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li> -<li>I think this query should find and replace all authors that have &ldquo;,&rdquo; at the end of their names:</li> -</ul> - -<pre><code>dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$'; -UPDATE 95 -dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$'; - text_value ------------- -(0 rows) -</code></pre> - -<ul> -<li>In this case the select query was showing 95 results before the update</li> -</ul> - -<p></p> - - - - June, 2016 - https://alanorth.github.io/cgspace-notes/2016-06/ - Wed, 01 Jun 2016 10:53:00 +0300 - - https://alanorth.github.io/cgspace-notes/2016-06/ - <h2 id="2016-06-01">2016-06-01</h2> - -<ul> -<li>Experimenting with IFPRI OAI (we want to harvest their publications)</li> -<li>After reading the <a href="https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html">ContentDM documentation</a> I found IFPRI&rsquo;s OAI endpoint: <a href="http://ebrary.ifpri.org/oai/oai.php">http://ebrary.ifpri.org/oai/oai.php</a></li> -<li>After reading the <a href="https://www.openarchives.org/OAI/openarchivesprotocol.html">OAI documentation</a> and testing with an <a href="http://validator.oaipmh.com/">OAI validator</a> I found out how to get their publications</li> -<li>This is their publications set: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc">http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc</a></li> -<li>You can see the others by using the OAI <code>ListSets</code> verb: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListSets">http://ebrary.ifpri.org/oai/oai.php?verb=ListSets</a></li> -<li>Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in <code>dc.identifier.fund</code> to <code>cg.identifier.cpwfproject</code> and then the rest to <code>dc.description.sponsorship</code></li> -</ul> - -<p></p> - - - - May, 2016 - https://alanorth.github.io/cgspace-notes/2016-05/ - Sun, 01 May 2016 23:06:00 +0300 - - https://alanorth.github.io/cgspace-notes/2016-05/ - <h2 id="2016-05-01">2016-05-01</h2> - -<ul> -<li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li> -<li>I have blocked access to the API now</li> -<li>There are 3,000 IPs accessing the REST API in a 24-hour period!</li> -</ul> - -<pre><code># awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l -3168 -</code></pre> - -<p></p> - - - - April, 2016 - https://alanorth.github.io/cgspace-notes/2016-04/ - Mon, 04 Apr 2016 11:06:00 +0300 - - https://alanorth.github.io/cgspace-notes/2016-04/ - <h2 id="2016-04-04">2016-04-04</h2> - -<ul> -<li>Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit</li> -<li>We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc</li> -<li>After running DSpace for over five years I&rsquo;ve 
never needed to look in any other log file than dspace.log, leave alone one from last year!</li> -<li>This will save us a few gigs of backup space we&rsquo;re paying for on S3</li> -<li>Also, I noticed the <code>checker</code> log has some errors we should pay attention to:</li> -</ul> - -<p></p> - - - - March, 2016 - https://alanorth.github.io/cgspace-notes/2016-03/ - Wed, 02 Mar 2016 16:50:00 +0300 - - https://alanorth.github.io/cgspace-notes/2016-03/ - <h2 id="2016-03-02">2016-03-02</h2> - -<ul> -<li>Looking at issues with author authorities on CGSpace</li> -<li>For some reason we still have the <code>index-lucene-update</code> cron job active on CGSpace, but I&rsquo;m pretty sure we don&rsquo;t need it as of the latest few versions of Atmire&rsquo;s Listings and Reports module</li> -<li>Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server</li> -</ul> - -<p></p> - - - - February, 2016 - https://alanorth.github.io/cgspace-notes/2016-02/ - Fri, 05 Feb 2016 13:18:00 +0300 - - https://alanorth.github.io/cgspace-notes/2016-02/ - <h2 id="2016-02-05">2016-02-05</h2> - -<ul> -<li>Looking at some DAGRIS data for Abenet Yabowork</li> -<li>Lots of issues with spaces, newlines, etc causing the import to fail</li> -<li>I noticed we have a very <em>interesting</em> list of countries on CGSpace:</li> -</ul> - -<p><img src="https://alanorth.github.io/cgspace-notes/cgspace-notes/2016/02/cgspace-countries.png" alt="CGSpace country list" /></p> - -<ul> -<li>Not only are there 49,000 countries, we have some blanks (25)&hellip;</li> -<li>Also, lots of things like &ldquo;COTE D`LVOIRE&rdquo; and &ldquo;COTE D IVOIRE&rdquo;</li> -</ul> - -<p></p> - - - - January, 2016 - https://alanorth.github.io/cgspace-notes/2016-01/ - Wed, 13 Jan 2016 13:18:00 +0300 - - https://alanorth.github.io/cgspace-notes/2016-01/ - <h2 id="2016-01-13">2016-01-13</h2> - -<ul> -<li>Move ILRI collection <code>10568/12503</code> from <code>10568/27869</code> to <code>10568/27629</code> using the <a href="https://gist.github.com/alanorth/392c4660e8b022d99dfa">move_collections.sh</a> script I wrote last year.</li> -<li>I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.</li> -<li>Update GitHub wiki for documentation of <a href="https://github.com/ilri/DSpace/wiki/Maintenance-Tasks">maintenance tasks</a>.</li> -</ul> - -<p></p> - - - - December, 2015 - https://alanorth.github.io/cgspace-notes/2015-12/ - Wed, 02 Dec 2015 13:18:00 +0300 - - https://alanorth.github.io/cgspace-notes/2015-12/ - <h2 id="2015-12-02">2015-12-02</h2> - -<ul> -<li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li> -</ul> - -<pre><code># cd /home/dspacetest.cgiar.org/log -# ls -lh dspace.log.2015-11-18* --rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18 --rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo --rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz -</code></pre> - -<p></p> - - - - November, 2015 - https://alanorth.github.io/cgspace-notes/2015-11/ - Mon, 23 Nov 2015 17:00:57 +0300 - - https://alanorth.github.io/cgspace-notes/2015-11/ - <h2 id="2015-11-22">2015-11-22</h2> - -<ul> -<li>CGSpace went down</li> -<li>Looks like DSpace exhausted its PostgreSQL connection pool</li> -<li>Last week I had increased the limit from 
30 to 60, which seemed to help, but now there are many more idle connections:</li> -</ul> - -<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace -78 -</code></pre> - -<p></p> - - - - \ No newline at end of file diff --git a/public/post/page/1/index.html b/public/post/page/1/index.html deleted file mode 100644 index c4d1d9390..000000000 --- a/public/post/page/1/index.html +++ /dev/null @@ -1 +0,0 @@ -https://alanorth.github.io/cgspace-notes/post/ \ No newline at end of file diff --git a/public/post/page/2/index.html b/public/post/page/2/index.html deleted file mode 100644 index 164dd5849..000000000 --- a/public/post/page/2/index.html +++ /dev/null @@ -1,502 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - CGSpace Notes - - - - - - - - - - - - - - - - - - - - -
-
- -
-
- - - -
-
-

CGSpace Notes

-

Documenting day-to-day work on the CGSpace repository.

-
-
- - - -
-
-
- - - - - - - - - -
-
-

May, 2017

- -
- 2017-05-01 ICARDA apparently started working on CG Core on their MEL repository They have done a few cg.* fields, but not very consistent and even copy some of CGSpace items: https://mel.cgiar.org/xmlui/handle/20.500.11766/6911?show=full https://cgspace.cgiar.org/handle/10568/73683 2017-05-02 Atmire got back about the Workflow Statistics issue, and apparently it’s a bug in the CUA module so they will send us a pull request 2017-05-04 Sync DSpace Test with database and assetstore from CGSpace Re-deploy DSpace Test with Atmire’s CUA patch for workflow statistics, run system updates, and restart the server Now I can see the workflow statistics and am able to select users, but everything returns 0 items Megan says some mapped items are still not appearing since last week, so I forced a full index-discovery -b Need to remember to check if the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSpace Test) tomorrow: https://cgspace. - Read more → -
- - - - - - -
-
-

April, 2017

- -
-

2017-04-02

- -
    -
  • Merge one change to CCAFS flagships that I had forgotten to remove last month (“MANAGING CLIMATE RISK”): https://github.com/ilri/DSpace/pull/317
  • -
  • Quick proof-of-concept hack to add dc.rights to the input form, including some inline instructions/hints:
  • -
- -

dc.rights in the submission form

- -
    -
  • Remove redundant/duplicate text in the DSpace submission license
  • -
  • Testing the CMYK patch on a collection with 650 items:
  • -
- -
$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Thumbnail" -v >& /tmp/filter-media-cmyk.txt
-
- -

- Read more → -
- - - - - - -
-
-

March, 2017

- -
-

2017-03-01

- -
    -
  • Run the 279 CIAT author corrections on CGSpace
  • -
- -

2017-03-02

- -
    -
  • Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace
  • -
  • CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles
  • -
  • They might come in at the top level in one “CGIAR System” community, or with several communities
  • -
  • I need to spend a bit of time looking at the multiple handle support in DSpace and see if new content can be minted in both handles, or just one?
  • -
  • Need to send Peter and Michael some notes about this in a few days
  • -
  • Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI
  • -
  • Filed an issue on DSpace issue tracker for the filter-media bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: DS-3516
  • -
  • Discovered that the ImageMagick filter-media plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK
  • -
  • Interestingly, it seems DSpace 4.x’s thumbnails were sRGB, but forcing regeneration using DSpace 5.x’s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see 10568/51999):
  • -
- -
$ identify ~/Desktop/alc_contrastes_desafios.jpg
-/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
-
- -

- Read more → -
- - - - - - -
-
-

February, 2017

- -
-

2017-02-07

- -
    -
  • An item was mapped twice erroneously again, so I had to remove one of the mappings manually:
  • -
- -
dspace=# select * from collection2item where item_id = '80278';
-  id   | collection_id | item_id
--------+---------------+---------
- 92551 |           313 |   80278
- 92550 |           313 |   80278
- 90774 |          1051 |   80278
-(3 rows)
-dspace=# delete from collection2item where id = 92551 and item_id = 80278;
-DELETE 1
-
- -
    -
  • Create issue on GitHub to track the addition of CCAFS Phase II project tags (#301)
  • -
  • Looks like we’ll be using cg.identifier.ccafsprojectpii as the field name
  • -
- -

- Read more → -
- - - - - - -
-
-

January, 2017

- -
-

2017-01-02

- -
    -
  • I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error
  • -
  • I tested on DSpace Test as well and it doesn’t work there either
  • -
  • I asked on the dspace-tech mailing list because it seems to be broken, and actually now I’m not sure if we’ve ever had the sharding task run successfully over all these years
  • -
- -

- Read more → -
- - - - - - -
-
-

December, 2016

- -
-

2016-12-02

- -
    -
  • CGSpace was down for five hours in the morning while I was sleeping
  • -
  • While looking in the logs for errors, I see tons of warnings about Atmire MQM:
  • -
- -
2016-12-02 03:00:32,352 WARN  com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
-2016-12-02 03:00:32,353 WARN  com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail="dc.title", transactionID="TX157907838689377964651674089851855413607")
-2016-12-02 03:00:32,353 WARN  com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, Object Type=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail="THUMBNAIL", transactionID="TX157907838689377964651674089851855413607")
-2016-12-02 03:00:32,353 WARN  com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail="-1", transactionID="TX157907838689377964651674089851855413607")
-2016-12-02 03:00:32,353 WARN  com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
-
- -
    -
  • I see thousands of them in the logs for the last few months, so it’s not related to the DSpace 5.5 upgrade
  • -
  • I’ve raised a ticket with Atmire to ask
  • -
  • Another worrying error from dspace.log is:
  • -
- -

- Read more → -
- - - - - - -
-
-

November, 2016

- -
-

2016-11-01

- -
    -
  • Add dc.type to the output options for Atmire’s Listings and Reports module (#286)
  • -
- -

Listings and Reports with output type

- -

- Read more → -
- - - - - - -
-
-

October, 2016

- -
-

2016-10-03

- -
    -
  • Testing adding ORCIDs to a CSV file for a single item to see if the author orders get messed up
  • -
  • Need to test the following scenarios to see how author order is affected: - -
      -
    • ORCIDs only
    • -
    • ORCIDs plus normal authors
    • -
  • -
  • I exported a random item’s metadata as CSV, deleted all columns except id and collection, and made a new column called ORCID:dc.contributor.author with the following random ORCIDs from the ORCID registry:
  • -
- -
0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
-
- -
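The resulting CSV for the batch metadata edit would look roughly like this (a sketch; the item id and collection handle below are made up):

id,collection,ORCID:dc.contributor.author
12345,10568/99999,0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X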

- Read more → -
- - - - - - -
-
-

September, 2016

- -
-

2016-09-01

- -
    -
  • Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors
  • -
  • Discuss how the migration of CGIAR’s Active Directory to a flat structure will break our LDAP groups in DSpace
  • -
  • We had been using DC=ILRI to determine whether a user was ILRI or not
  • -
  • It looks like we might be able to use OUs now, instead of DCs:
  • -
- -
$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=org" -D "admigration1@cgiarad.org" -W "(sAMAccountName=admigration1)"
-
- -
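To see which OU a given account actually sits under, grepping the DN line out of the same search is enough (a sketch reusing the query above; the exact OU naming per centre still needs to be confirmed):

$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=org" -D "admigration1@cgiarad.org" -W "(sAMAccountName=admigration1)" | grep -i '^dn:'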

- Read more → -
- - - - - - -
-
-

August, 2016

- -
-

2016-08-01

- -
    -
  • Add updated distribution license from Sisay (#259)
  • -
  • Play with upgrading Mirage 2 dependencies in bower.json because most are several versions out of date
  • -
  • Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more
  • -
  • bower stuff is a dead end, waste of time, too many issues
  • -
  • Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of fonts)
  • -
  • Start working on DSpace 5.1 → 5.5 port:
  • -
- -
$ git checkout -b 55new 5_x-prod
-$ git reset --hard ilri/5_x-prod
-$ git rebase -i dspace-5.5
-
- -
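When the interactive rebase stops on a conflict the loop is the usual one (generic git, nothing DSpace-specific):

$ git status                      # see which files conflicted
$ git add path/to/conflicted-file # stage each resolved file
$ git rebase --continue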

- Read more → -
- - - - - - - - - - -
- - - - -
-
- - - - - - - - - diff --git a/public/post/page/3/index.html b/public/post/page/3/index.html deleted file mode 100644 index 90223147f..000000000 --- a/public/post/page/3/index.html +++ /dev/null @@ -1,449 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - CGSpace Notes - - - - - - - - - - - - - - - - - - - - -
-
- -
-
- - - -
-
-

CGSpace Notes

-

Documenting day-to-day work on the CGSpace repository.

-
-
- - - -
-
-
- - - - - - - - - -
-
-

July, 2016

- -
-

2016-07-01

- -
    -
  • Add dc.description.sponsorship to Discovery sidebar facets and make investors clickable in item view (#232)
  • -
  • I think this query should find and replace all authors that have “,” at the end of their names:
  • -
- -
dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
-UPDATE 95
-dspacetest=# select text_value from  metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
- text_value
-------------
-(0 rows)
-
- -
    -
  • In this case the select query was showing 95 results before the update
  • -
- -
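For future cleanups like this it is safer to preview what the regex will do before running the UPDATE, for example by selecting the old and new values side by side (a sketch using the same pattern):

$ psql -d dspacetest -c "SELECT text_value AS old_value, regexp_replace(text_value, '(^.+?),$', '\1') AS new_value FROM metadatavalue WHERE metadata_field_id=3 AND resource_type_id=2 AND text_value ~ '^.+?,$' LIMIT 10;"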

- Read more → -
- - - - - - -
-
-

June, 2016

- -
-

2016-06-01

- - - -

- Read more → -
- - - - - - -
-
-

May, 2016

- -
-

2016-05-01

- -
    -
  • Since yesterday there have been 10,000 REST errors and the site has been unstable again
  • -
  • I have blocked access to the API now
  • -
  • There are 3,000 IPs accessing the REST API in a 24-hour period!
  • -
- -
# awk '{print $1}' /var/log/nginx/rest.log  | uniq | wc -l
-3168
-
- -
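Note that uniq only collapses adjacent duplicates, so sorting first gives the true number of distinct IPs (the raw count above is likely somewhat inflated):

# awk '{print $1}' /var/log/nginx/rest.log | sort -u | wc -l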

- Read more → -
- - - - - - -
-
-

April, 2016

- -
-

2016-04-04

- -
    -
  • Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit
  • -
  • We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc
  • -
  • After running DSpace for over five years I’ve never needed to look in any other log file than dspace.log, let alone one from last year!
  • -
  • This will save us a few gigs of backup space we’re paying for on S3
  • -
  • Also, I noticed the checker log has some errors we should pay attention to:
  • -
- -

- Read more → -
- - - - - - -
-
-

March, 2016

- -
-

2016-03-02

- -
    -
  • Looking at issues with author authorities on CGSpace
  • -
  • For some reason we still have the index-lucene-update cron job active on CGSpace, but I’m pretty sure we don’t need it as of the latest few versions of Atmire’s Listings and Reports module
  • -
  • Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server
  • -
- -

- Read more → -
- - - - - - -
-
-

February, 2016

- -
-

2016-02-05

- -
    -
  • Looking at some DAGRIS data for Abenet Yabowork
  • -
  • Lots of issues with spaces, newlines, etc causing the import to fail
  • -
  • I noticed we have a very interesting list of countries on CGSpace:
  • -
- -

CGSpace country list

- -
    -
  • Not only are there 49,000 countries, we have some blanks (25)…
  • -
  • Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE”
  • -
- -
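Pulling the full country list with counts out of the database follows the same pattern as other metadata exports in these notes (a sketch; whether the field is dc.coverage.country or a cg.* variant would need to be checked first):

$ psql -d dspace -c "SELECT text_value, COUNT(*) FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id IN (SELECT metadata_field_id FROM metadatafieldregistry WHERE element='coverage' AND qualifier='country') GROUP BY text_value ORDER BY COUNT(*) DESC;"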

- Read more → -
- - - - - - -
-
-

January, 2016

- -
-

2016-01-13

- -
    -
  • Move ILRI collection 10568/12503 from 10568/27869 to 10568/27629 using the move_collections.sh script I wrote last year.
  • -
  • I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.
  • -
  • Update GitHub wiki for documentation of maintenance tasks.
  • -
- -

- Read more → -
- - - - - - -
-
-

December, 2015

- -
-

2015-12-02

- -
    -
  • Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less space:
  • -
- -
# cd /home/dspacetest.cgiar.org/log
-# ls -lh dspace.log.2015-11-18*
--rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
--rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
--rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
-
- -

- Read more → -
- - - - - - -
-
-

November, 2015

- -
-

2015-11-22

- -
    -
  • CGSpace went down
  • -
  • Looks like DSpace exhausted its PostgreSQL connection pool
  • -
  • Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:
  • -
- -
$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
-78
-
- -

- Read more → -
- - - - - - - - - - -
- - - - -
-
- - - - - - - - - diff --git a/public/robots.txt b/public/robots.txt deleted file mode 100644 index 12f3bc2f9..000000000 --- a/public/robots.txt +++ /dev/null @@ -1,38 +0,0 @@ -User-agent: * - - -Disallow: /cgspace-notes/2018-02/ -Disallow: /cgspace-notes/2018-01/ -Disallow: /cgspace-notes/2017-12/ -Disallow: /cgspace-notes/2017-11/ -Disallow: /cgspace-notes/2017-10/ -Disallow: /cgspace-notes/cgiar-library-migration/ -Disallow: /cgspace-notes/2017-09/ -Disallow: /cgspace-notes/2017-08/ -Disallow: /cgspace-notes/2017-07/ -Disallow: /cgspace-notes/2017-06/ -Disallow: /cgspace-notes/2017-05/ -Disallow: /cgspace-notes/2017-04/ -Disallow: /cgspace-notes/2017-03/ -Disallow: /cgspace-notes/2017-02/ -Disallow: /cgspace-notes/2017-01/ -Disallow: /cgspace-notes/2016-12/ -Disallow: /cgspace-notes/2016-11/ -Disallow: /cgspace-notes/2016-10/ -Disallow: /cgspace-notes/2016-09/ -Disallow: /cgspace-notes/2016-08/ -Disallow: /cgspace-notes/2016-07/ -Disallow: /cgspace-notes/2016-06/ -Disallow: /cgspace-notes/2016-05/ -Disallow: /cgspace-notes/2016-04/ -Disallow: /cgspace-notes/2016-03/ -Disallow: /cgspace-notes/2016-02/ -Disallow: /cgspace-notes/2016-01/ -Disallow: /cgspace-notes/2015-12/ -Disallow: /cgspace-notes/2015-11/ -Disallow: /cgspace-notes/ -Disallow: /cgspace-notes/categories/ -Disallow: /cgspace-notes/tags/notes/ -Disallow: /cgspace-notes/categories/notes/ -Disallow: /cgspace-notes/post/ -Disallow: /cgspace-notes/tags/ diff --git a/public/sitemap.xml b/public/sitemap.xml deleted file mode 100644 index a119b3722..000000000 --- a/public/sitemap.xml +++ /dev/null @@ -1,185 +0,0 @@ - - - - - https://alanorth.github.io/cgspace-notes/2018-02/ - 2018-02-11T10:01:13+02:00 - - - - https://alanorth.github.io/cgspace-notes/2018-01/ - 2018-01-31T16:17:39+02:00 - - - - https://alanorth.github.io/cgspace-notes/2017-12/ - 2017-12-31T10:42:16-08:00 - - - - https://alanorth.github.io/cgspace-notes/2017-11/ - 2018-01-12T06:07:03+02:00 - - - - https://alanorth.github.io/cgspace-notes/2017-10/ - 2017-11-02T16:13:10+02:00 - - - - https://alanorth.github.io/cgspace-notes/cgiar-library-migration/ - 2017-09-28T12:00:49+03:00 - - - - https://alanorth.github.io/cgspace-notes/2017-09/ - 2017-09-28T07:56:11+03:00 - - - - https://alanorth.github.io/cgspace-notes/2017-08/ - 2017-09-10T19:18:52+03:00 - - - - https://alanorth.github.io/cgspace-notes/2017-07/ - 2017-08-01T08:55:37+03:00 - - - - https://alanorth.github.io/cgspace-notes/2017-06/ - 2017-06-30T18:34:51+03:00 - - - - https://alanorth.github.io/cgspace-notes/2017-05/ - 2017-09-10T17:46:54+03:00 - - - - https://alanorth.github.io/cgspace-notes/2017-04/ - 2017-04-26T13:35:10+03:00 - - - - https://alanorth.github.io/cgspace-notes/2017-03/ - 2017-03-31T05:36:10+03:00 - - - - https://alanorth.github.io/cgspace-notes/2017-02/ - 2017-02-28T22:58:29+02:00 - - - - https://alanorth.github.io/cgspace-notes/2017-01/ - 2017-01-29T13:18:32+02:00 - - - - https://alanorth.github.io/cgspace-notes/2016-12/ - 2017-09-19T16:07:20+03:00 - - - - https://alanorth.github.io/cgspace-notes/2016-11/ - 2017-01-10T16:21:47+02:00 - - - - https://alanorth.github.io/cgspace-notes/2016-10/ - 2017-01-10T16:21:47+02:00 - - - - https://alanorth.github.io/cgspace-notes/2016-09/ - 2017-01-09T16:18:07+02:00 - - - - https://alanorth.github.io/cgspace-notes/2016-08/ - 2017-01-09T16:18:07+02:00 - - - - https://alanorth.github.io/cgspace-notes/2016-07/ - 2017-01-09T16:18:07+02:00 - - - - https://alanorth.github.io/cgspace-notes/2016-06/ - 2017-01-09T16:18:07+02:00 - - - - 
https://alanorth.github.io/cgspace-notes/2016-05/ - 2017-01-09T16:18:07+02:00 - - - - https://alanorth.github.io/cgspace-notes/2016-04/ - 2016-09-28T17:02:30+03:00 - - - - https://alanorth.github.io/cgspace-notes/2016-03/ - 2017-01-09T16:18:07+02:00 - - - - https://alanorth.github.io/cgspace-notes/2016-02/ - 2017-01-09T16:18:07+02:00 - - - - https://alanorth.github.io/cgspace-notes/2016-01/ - 2017-01-09T16:18:07+02:00 - - - - https://alanorth.github.io/cgspace-notes/2015-12/ - 2017-01-09T16:18:07+02:00 - - - - https://alanorth.github.io/cgspace-notes/2015-11/ - 2016-09-28T17:02:30+03:00 - - - - https://alanorth.github.io/cgspace-notes/ - 2018-02-11T10:01:13+02:00 - 0 - - - - https://alanorth.github.io/cgspace-notes/categories/ - 0 - - - - https://alanorth.github.io/cgspace-notes/tags/notes/ - 2018-02-11T10:01:13+02:00 - 0 - - - - https://alanorth.github.io/cgspace-notes/categories/notes/ - 2017-09-28T12:00:49+03:00 - 0 - - - - https://alanorth.github.io/cgspace-notes/post/ - 2018-02-11T10:01:13+02:00 - 0 - - - - https://alanorth.github.io/cgspace-notes/tags/ - 2018-02-11T10:01:13+02:00 - 0 - - - \ No newline at end of file diff --git a/public/tags/index.html b/public/tags/index.html deleted file mode 100644 index 23485a7f2..000000000 --- a/public/tags/index.html +++ /dev/null @@ -1,170 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - CGSpace Notes - - - - - - - - - - - - - - - - - - - - -
-
- -
-
- - - -
-
-

CGSpace Notes

-

Documenting day-to-day work on the CGSpace repository.

-
-
- - - -
-
-
- - - - - - - - - - - -
- - - - -
-
- - - - - - - - - diff --git a/public/tags/index.xml b/public/tags/index.xml deleted file mode 100644 index d0552c262..000000000 --- a/public/tags/index.xml +++ /dev/null @@ -1,24 +0,0 @@ - - - - Tags on CGSpace Notes - https://alanorth.github.io/cgspace-notes/tags/ - Recent content in Tags on CGSpace Notes - Hugo -- gohugo.io - en-us - Thu, 01 Feb 2018 16:28:54 +0200 - - - - - - Notes - https://alanorth.github.io/cgspace-notes/tags/notes/ - Thu, 01 Feb 2018 16:28:54 +0200 - - https://alanorth.github.io/cgspace-notes/tags/notes/ - - - - - \ No newline at end of file diff --git a/public/tags/notes/index.html b/public/tags/notes/index.html deleted file mode 100644 index c117bf457..000000000 --- a/public/tags/notes/index.html +++ /dev/null @@ -1,543 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - CGSpace Notes - - - - - - - - - - - - - - - - - - - - -
-
- -
-
- - - -
-
-

CGSpace Notes

-

Documenting day-to-day work on the CGSpace repository.

-
-
- - - -
-
-
- - - - - - - - - -
-
-

February, 2018

- -
-

2018-02-01

- -
    -
  • Peter gave feedback on the dc.rights proof of concept that I had sent him last week
  • -
  • We don’t need to distinguish between internal and external works, so that makes it just a simple list
  • -
  • Yesterday I figured out how to monitor DSpace sessions using JMX
  • -
  • I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu’s munin-plugins-java package and used the stuff I discovered about JMX in 2018-01
  • -
- -
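For reference, exposing Tomcat’s JMX interface only needs a few system properties in CATALINA_OPTS, e.g. in setenv.sh or /etc/default/tomcat7 (a minimal sketch; the port is arbitrary and authentication/SSL should be enabled on anything reachable from outside):

# enable remote JMX on port 5400 (local testing only)
CATALINA_OPTS="$CATALINA_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=5400 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"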

- Read more → -
- - - - - - -
-
-

January, 2018

- -
-

2018-01-02

- -
    -
  • Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time
  • -
  • I didn’t get any load alerts from Linode and the REST and XMLUI logs don’t show anything out of the ordinary
  • -
  • The nginx logs show HTTP 200s until 02/Jan/2018:11:27:17 +0000 when Uptime Robot got an HTTP 500
  • -
  • In dspace.log around that time I see many errors like “Client closed the connection before file download was complete”
  • -
  • And just before that I see this:
  • -
- -
Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000].
-
- -
    -
  • Ah hah! So the pool was actually empty!
  • -
  • I need to increase that, let’s try to bump it up from 50 to 75 (see the config sketch after the error counts below)
  • -
  • After that one client got an HTTP 499 but then the rest were HTTP 200, so I don’t know what the hell Uptime Robot saw
  • -
  • I notice this error quite a few times in dspace.log:
  • -
- -
2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
-org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered " "]" "] "" at line 1, column 32.
-
- -
    -
  • And there are many of these errors every day for the past month:
  • -
- -
$ grep -c "Error while searching for sidebar facets" dspace.log.*
-dspace.log.2017-11-21:4
-dspace.log.2017-11-22:1
-dspace.log.2017-11-23:4
-dspace.log.2017-11-24:11
-dspace.log.2017-11-25:0
-dspace.log.2017-11-26:1
-dspace.log.2017-11-27:7
-dspace.log.2017-11-28:21
-dspace.log.2017-11-29:31
-dspace.log.2017-11-30:15
-dspace.log.2017-12-01:15
-dspace.log.2017-12-02:20
-dspace.log.2017-12-03:38
-dspace.log.2017-12-04:65
-dspace.log.2017-12-05:43
-dspace.log.2017-12-06:72
-dspace.log.2017-12-07:27
-dspace.log.2017-12-08:15
-dspace.log.2017-12-09:29
-dspace.log.2017-12-10:35
-dspace.log.2017-12-11:20
-dspace.log.2017-12-12:44
-dspace.log.2017-12-13:36
-dspace.log.2017-12-14:59
-dspace.log.2017-12-15:104
-dspace.log.2017-12-16:53
-dspace.log.2017-12-17:66
-dspace.log.2017-12-18:83
-dspace.log.2017-12-19:101
-dspace.log.2017-12-20:74
-dspace.log.2017-12-21:55
-dspace.log.2017-12-22:66
-dspace.log.2017-12-23:50
-dspace.log.2017-12-24:85
-dspace.log.2017-12-25:62
-dspace.log.2017-12-26:49
-dspace.log.2017-12-27:30
-dspace.log.2017-12-28:54
-dspace.log.2017-12-29:68
-dspace.log.2017-12-30:89
-dspace.log.2017-12-31:53
-dspace.log.2018-01-01:45
-dspace.log.2018-01-02:34
-
- -
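For the pool bump mentioned above, if the pool is the stock one managed by DSpace the property lives in dspace.cfg; after the change it would look like this (a sketch; if the pool is instead defined as a JNDI resource in Tomcat’s server.xml the equivalent setting is maxActive on the Resource element):

$ grep db.maxconnections [dspace]/config/dspace.cfg
db.maxconnections = 75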
    -
  • Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let’s Encrypt if it’s just a handful of domains
  • -
- -

- Read more → -
- - - - - - -
-
-

December, 2017

- -
-

2017-12-01

- -
    -
  • Uptime Robot noticed that CGSpace went down
  • -
  • The logs say “Timeout waiting for idle object”
  • -
  • PostgreSQL activity says there are 115 connections currently
  • -
  • The list of connections to XMLUI and REST API for today:
  • -
- -

- Read more → -
- - - - - - -
-
-

November, 2017

- -
-

2017-11-01

- -
    -
  • The CORE developers responded to say they are looking into their bot not respecting our robots.txt
  • -
- -

2017-11-02

- -
    -
  • Today there have been no hits by CORE and no alerts from Linode (coincidence?)
  • -
- -
# grep -c "CORE" /var/log/nginx/access.log
-0
-
- -
    -
  • Generate list of authors on CGSpace for Peter to go through and correct:
  • -
- -
dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv;
-COPY 54701
-
- -

- Read more → -
- - - - - - -
-
-

October, 2017

- -
-

2017-10-01

- - - -
http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
-
- -
    -
  • There appears to be a pattern but I’ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine (see the query sketch after this list)
  • -
  • Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections
  • -
- -
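Counting how many items are affected can be done in SQL first, assuming the concatenated handles live in dc.identifier.uri (a sketch using the same metadatafieldregistry lookup as the author export above):

$ psql -d dspace -c "SELECT COUNT(*) FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id IN (SELECT metadata_field_id FROM metadatafieldregistry WHERE element='identifier' AND qualifier='uri') AND text_value LIKE '%||%';"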

- Read more → -
- - - - - - -
-
-

September, 2017

- -
-

2017-09-06

- -
    -
  • Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours
  • -
- -

2017-09-07

- -
    -
  • Ask Sisay to clean up the WLE approvers a bit, as Marianne’s user account is both in the approvers step as well as the group
  • -
- -

- Read more → -
- - - - - - -
-
-

August, 2017

- -
-

2017-08-01

- -
    -
  • Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours
  • -
  • I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)
  • -
  • The good thing is that, according to dspace.log.2017-08-01, they are all using the same Tomcat session
  • -
  • This means our Tomcat Crawler Session Valve is working
  • -
  • But many of the bots are browsing dynamic URLs like: - -
      -
    • /handle/10568/3353/discover
    • -
    • /handle/10568/16510/browse
    • -
  • -
  • The robots.txt only blocks the top-level /discover and /browse URLs… we will need to find a way to forbid them from accessing these!
  • -
  • Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): https://jira.duraspace.org/browse/DS-2962
  • -
  • It turns out that we’re already adding the X-Robots-Tag "none" HTTP header, but this only forbids the search engine from indexing the page, not crawling it!
  • -
  • Also, the bot has to successfully browse the page first so it can receive the HTTP header…
  • -
  • We might actually have to block these requests with HTTP 403 depending on the user agent (see the curl sketch after this list)
  • -
  • Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415
  • -
  • This was due to newline characters in the dc.description.abstract column, which caused OpenRefine to choke when exporting the CSV
  • -
  • I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using g/^$/d
  • -
  • Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet
  • -
- -
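If such a user-agent block is added in nginx, a quick way to verify it is to spoof a crawler against one of the dynamic URLs (a sketch; the 403 will only appear once the rule is actually in place):

# print only the HTTP status code for a request made with a bot user agent
$ curl -s -o /dev/null -w "%{http_code}\n" -A "Baiduspider" 'https://cgspace.cgiar.org/handle/10568/3353/discover'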

- Read more → -
- - - - - - -
-
-

July, 2017

- -
-

2017-07-01

- -
    -
  • Run system updates and reboot DSpace Test
  • -
- -

2017-07-04

- -
    -
  • Merge changes for WLE Phase II theme rename (#329)
  • -
  • Looking at extracting the metadata registries from ICARDA’s MEL DSpace database so we can compare fields with CGSpace
  • -
  • We can use PostgreSQL’s extended output format (-x) plus sed to format the output into quasi XML:
  • -
- -

- Read more → -
- - - - - - -
-
-

June, 2017

- -
- 2017-06-01 After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes The cg.identifier.wletheme field will be used for both Phase I and Phase II Research Themes Then we’ll create a new sub-community for Phase II and create collections for the research themes there The current “Research Themes” community will be renamed to “WLE Phase I Research Themes” Tagged all items in the current Phase I collections with their appropriate themes Create pull request to add Phase II research themes to the submission form: #328 Add cg. - Read more → -
- - - - - - -
-
-

May, 2017

- -
- 2017-05-01 ICARDA apparently started working on CG Core on their MEL repository They have done a few cg.* fields, but not very consistently, and even copy some of CGSpace’s items: https://mel.cgiar.org/xmlui/handle/20.500.11766/6911?show=full https://cgspace.cgiar.org/handle/10568/73683 2017-05-02 Atmire got back about the Workflow Statistics issue, and apparently it’s a bug in the CUA module so they will send us a pull request 2017-05-04 Sync DSpace Test with database and assetstore from CGSpace Re-deploy DSpace Test with Atmire’s CUA patch for workflow statistics, run system updates, and restart the server Now I can see the workflow statistics and am able to select users, but everything returns 0 items Megan says there are still some mapped items that are not appearing since last week, so I forced a full index-discovery -b Need to remember to check if the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSpace Test) tomorrow: https://cgspace. - Read more → -
- - - - - - - - - - -
- - - - -
-
- - - - - - - - - diff --git a/public/tags/notes/index.xml b/public/tags/notes/index.xml deleted file mode 100644 index 4854d4372..000000000 --- a/public/tags/notes/index.xml +++ /dev/null @@ -1,702 +0,0 @@ - - - - Notes on CGSpace Notes - https://alanorth.github.io/cgspace-notes/tags/notes/ - Recent content in Notes on CGSpace Notes - Hugo -- gohugo.io - en-us - Thu, 01 Feb 2018 16:28:54 +0200 - - - - - - February, 2018 - https://alanorth.github.io/cgspace-notes/2018-02/ - Thu, 01 Feb 2018 16:28:54 +0200 - - https://alanorth.github.io/cgspace-notes/2018-02/ - <h2 id="2018-02-01">2018-02-01</h2> - -<ul> -<li>Peter gave feedback on the <code>dc.rights</code> proof of concept that I had sent him last week</li> -<li>We don&rsquo;t need to distinguish between internal and external works, so that makes it just a simple list</li> -<li>Yesterday I figured out how to monitor DSpace sessions using JMX</li> -<li>I copied the logic in the <code>jmx_tomcat_dbpools</code> provided by Ubuntu&rsquo;s <code>munin-plugins-java</code> package and used the stuff I discovered about JMX <a href="https://alanorth.github.io/cgspace-notes/cgspace-notes/2018-01/">in 2018-01</a></li> -</ul> - -<p></p> - - - - January, 2018 - https://alanorth.github.io/cgspace-notes/2018-01/ - Tue, 02 Jan 2018 08:35:54 -0800 - - https://alanorth.github.io/cgspace-notes/2018-01/ - <h2 id="2018-01-02">2018-01-02</h2> - -<ul> -<li>Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time</li> -<li>I didn&rsquo;t get any load alerts from Linode and the REST and XMLUI logs don&rsquo;t show anything out of the ordinary</li> -<li>The nginx logs show HTTP 200s until <code>02/Jan/2018:11:27:17 +0000</code> when Uptime Robot got an HTTP 500</li> -<li>In dspace.log around that time I see many errors like &ldquo;Client closed the connection before file download was complete&rdquo;</li> -<li>And just before that I see this:</li> -</ul> - -<pre><code>Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000]. -</code></pre> - -<ul> -<li>Ah hah! So the pool was actually empty!</li> -<li>I need to increase that, let&rsquo;s try to bump it up from 50 to 75</li> -<li>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don&rsquo;t know what the hell Uptime Robot saw</li> -<li>I notice this error quite a few times in dspace.log:</li> -</ul> - -<pre><code>2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets -org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered &quot; &quot;]&quot; &quot;] &quot;&quot; at line 1, column 32. 
-</code></pre> - -<ul> -<li>And there are many of these errors every day for the past month:</li> -</ul> - -<pre><code>$ grep -c &quot;Error while searching for sidebar facets&quot; dspace.log.* -dspace.log.2017-11-21:4 -dspace.log.2017-11-22:1 -dspace.log.2017-11-23:4 -dspace.log.2017-11-24:11 -dspace.log.2017-11-25:0 -dspace.log.2017-11-26:1 -dspace.log.2017-11-27:7 -dspace.log.2017-11-28:21 -dspace.log.2017-11-29:31 -dspace.log.2017-11-30:15 -dspace.log.2017-12-01:15 -dspace.log.2017-12-02:20 -dspace.log.2017-12-03:38 -dspace.log.2017-12-04:65 -dspace.log.2017-12-05:43 -dspace.log.2017-12-06:72 -dspace.log.2017-12-07:27 -dspace.log.2017-12-08:15 -dspace.log.2017-12-09:29 -dspace.log.2017-12-10:35 -dspace.log.2017-12-11:20 -dspace.log.2017-12-12:44 -dspace.log.2017-12-13:36 -dspace.log.2017-12-14:59 -dspace.log.2017-12-15:104 -dspace.log.2017-12-16:53 -dspace.log.2017-12-17:66 -dspace.log.2017-12-18:83 -dspace.log.2017-12-19:101 -dspace.log.2017-12-20:74 -dspace.log.2017-12-21:55 -dspace.log.2017-12-22:66 -dspace.log.2017-12-23:50 -dspace.log.2017-12-24:85 -dspace.log.2017-12-25:62 -dspace.log.2017-12-26:49 -dspace.log.2017-12-27:30 -dspace.log.2017-12-28:54 -dspace.log.2017-12-29:68 -dspace.log.2017-12-30:89 -dspace.log.2017-12-31:53 -dspace.log.2018-01-01:45 -dspace.log.2018-01-02:34 -</code></pre> - -<ul> -<li>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let&rsquo;s Encrypt if it&rsquo;s just a handful of domains</li> -</ul> - -<p></p> - - - - December, 2017 - https://alanorth.github.io/cgspace-notes/2017-12/ - Fri, 01 Dec 2017 13:53:54 +0300 - - https://alanorth.github.io/cgspace-notes/2017-12/ - <h2 id="2017-12-01">2017-12-01</h2> - -<ul> -<li>Uptime Robot noticed that CGSpace went down</li> -<li>The logs say &ldquo;Timeout waiting for idle object&rdquo;</li> -<li>PostgreSQL activity says there are 115 connections currently</li> -<li>The list of connections to XMLUI and REST API for today:</li> -</ul> - -<p></p> - - - - November, 2017 - https://alanorth.github.io/cgspace-notes/2017-11/ - Thu, 02 Nov 2017 09:37:54 +0200 - - https://alanorth.github.io/cgspace-notes/2017-11/ - <h2 id="2017-11-01">2017-11-01</h2> - -<ul> -<li>The CORE developers responded to say they are looking into their bot not respecting our robots.txt</li> -</ul> - -<h2 id="2017-11-02">2017-11-02</h2> - -<ul> -<li>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</li> -</ul> - -<pre><code># grep -c &quot;CORE&quot; /var/log/nginx/access.log -0 -</code></pre> - -<ul> -<li>Generate list of authors on CGSpace for Peter to go through and correct:</li> -</ul> - -<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv; -COPY 54701 -</code></pre> - -<p></p> - - - - October, 2017 - https://alanorth.github.io/cgspace-notes/2017-10/ - Sun, 01 Oct 2017 08:07:54 +0300 - - https://alanorth.github.io/cgspace-notes/2017-10/ - <h2 id="2017-10-01">2017-10-01</h2> - -<ul> -<li>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</li> -</ul> - -<pre><code>http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336 -</code></pre> - -<ul> -<li>There appears to be a 
pattern but I&rsquo;ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</li> -<li>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</li> -</ul> - -<p></p> - - - - September, 2017 - https://alanorth.github.io/cgspace-notes/2017-09/ - Thu, 07 Sep 2017 16:54:52 +0700 - - https://alanorth.github.io/cgspace-notes/2017-09/ - <h2 id="2017-09-06">2017-09-06</h2> - -<ul> -<li>Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours</li> -</ul> - -<h2 id="2017-09-07">2017-09-07</h2> - -<ul> -<li>Ask Sisay to clean up the WLE approvers a bit, as Marianne&rsquo;s user account is both in the approvers step as well as the group</li> -</ul> - -<p></p> - - - - August, 2017 - https://alanorth.github.io/cgspace-notes/2017-08/ - Tue, 01 Aug 2017 11:51:52 +0300 - - https://alanorth.github.io/cgspace-notes/2017-08/ - <h2 id="2017-08-01">2017-08-01</h2> - -<ul> -<li>Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours</li> -<li>I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)</li> -<li>The good thing is that, according to <code>dspace.log.2017-08-01</code>, they are all using the same Tomcat session</li> -<li>This means our Tomcat Crawler Session Valve is working</li> -<li>But many of the bots are browsing dynamic URLs like: - -<ul> -<li>/handle/10568/3353/discover</li> -<li>/handle/10568/16510/browse</li> -</ul></li> -<li>The <code>robots.txt</code> only blocks the top-level <code>/discover</code> and <code>/browse</code> URLs&hellip; we will need to find a way to forbid them from accessing these!</li> -<li>Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li> -<li>It turns out that we&rsquo;re already adding the <code>X-Robots-Tag &quot;none&quot;</code> HTTP header, but this only forbids the search engine from <em>indexing</em> the page, not crawling it!</li> -<li>Also, the bot has to successfully browse the page first so it can receive the HTTP header&hellip;</li> -<li>We might actually have to <em>block</em> these requests with HTTP 403 depending on the user agent</li> -<li>Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415</li> -<li>This was due to newline characters in the <code>dc.description.abstract</code> column, which caused OpenRefine to choke when exporting the CSV</li> -<li>I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using <code>g/^$/d</code></li> -<li>Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet</li> -</ul> - -<p></p> - - - - July, 2017 - https://alanorth.github.io/cgspace-notes/2017-07/ - Sat, 01 Jul 2017 18:03:52 +0300 - - https://alanorth.github.io/cgspace-notes/2017-07/ - <h2 id="2017-07-01">2017-07-01</h2> - -<ul> -<li>Run system updates and reboot DSpace Test</li> -</ul> - -<h2 id="2017-07-04">2017-07-04</h2> - -<ul> -<li>Merge changes for WLE Phase II theme rename (<a href="https://github.com/ilri/DSpace/pull/329">#329</a>)</li> -<li>Looking at extracting the metadata registries from ICARDA&rsquo;s MEL DSpace database so we can compare fields with CGSpace</li> -<li>We 
can use PostgreSQL&rsquo;s extended output format (<code>-x</code>) plus <code>sed</code> to format the output into quasi XML:</li> -</ul> - -<p></p> - - - - June, 2017 - https://alanorth.github.io/cgspace-notes/2017-06/ - Thu, 01 Jun 2017 10:14:52 +0300 - - https://alanorth.github.io/cgspace-notes/2017-06/ - 2017-06-01 After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes The cg.identifier.wletheme field will be used for both Phase I and Phase II Research Themes Then we&rsquo;ll create a new sub-community for Phase II and create collections for the research themes there The current &ldquo;Research Themes&rdquo; community will be renamed to &ldquo;WLE Phase I Research Themes&rdquo; Tagged all items in the current Phase I collections with their appropriate themes Create pull request to add Phase II research themes to the submission form: #328 Add cg. - - - - May, 2017 - https://alanorth.github.io/cgspace-notes/2017-05/ - Mon, 01 May 2017 16:21:52 +0200 - - https://alanorth.github.io/cgspace-notes/2017-05/ - 2017-05-01 ICARDA apparently started working on CG Core on their MEL repository They have done a few cg.* fields, but not very consistent and even copy some of CGSpace items: https://mel.cgiar.org/xmlui/handle/20.500.11766/6911?show=full https://cgspace.cgiar.org/handle/10568/73683 2017-05-02 Atmire got back about the Workflow Statistics issue, and apparently it&rsquo;s a bug in the CUA module so they will send us a pull request 2017-05-04 Sync DSpace Test with database and assetstore from CGSpace Re-deploy DSpace Test with Atmire&rsquo;s CUA patch for workflow statistics, run system updates, and restart the server Now I can see the workflow statistics and am able to select users, but everything returns 0 items Megan says there are still some mapped items are not appearing since last week, so I forced a full index-discovery -b Need to remember to check if the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSPace Test) tomorrow: https://cgspace. 
- - - - April, 2017 - https://alanorth.github.io/cgspace-notes/2017-04/ - Sun, 02 Apr 2017 17:08:52 +0200 - - https://alanorth.github.io/cgspace-notes/2017-04/ - <h2 id="2017-04-02">2017-04-02</h2> - -<ul> -<li>Merge one change to CCAFS flagships that I had forgotten to remove last month (&ldquo;MANAGING CLIMATE RISK&rdquo;): <a href="https://github.com/ilri/DSpace/pull/317">https://github.com/ilri/DSpace/pull/317</a></li> -<li>Quick proof-of-concept hack to add <code>dc.rights</code> to the input form, including some inline instructions/hints:</li> -</ul> - -<p><img src="https://alanorth.github.io/cgspace-notes/cgspace-notes/2017/04/dc-rights.png" alt="dc.rights in the submission form" /></p> - -<ul> -<li>Remove redundant/duplicate text in the DSpace submission license</li> -<li>Testing the CMYK patch on a collection with 650 items:</li> -</ul> - -<pre><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p &quot;ImageMagick PDF Thumbnail&quot; -v &gt;&amp; /tmp/filter-media-cmyk.txt -</code></pre> - -<p></p> - - - - March, 2017 - https://alanorth.github.io/cgspace-notes/2017-03/ - Wed, 01 Mar 2017 17:08:52 +0200 - - https://alanorth.github.io/cgspace-notes/2017-03/ - <h2 id="2017-03-01">2017-03-01</h2> - -<ul> -<li>Run the 279 CIAT author corrections on CGSpace</li> -</ul> - -<h2 id="2017-03-02">2017-03-02</h2> - -<ul> -<li>Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace</li> -<li>CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles</li> -<li>They might come in at the top level in one &ldquo;CGIAR System&rdquo; community, or with several communities</li> -<li>I need to spend a bit of time looking at the multiple handle support in DSpace and see if new content can be minted in both handles, or just one?</li> -<li>Need to send Peter and Michael some notes about this in a few days</li> -<li>Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI</li> -<li>Filed an issue on DSpace issue tracker for the <code>filter-media</code> bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: <a href="https://jira.duraspace.org/browse/DS-3516">DS-3516</a></li> -<li>Discovered that the ImageMagic <code>filter-media</code> plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK</li> -<li>Interestingly, it seems DSpace 4.x&rsquo;s thumbnails were sRGB, but forcing regeneration using DSpace 5.x&rsquo;s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999"><sup>10568</sup>&frasl;<sub>51999</sub></a>):</li> -</ul> - -<pre><code>$ identify ~/Desktop/alc_contrastes_desafios.jpg -/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000 -</code></pre> - -<p></p> - - - - February, 2017 - https://alanorth.github.io/cgspace-notes/2017-02/ - Tue, 07 Feb 2017 07:04:52 -0800 - - https://alanorth.github.io/cgspace-notes/2017-02/ - <h2 id="2017-02-07">2017-02-07</h2> - -<ul> -<li>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</li> -</ul> - -<pre><code>dspace=# select * from collection2item where item_id = '80278'; - id | collection_id | item_id --------+---------------+--------- - 92551 | 313 | 80278 - 92550 | 313 | 80278 - 90774 | 1051 | 80278 -(3 rows) -dspace=# delete from collection2item where id = 92551 and item_id = 80278; -DELETE 1 
-</code></pre> - -<ul> -<li>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</li> -<li>Looks like we&rsquo;ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</li> -</ul> - -<p></p> - - - - January, 2017 - https://alanorth.github.io/cgspace-notes/2017-01/ - Mon, 02 Jan 2017 10:43:00 +0300 - - https://alanorth.github.io/cgspace-notes/2017-01/ - <h2 id="2017-01-02">2017-01-02</h2> - -<ul> -<li>I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error</li> -<li>I tested on DSpace Test as well and it doesn&rsquo;t work there either</li> -<li>I asked on the dspace-tech mailing list because it seems to be broken, and actually now I&rsquo;m not sure if we&rsquo;ve ever had the sharding task run successfully over all these years</li> -</ul> - -<p></p> - - - - December, 2016 - https://alanorth.github.io/cgspace-notes/2016-12/ - Fri, 02 Dec 2016 10:43:00 +0300 - - https://alanorth.github.io/cgspace-notes/2016-12/ - <h2 id="2016-12-02">2016-12-02</h2> - -<ul> -<li>CGSpace was down for five hours in the morning while I was sleeping</li> -<li>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</li> -</ul> - -<pre><code>2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;) -2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail=&quot;dc.title&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;) -2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, Object Type=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail=&quot;THUMBNAIL&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;) -2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail=&quot;-1&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;) -2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;) -</code></pre> - -<ul> -<li>I see thousands of them in the logs for the last few 
months, so it&rsquo;s not related to the DSpace 5.5 upgrade</li> -<li>I&rsquo;ve raised a ticket with Atmire to ask</li> -<li>Another worrying error from dspace.log is:</li> -</ul> - -<p></p> - - - - November, 2016 - https://alanorth.github.io/cgspace-notes/2016-11/ - Tue, 01 Nov 2016 09:21:00 +0300 - - https://alanorth.github.io/cgspace-notes/2016-11/ - <h2 id="2016-11-01">2016-11-01</h2> - -<ul> -<li>Add <code>dc.type</code> to the output options for Atmire&rsquo;s Listings and Reports module (<a href="https://github.com/ilri/DSpace/pull/286">#286</a>)</li> -</ul> - -<p><img src="https://alanorth.github.io/cgspace-notes/cgspace-notes/2016/11/listings-and-reports.png" alt="Listings and Reports with output type" /></p> - -<p></p> - - - - October, 2016 - https://alanorth.github.io/cgspace-notes/2016-10/ - Mon, 03 Oct 2016 15:53:00 +0300 - - https://alanorth.github.io/cgspace-notes/2016-10/ - <h2 id="2016-10-03">2016-10-03</h2> - -<ul> -<li>Testing adding <a href="https://wiki.duraspace.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing">ORCIDs to a CSV</a> file for a single item to see if the author orders get messed up</li> -<li>Need to test the following scenarios to see how author order is affected: - -<ul> -<li>ORCIDs only</li> -<li>ORCIDs plus normal authors</li> -</ul></li> -<li>I exported a random item&rsquo;s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new coloum called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</li> -</ul> - -<pre><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X -</code></pre> - -<p></p> - - - - September, 2016 - https://alanorth.github.io/cgspace-notes/2016-09/ - Thu, 01 Sep 2016 15:53:00 +0300 - - https://alanorth.github.io/cgspace-notes/2016-09/ - <h2 id="2016-09-01">2016-09-01</h2> - -<ul> -<li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li> -<li>Discuss how the migration of CGIAR&rsquo;s Active Directory to a flat structure will break our LDAP groups in DSpace</li> -<li>We had been using <code>DC=ILRI</code> to determine whether a user was ILRI or not</li> -<li>It looks like we might be able to use OUs now, instead of DCs:</li> -</ul> - -<pre><code>$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b &quot;dc=cgiarad,dc=org&quot; -D &quot;admigration1@cgiarad.org&quot; -W &quot;(sAMAccountName=admigration1)&quot; -</code></pre> - -<p></p> - - - - August, 2016 - https://alanorth.github.io/cgspace-notes/2016-08/ - Mon, 01 Aug 2016 15:53:00 +0300 - - https://alanorth.github.io/cgspace-notes/2016-08/ - <h2 id="2016-08-01">2016-08-01</h2> - -<ul> -<li>Add updated distribution license from Sisay (<a href="https://github.com/ilri/DSpace/issues/259">#259</a>)</li> -<li>Play with upgrading Mirage 2 dependencies in <code>bower.json</code> because most are several versions of out date</li> -<li>Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more</li> -<li>bower stuff is a dead end, waste of time, too many issues</li> -<li>Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of <code>fonts</code>)</li> -<li>Start working on DSpace 5.1 → 5.5 port:</li> -</ul> - -<pre><code>$ git checkout -b 55new 5_x-prod -$ git reset --hard ilri/5_x-prod -$ git rebase -i dspace-5.5 -</code></pre> - -<p></p> - - - - July, 2016 - 
https://alanorth.github.io/cgspace-notes/2016-07/ - Fri, 01 Jul 2016 10:53:00 +0300 - - https://alanorth.github.io/cgspace-notes/2016-07/ - <h2 id="2016-07-01">2016-07-01</h2> - -<ul> -<li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li> -<li>I think this query should find and replace all authors that have &ldquo;,&rdquo; at the end of their names:</li> -</ul> - -<pre><code>dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$'; -UPDATE 95 -dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$'; - text_value ------------- -(0 rows) -</code></pre> - -<ul> -<li>In this case the select query was showing 95 results before the update</li> -</ul> - -<p></p> - - - - June, 2016 - https://alanorth.github.io/cgspace-notes/2016-06/ - Wed, 01 Jun 2016 10:53:00 +0300 - - https://alanorth.github.io/cgspace-notes/2016-06/ - <h2 id="2016-06-01">2016-06-01</h2> - -<ul> -<li>Experimenting with IFPRI OAI (we want to harvest their publications)</li> -<li>After reading the <a href="https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html">ContentDM documentation</a> I found IFPRI&rsquo;s OAI endpoint: <a href="http://ebrary.ifpri.org/oai/oai.php">http://ebrary.ifpri.org/oai/oai.php</a></li> -<li>After reading the <a href="https://www.openarchives.org/OAI/openarchivesprotocol.html">OAI documentation</a> and testing with an <a href="http://validator.oaipmh.com/">OAI validator</a> I found out how to get their publications</li> -<li>This is their publications set: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc">http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc</a></li> -<li>You can see the others by using the OAI <code>ListSets</code> verb: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListSets">http://ebrary.ifpri.org/oai/oai.php?verb=ListSets</a></li> -<li>Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in <code>dc.identifier.fund</code> to <code>cg.identifier.cpwfproject</code> and then the rest to <code>dc.description.sponsorship</code></li> -</ul> - -<p></p> - - - - May, 2016 - https://alanorth.github.io/cgspace-notes/2016-05/ - Sun, 01 May 2016 23:06:00 +0300 - - https://alanorth.github.io/cgspace-notes/2016-05/ - <h2 id="2016-05-01">2016-05-01</h2> - -<ul> -<li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li> -<li>I have blocked access to the API now</li> -<li>There are 3,000 IPs accessing the REST API in a 24-hour period!</li> -</ul> - -<pre><code># awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l -3168 -</code></pre> - -<p></p> - - - - April, 2016 - https://alanorth.github.io/cgspace-notes/2016-04/ - Mon, 04 Apr 2016 11:06:00 +0300 - - https://alanorth.github.io/cgspace-notes/2016-04/ - <h2 id="2016-04-04">2016-04-04</h2> - -<ul> -<li>Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit</li> -<li>We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc</li> -<li>After running DSpace for over five years I&rsquo;ve 
never needed to look in any other log file than dspace.log, leave alone one from last year!</li> -<li>This will save us a few gigs of backup space we&rsquo;re paying for on S3</li> -<li>Also, I noticed the <code>checker</code> log has some errors we should pay attention to:</li> -</ul> - -<p></p> - - - - March, 2016 - https://alanorth.github.io/cgspace-notes/2016-03/ - Wed, 02 Mar 2016 16:50:00 +0300 - - https://alanorth.github.io/cgspace-notes/2016-03/ - <h2 id="2016-03-02">2016-03-02</h2> - -<ul> -<li>Looking at issues with author authorities on CGSpace</li> -<li>For some reason we still have the <code>index-lucene-update</code> cron job active on CGSpace, but I&rsquo;m pretty sure we don&rsquo;t need it as of the latest few versions of Atmire&rsquo;s Listings and Reports module</li> -<li>Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server</li> -</ul> - -<p></p> - - - - February, 2016 - https://alanorth.github.io/cgspace-notes/2016-02/ - Fri, 05 Feb 2016 13:18:00 +0300 - - https://alanorth.github.io/cgspace-notes/2016-02/ - <h2 id="2016-02-05">2016-02-05</h2> - -<ul> -<li>Looking at some DAGRIS data for Abenet Yabowork</li> -<li>Lots of issues with spaces, newlines, etc causing the import to fail</li> -<li>I noticed we have a very <em>interesting</em> list of countries on CGSpace:</li> -</ul> - -<p><img src="https://alanorth.github.io/cgspace-notes/cgspace-notes/2016/02/cgspace-countries.png" alt="CGSpace country list" /></p> - -<ul> -<li>Not only are there 49,000 countries, we have some blanks (25)&hellip;</li> -<li>Also, lots of things like &ldquo;COTE D`LVOIRE&rdquo; and &ldquo;COTE D IVOIRE&rdquo;</li> -</ul> - -<p></p> - - - - January, 2016 - https://alanorth.github.io/cgspace-notes/2016-01/ - Wed, 13 Jan 2016 13:18:00 +0300 - - https://alanorth.github.io/cgspace-notes/2016-01/ - <h2 id="2016-01-13">2016-01-13</h2> - -<ul> -<li>Move ILRI collection <code>10568/12503</code> from <code>10568/27869</code> to <code>10568/27629</code> using the <a href="https://gist.github.com/alanorth/392c4660e8b022d99dfa">move_collections.sh</a> script I wrote last year.</li> -<li>I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.</li> -<li>Update GitHub wiki for documentation of <a href="https://github.com/ilri/DSpace/wiki/Maintenance-Tasks">maintenance tasks</a>.</li> -</ul> - -<p></p> - - - - December, 2015 - https://alanorth.github.io/cgspace-notes/2015-12/ - Wed, 02 Dec 2015 13:18:00 +0300 - - https://alanorth.github.io/cgspace-notes/2015-12/ - <h2 id="2015-12-02">2015-12-02</h2> - -<ul> -<li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li> -</ul> - -<pre><code># cd /home/dspacetest.cgiar.org/log -# ls -lh dspace.log.2015-11-18* --rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18 --rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo --rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz -</code></pre> - -<p></p> - - - - November, 2015 - https://alanorth.github.io/cgspace-notes/2015-11/ - Mon, 23 Nov 2015 17:00:57 +0300 - - https://alanorth.github.io/cgspace-notes/2015-11/ - <h2 id="2015-11-22">2015-11-22</h2> - -<ul> -<li>CGSpace went down</li> -<li>Looks like DSpace exhausted its PostgreSQL connection pool</li> -<li>Last week I had increased the limit from 
30 to 60, which seemed to help, but now there are many more idle connections:</li> -</ul> - -<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace -78 -</code></pre> - -<p></p> - - - - \ No newline at end of file diff --git a/public/tags/notes/page/1/index.html b/public/tags/notes/page/1/index.html deleted file mode 100644 index 0d7fc0af7..000000000 --- a/public/tags/notes/page/1/index.html +++ /dev/null @@ -1 +0,0 @@ -https://alanorth.github.io/cgspace-notes/tags/notes/ \ No newline at end of file diff --git a/public/tags/notes/page/2/index.html b/public/tags/notes/page/2/index.html deleted file mode 100644 index dd89a2a36..000000000 --- a/public/tags/notes/page/2/index.html +++ /dev/null @@ -1,521 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - CGSpace Notes - - - - - - - - - - - - - - - - - - - - -
-
- -
-
- - - -
-
-

CGSpace Notes

-

Documenting day-to-day work on the CGSpace repository.

-
-
- - - -
-
-
- - - - - - - - - -
-
-

April, 2017

- -
-

2017-04-02

- -
    -
  • Merge one change to CCAFS flagships that I had forgotten to remove last month (“MANAGING CLIMATE RISK”): https://github.com/ilri/DSpace/pull/317
  • -
  • Quick proof-of-concept hack to add dc.rights to the input form, including some inline instructions/hints:
  • -
- -

dc.rights in the submission form

- -
    -
  • Remove redundant/duplicate text in the DSpace submission license
  • -
  • Testing the CMYK patch on a collection with 650 items:
  • -
- -
$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Thumbnail" -v >& /tmp/filter-media-cmyk.txt
-
- -

- Read more → -
- - - - - - -
-
-

March, 2017

- -
-

2017-03-01

- -
    -
  • Run the 279 CIAT author corrections on CGSpace
  • -
- -

2017-03-02

- -
    -
  • Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace
  • -
  • CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles
  • -
  • They might come in at the top level in one “CGIAR System” community, or with several communities
  • -
  • I need to spend a bit of time looking at the multiple handle support in DSpace and see if new content can be minted in both handles, or just one?
  • -
  • Need to send Peter and Michael some notes about this in a few days
  • -
  • Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI
  • -
  • Filed an issue on DSpace issue tracker for the filter-media bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: DS-3516
  • -
  • Discovered that the ImageMagick filter-media plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK
  • -
  • Interestingly, it seems DSpace 4.x’s thumbnails were sRGB, but forcing regeneration using DSpace 5.x’s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see 10568/51999):
  • -
- -
$ identify ~/Desktop/alc_contrastes_desafios.jpg
-/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
-
- -
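A quick local check that the colorspace really is the culprit is to convert one of those thumbnails to sRGB and inspect it again (just a test with plain ImageMagick, not the eventual fix in DSpace’s filter-media settings):

$ convert ~/Desktop/alc_contrastes_desafios.jpg -colorspace sRGB /tmp/alc_srgb.jpg
$ identify /tmp/alc_srgb.jpg   # should now report sRGB instead of CMYK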

- Read more → -
- - - - - - -
-
-

February, 2017

- -
-

2017-02-07

- -
    -
  • An item was mapped twice erroneously again, so I had to remove one of the mappings manually:
  • -
- -
dspace=# select * from collection2item where item_id = '80278';
-  id   | collection_id | item_id
--------+---------------+---------
- 92551 |           313 |   80278
- 92550 |           313 |   80278
- 90774 |          1051 |   80278
-(3 rows)
-dspace=# delete from collection2item where id = 92551 and item_id = 80278;
-DELETE 1
-
- -
    -
  • Create issue on GitHub to track the addition of CCAFS Phase II project tags (#301)
  • -
  • Looks like we’ll be using cg.identifier.ccafsprojectpii as the field name
  • -
- -

- Read more → -
- - - - - - -
-
-

January, 2017

- -
-

2017-01-02

- -
    -
  • I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error
  • -
  • I tested on DSpace Test as well and it doesn’t work there either
  • -
  • I asked on the dspace-tech mailing list because it seems to be broken, and actually now I’m not sure if we’ve ever had the sharding task run successfully over all these years
  • -
- -

- Read more → -
- - - - - - -
-
-

December, 2016

2016-12-02

  • CGSpace was down for five hours in the morning while I was sleeping
  • While looking in the logs for errors, I see tons of warnings about Atmire MQM:

2016-12-02 03:00:32,352 WARN  com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN  com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail="dc.title", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN  com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, ObjectType=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail="THUMBNAIL", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN  com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, ObjectType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail="-1", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN  com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
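
  • A quick way to gauge how common these warnings are is to count them in a day’s log; a sketch (the log path and date here are illustrative):

# grep -c 'BatchEditConsumer should not have been given' /home/cgspace.cgiar.org/log/dspace.log.2016-12-02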

  • I see thousands of them in the logs for the last few months, so it’s not related to the DSpace 5.5 upgrade
  • I’ve raised a ticket with Atmire to ask
  • Another worrying error from dspace.log is:


November, 2016

2016-11-01

  • Add dc.type to the output options for Atmire’s Listings and Reports module (#286)

[Image: Listings and Reports with output type]

October, 2016

2016-10-03

  • Testing adding ORCIDs to a CSV file for a single item to see if the author orders get messed up
  • Need to test the following scenarios to see how author order is affected:
    • ORCIDs only
    • ORCIDs plus normal authors
  • I exported a random item’s metadata as CSV, deleted all columns except id and collection, and made a new column called ORCID:dc.contributor.author with the following random ORCIDs from the ORCID registry:

0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
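
  • For reference, a sketch of what the edited CSV and an import command could look like (the id, collection, and eperson values here are placeholders; the real ones come from the exported item):

$ cat /tmp/orcid-test.csv
id,collection,ORCID:dc.contributor.author
74002,10568/1609,"0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X"
$ [dspace]/bin/dspace metadata-import -f /tmp/orcid-test.csv -e admin@example.com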


September, 2016

2016-09-01

  • Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors
  • Discuss how the migration of CGIAR’s Active Directory to a flat structure will break our LDAP groups in DSpace
  • We had been using DC=ILRI to determine whether a user was ILRI or not
  • It looks like we might be able to use OUs now, instead of DCs:

$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=org" -D "admigration1@cgiarad.org" -W "(sAMAccountName=admigration1)"
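
  • To see which OUs an account actually sits under, the same search can request just the distinguishedName attribute; a sketch (the account being looked up is illustrative):

$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=org" -D "admigration1@cgiarad.org" -W "(sAMAccountName=admigration1)" distinguishedName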


August, 2016

2016-08-01

  • Add updated distribution license from Sisay (#259)
  • Play with upgrading Mirage 2 dependencies in bower.json because most are several versions out of date
  • Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more
  • The bower upgrades are a dead end: a waste of time with too many issues
  • Anything after Bootstrap 3.3.1 makes the glyphicons disappear (HTTP 404 because the fonts are requested from an incorrect path)
  • Start working on the DSpace 5.1 → 5.5 port:

$ git checkout -b 55new 5_x-prod
$ git reset --hard ilri/5_x-prod
$ git rebase -i dspace-5.5


July, 2016

2016-07-01

  • Add dc.description.sponsorship to Discovery sidebar facets and make investors clickable in item view (#232)
  • I think this query should find and replace all authors that have “,” at the end of their names:

dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
UPDATE 95
dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
 text_value
------------
(0 rows)

  • In this case the select query was showing 95 results before the update

diff --git a/public/tags/notes/page/3/index.html b/public/tags/notes/page/3/index.html
deleted file mode 100644
index b0218031d..000000000
--- a/public/tags/notes/page/3/index.html
+++ /dev/null
@@ -1,412 +0,0 @@
June, 2016

2016-06-01


May, 2016

2016-05-01

  • Since yesterday there have been 10,000 REST errors and the site has been unstable again
  • I have blocked access to the API now
  • There are 3,000 IPs accessing the REST API in a 24-hour period!

# awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l
3168
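
  • Note that uniq only collapses adjacent duplicate lines, so without a sort that number over-counts the unique IPs; a stricter count would be:

# awk '{print $1}' /var/log/nginx/rest.log | sort | uniq | wc -l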


April, 2016

2016-04-04

  • Looking at log file use on CGSpace, I noticed that we need to work on our cron setup a bit
  • We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc
  • After running DSpace for over five years I’ve never needed to look in any log file other than dspace.log, let alone one from last year!
  • This will save us a few gigs of the backup space we’re paying for on S3
  • Also, I noticed the checker log has some errors we should pay attention to:


March, 2016

2016-03-02

  • Looking at issues with author authorities on CGSpace
  • For some reason we still have the index-lucene-update cron job active on CGSpace, but I’m pretty sure we don’t need it as of the latest few versions of Atmire’s Listings and Reports module
  • Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match the environment on the CGSpace server


February, 2016

2016-02-05

  • Looking at some DAGRIS data for Abenet Yabowork
  • Lots of issues with spaces, newlines, etc causing the import to fail
  • I noticed we have a very interesting list of countries on CGSpace:

[Image: CGSpace country list]

  • Not only are there 49,000 countries, we have some blanks (25)…
  • Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE”
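
  • A sketch of how the messy country values could be listed for cleanup, assuming the field is dc.coverage.country (the 228 in the second query is just a placeholder for whatever field id the first query returns):

dspace=# select metadata_field_id from metadatafieldregistry where element='coverage' and qualifier='country';
dspace=# select text_value, count(*) from metadatavalue where metadata_field_id=228 and resource_type_id=2 group by text_value order by count(*) desc;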


January, 2016

2016-01-13

  • Move ILRI collection 10568/12503 from 10568/27869 to 10568/27629 using the move_collections.sh script I wrote last year.
  • I realized it is only necessary to clear the Cocoon cache after moving collections (rather than reindexing), as no metadata has changed, and therefore no search or browse indexes need to be updated.
  • Update the GitHub wiki with documentation of maintenance tasks.


December, 2015

2015-12-02

  • Replace lzop with xz in the log compression cron jobs on DSpace Test, as it uses less space:

# cd /home/dspacetest.cgiar.org/log
# ls -lh dspace.log.2015-11-18*
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz


November, 2015

2015-11-22

  • CGSpace went down
  • Looks like DSpace exhausted its PostgreSQL connection pool
  • Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:

$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
78

diff --git a/public/tags/page/1/index.html b/public/tags/page/1/index.html
deleted file mode 100644
index 8583d664e..000000000
--- a/public/tags/page/1/index.html
+++ /dev/null
@@ -1 +0,0 @@
-https://alanorth.github.io/cgspace-notes/tags/
\ No newline at end of file