From 96b6e63a46fd9b9f096622bbf68ae80a39b4d87a Mon Sep 17 00:00:00 2001
From: Alan Orth
Date: Tue, 12 Sep 2017 16:57:19 +0300
Subject: [PATCH] Add notes for 2017-09-12

---
 content/post/2017-09.md             | 36 +++++++++++++++++++++
 public/2015-11/index.html           |  2 +-
 public/2015-12/index.html           |  2 +-
 public/2016-01/index.html           |  2 +-
 public/2016-02/index.html           |  2 +-
 public/2016-03/index.html           |  2 +-
 public/2016-04/index.html           |  2 +-
 public/2016-05/index.html           |  2 +-
 public/2016-06/index.html           |  2 +-
 public/2016-07/index.html           |  2 +-
 public/2016-08/index.html           |  2 +-
 public/2016-09/index.html           |  2 +-
 public/2016-10/index.html           |  2 +-
 public/2016-11/index.html           |  2 +-
 public/2016-12/index.html           |  2 +-
 public/2017-01/index.html           |  2 +-
 public/2017-02/index.html           |  2 +-
 public/2017-03/index.html           |  2 +-
 public/2017-04/index.html           |  2 +-
 public/2017-05/index.html           |  2 +-
 public/2017-06/index.html           |  2 +-
 public/2017-07/index.html           |  2 +-
 public/2017-08/index.html           |  6 ++--
 public/2017-09/index.html           | 49 +++++++++++++++++++++++++++--
 public/index.html                   |  2 +-
 public/page/2/index.html            |  2 +-
 public/page/3/index.html            |  2 +-
 public/post/index.html              |  2 +-
 public/post/page/2/index.html       |  2 +-
 public/post/page/3/index.html       |  2 +-
 public/sitemap.xml                  |  2 +-
 public/tags/notes/index.html        |  2 +-
 public/tags/notes/page/2/index.html |  2 +-
 public/tags/notes/page/3/index.html |  2 +-
 34 files changed, 117 insertions(+), 36 deletions(-)

diff --git a/content/post/2017-09.md b/content/post/2017-09.md
index dc33bd38a..1c8ac0fb6 100644
--- a/content/post/2017-09.md
+++ b/content/post/2017-09.md
@@ -55,3 +55,39 @@ dspace.log.2017-09-10:0
 - I remember seeing that Munin shows that the average number of connections is 50 (which is probably mostly from the XMLUI) and we're currently allowing 40 connections per app, so maybe it would be good to bump that value up to 50 or 60 along with the system's PostgreSQL `max_connections` (formula should be: webapps * 60 + 3, or 3 * 60 + 3 = 183 in our case)
 - I updated both CGSpace and DSpace Test to use these new settings (60 connections per web app and 183 for system PostgreSQL limit)
 - I'm expecting to see 0 connection errors for the next few months
+
+## 2017-09-11
+
+- Lots of work testing the CGIAR Library migration
+- Many technical notes and TODOs here: https://gist.github.com/alanorth/3579b74e116ab13418d187ed379abd9c
+
+## 2017-09-12
+
+- I was testing the [METS XSD caching during AIP ingest](https://wiki.duraspace.org/display/DSDOC5x/AIP+Backup+and+Restore#AIPBackupandRestore-AIPConfigurationsToImproveIngestionSpeedwhileValidating) but it doesn't seem to help actually
+- The import process takes the same amount of time with and without the caching
+- Also, I captured TCP packets destined for port 80 and both imports only captured ONE packet (an update check from some component in Java):
+
+```
+$ sudo tcpdump -i en0 -w without-cached-xsd.dump dst port 80 and 'tcp[32:4] = 0x47455420'
+```
+
+- Great TCP dump guide here: https://danielmiessler.com/study/tcpdump
+- The last part of that command filters for HTTP GET requests, of which there should have been many to fetch all the XSD files for validation
+- I sent a message to the mailing list to see if anyone knows more about this
+- In looking at the tcpdump results I notice that there is an update check to the ehcache server on _every_ iteration of the ingest loop, for example:
+
+```
+09:39:36.008956 IP 192.168.8.124.50515 > 157.189.192.67.http: Flags [P.], seq 1736833672:1736834103, ack 147469926, win 4120, options [nop,nop,TS val 1175113331 ecr 550028064], length 431: HTTP: GET /kit/reflector?kitID=ehcache.default&pageID=update.properties&id=2130706433&os-name=Mac+OS+X&jvm-name=Java+HotSpot%28TM%29+64-Bit+Server+VM&jvm-version=1.8.0_144&platform=x86_64&tc-version=UNKNOWN&tc-product=Ehcache+Core+1.7.2&source=Ehcache+Core&uptime-secs=0&patch=UNKNOWN HTTP/1.1
+```
+
+- Turns out this is a known issue and Ehcache has refused to make it opt-in: https://jira.terracotta.org/jira/browse/EHC-461
+- But we can disable it by adding an `updateCheck="false"` attribute to the main `<ehcache>` tag in `dspace-services/src/main/resources/caching/ehcache-config.xml`
+- After re-compiling and re-deploying DSpace I no longer see those update checks during item submission
+- I had a Skype call with Bram Luyten from Atmire to discuss various issues related to ORCID in DSpace
+    - First, ORCID is deprecating their version 1 API (which DSpace uses) and in version 2 API they have removed the ability to search for users by name
+    - The logic is that searching by name actually isn't very useful because ORCID is essentially a global phonebook and there are tons of legitimately duplicate and ambiguous names
+    - Atmire's proposed integration would work by having users look up and add authors to the authority core directly using their ORCID ID itself (this would happen during the item submission process or perhaps as a standalone / batch process, for example to populate the authority core with a list of known ORCIDs)
+    - Once the association between name and ORCID is made in the authority then it can be autocompleted in the lookup field
+    - Ideally there could also be a user interface for cleanup and merging of authorities
+    - He will prepare a quote for us, keeping in mind that this could be useful to contribute back to the community for a 5.x release
+    - As far as exposing ORCIDs as flat metadata alongside all other metadata, he says this should be possible and will work on a quote for us
diff --git a/public/2015-11/index.html b/public/2015-11/index.html
index abef934fe..4b585b71e 100644
--- a/public/2015-11/index.html
+++ b/public/2015-11/index.html
@@ -51,7 +51,7 @@ $ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspac
 "/>
-
+
diff --git a/public/2015-12/index.html b/public/2015-12/index.html
index 58535bc3b..0030f65df 100644
--- a/public/2015-12/index.html
+++ b/public/2015-12/index.html
@@ -53,7 +53,7 @@ Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less
 "/>
-
+
diff --git a/public/2016-01/index.html b/public/2016-01/index.html
index 55364f584..4161871ac 100644
--- a/public/2016-01/index.html
+++ b/public/2016-01/index.html
@@ -43,7 +43,7 @@ Update GitHub wiki for documentation of maintenance tasks.
 "/>
-
+
diff --git a/public/2016-02/index.html b/public/2016-02/index.html
index 56062a158..d70f63308 100644
--- a/public/2016-02/index.html
+++ b/public/2016-02/index.html
@@ -57,7 +57,7 @@ Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE&r
 "/>
-
+
diff --git a/public/2016-03/index.html b/public/2016-03/index.html
index 4fa01090d..7846fa6f8 100644
--- a/public/2016-03/index.html
+++ b/public/2016-03/index.html
@@ -43,7 +43,7 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
 "/>
-
+
diff --git a/public/2016-04/index.html b/public/2016-04/index.html
index 138469d06..a7ba40f3f 100644
--- a/public/2016-04/index.html
+++ b/public/2016-04/index.html
@@ -47,7 +47,7 @@ Also, I noticed the checker log has some errors we should pay attention to:
 "/>
-
+
diff --git a/public/2016-05/index.html b/public/2016-05/index.html
index ad82dcad4..6e6a0bc69 100644
--- a/public/2016-05/index.html
+++ b/public/2016-05/index.html
@@ -51,7 +51,7 @@ There are 3,000 IPs accessing the REST API in a 24-hour period!
 "/>
-
+
diff --git a/public/2016-06/index.html b/public/2016-06/index.html
index 5b8413d0f..43d50f89f 100644
--- a/public/2016-06/index.html
+++ b/public/2016-06/index.html
@@ -49,7 +49,7 @@ Working on second phase of metadata migration, looks like this will work for mov
 "/>
-
+
diff --git a/public/2016-07/index.html b/public/2016-07/index.html
index ed02e2483..e72a75e61 100644
--- a/public/2016-07/index.html
+++ b/public/2016-07/index.html
@@ -65,7 +65,7 @@ In this case the select query was showing 95 results before the update
 "/>
-
+
diff --git a/public/2016-08/index.html b/public/2016-08/index.html
index 758c4a173..5c045bf1d 100644
--- a/public/2016-08/index.html
+++ b/public/2016-08/index.html
@@ -59,7 +59,7 @@ $ git rebase -i dspace-5.5
 "/>
-
+
diff --git a/public/2016-09/index.html b/public/2016-09/index.html
index 5e8b24d7c..4dcb240db 100644
--- a/public/2016-09/index.html
+++ b/public/2016-09/index.html
@@ -51,7 +51,7 @@ $ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=or
 "/>
-
+
diff --git a/public/2016-10/index.html b/public/2016-10/index.html
index 7f31f633f..e3a10b591 100644
--- a/public/2016-10/index.html
+++ b/public/2016-10/index.html
@@ -59,7 +59,7 @@ I exported a random item’s metadata as CSV, deleted all columns except id
 "/>
-
+
diff --git a/public/2016-11/index.html b/public/2016-11/index.html
index 060c805f5..ce75c7558 100644
--- a/public/2016-11/index.html
+++ b/public/2016-11/index.html
@@ -43,7 +43,7 @@ Add dc.type to the output options for Atmire’s Listings and Reports module
 "/>
-
+
diff --git a/public/2016-12/index.html b/public/2016-12/index.html
index 57b36d3bf..6a503f74c 100644
--- a/public/2016-12/index.html
+++ b/public/2016-12/index.html
@@ -67,7 +67,7 @@ Another worrying error from dspace.log is:
 "/>
-
+
diff --git a/public/2017-01/index.html b/public/2017-01/index.html
index 0996bcefa..af9b7b8b4 100644
--- a/public/2017-01/index.html
+++ b/public/2017-01/index.html
@@ -43,7 +43,7 @@ I asked on the dspace-tech mailing list because it seems to be broken, and actua
 "/>
-
+
diff --git a/public/2017-02/index.html b/public/2017-02/index.html
index 885fde4db..143f7d215 100644
--- a/public/2017-02/index.html
+++ b/public/2017-02/index.html
@@ -71,7 +71,7 @@ Looks like we’ll be using cg.identifier.ccafsprojectpii as the field name
 "/>
-
+
diff --git a/public/2017-03/index.html b/public/2017-03/index.html
index f12795090..81af56a4d 100644
--- a/public/2017-03/index.html
+++ b/public/2017-03/index.html
@@ -75,7 +75,7 @@ $ identify ~/Desktop/alc_contrastes_desafios.jpg
 "/>
-
+
diff --git a/public/2017-04/index.html b/public/2017-04/index.html
index c7a95587e..694910eb6 100644
--- a/public/2017-04/index.html
+++ b/public/2017-04/index.html
@@ -61,7 +61,7 @@ $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Th
 "/>
-
+
diff --git a/public/2017-05/index.html b/public/2017-05/index.html
index 747042b01..c2fab7e9d 100644
--- a/public/2017-05/index.html
+++ b/public/2017-05/index.html
@@ -27,7 +27,7 @@
-
+
diff --git a/public/2017-06/index.html b/public/2017-06/index.html
index ac9587f7c..000244ef4 100644
--- a/public/2017-06/index.html
+++ b/public/2017-06/index.html
@@ -27,7 +27,7 @@
-
+
diff --git a/public/2017-07/index.html b/public/2017-07/index.html
index 8bf518514..e1cef0dbb 100644
--- a/public/2017-07/index.html
+++ b/public/2017-07/index.html
@@ -55,7 +55,7 @@ We can use PostgreSQL’s extended output format (-x) plus sed to format the
 "/>
-
+
diff --git a/public/2017-08/index.html b/public/2017-08/index.html
index ae302533e..9581e6c57 100644
--- a/public/2017-08/index.html
+++ b/public/2017-08/index.html
@@ -37,7 +37,7 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
-
+
@@ -75,7 +75,7 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
 "/>
-
+
@@ -87,7 +87,7 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
   "url": "https://alanorth.github.io/cgspace-notes/2017-08/",
   "wordCount": "3542",
   "datePublished": "2017-08-01T11:51:52+03:00",
-  "dateModified": "2017-09-10T18:17:25+03:00",
+  "dateModified": "2017-09-10T19:18:52+03:00",
   "author": {
     "@type": "Person",
     "name": "Alan Orth"
diff --git a/public/2017-09/index.html b/public/2017-09/index.html
index ade1b0c36..98ac7307d 100644
--- a/public/2017-09/index.html
+++ b/public/2017-09/index.html
@@ -39,7 +39,7 @@ Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two
 "/>
-
+
@@ -49,7 +49,7 @@ Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two
   "@type": "BlogPosting",
   "headline": "September, 2017",
   "url": "https://alanorth.github.io/cgspace-notes/2017-09/",
-  "wordCount": "455",
+  "wordCount": "903",
   "datePublished": "2017-09-07T16:54:52+07:00",
   "dateModified": "2017-09-10T18:21:38+03:00",
   "author": {
@@ -173,6 +173,51 @@ dspace.log.2017-09-10:0
  • I’m expecting to see 0 connection errors for the next few months
  • +

    2017-09-11

    + + + +

    2017-09-12

    + + + +
    $ sudo tcpdump -i en0 -w without-cached-xsd.dump dst port 80 and 'tcp[32:4] = 0x47455420'
    +
    + + + +
    09:39:36.008956 IP 192.168.8.124.50515 > 157.189.192.67.http: Flags [P.], seq 1736833672:1736834103, ack 147469926, win 4120, options [nop,nop,TS val 1175113331 ecr 550028064], length 431: HTTP: GET /kit/reflector?kitID=ehcache.default&pageID=update.properties&id=2130706433&os-name=Mac+OS+X&jvm-name=Java+HotSpot%28TM%29+64-Bit+Server+VM&jvm-version=1.8.0_144&platform=x86_64&tc-version=UNKNOWN&tc-product=Ehcache+Core+1.7.2&source=Ehcache+Core&uptime-secs=0&patch=UNKNOWN HTTP/1.1
    +
+
+
+
diff --git a/public/index.html b/public/index.html
index 22450be35..dc22d0c11 100644
--- a/public/index.html
+++ b/public/index.html
@@ -25,7 +25,7 @@
-
+
diff --git a/public/page/2/index.html b/public/page/2/index.html
index aa3254bb0..4264398ab 100644
--- a/public/page/2/index.html
+++ b/public/page/2/index.html
@@ -25,7 +25,7 @@
-
+
diff --git a/public/page/3/index.html b/public/page/3/index.html
index 515313c47..bcd994235 100644
--- a/public/page/3/index.html
+++ b/public/page/3/index.html
@@ -25,7 +25,7 @@
-
+
diff --git a/public/post/index.html b/public/post/index.html
index c63fcb8e5..3b5d28aa1 100644
--- a/public/post/index.html
+++ b/public/post/index.html
@@ -25,7 +25,7 @@
-
+
diff --git a/public/post/page/2/index.html b/public/post/page/2/index.html
index e47a3399a..a7373d257 100644
--- a/public/post/page/2/index.html
+++ b/public/post/page/2/index.html
@@ -25,7 +25,7 @@
-
+
diff --git a/public/post/page/3/index.html b/public/post/page/3/index.html
index 491d50261..2f79e52ca 100644
--- a/public/post/page/3/index.html
+++ b/public/post/page/3/index.html
@@ -25,7 +25,7 @@
-
+
diff --git a/public/sitemap.xml b/public/sitemap.xml
index 21f71223d..afd6ab907 100644
--- a/public/sitemap.xml
+++ b/public/sitemap.xml
@@ -9,7 +9,7 @@
 https://alanorth.github.io/cgspace-notes/2017-08/
-2017-09-10T18:17:25+03:00
+2017-09-10T19:18:52+03:00
diff --git a/public/tags/notes/index.html b/public/tags/notes/index.html
index da7f3a0ca..51dc5052f 100644
--- a/public/tags/notes/index.html
+++ b/public/tags/notes/index.html
@@ -25,7 +25,7 @@
-
+
diff --git a/public/tags/notes/page/2/index.html b/public/tags/notes/page/2/index.html
index 0719a984a..c604cead9 100644
--- a/public/tags/notes/page/2/index.html
+++ b/public/tags/notes/page/2/index.html
@@ -25,7 +25,7 @@
-
+
diff --git a/public/tags/notes/page/3/index.html b/public/tags/notes/page/3/index.html
index d2dd43769..4d7a55143 100644
--- a/public/tags/notes/page/3/index.html
+++ b/public/tags/notes/page/3/index.html
@@ -25,7 +25,7 @@
-
+
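
Reviewer note, not part of the patch: the notes in this commit rely on two small pieces of arithmetic, the `tcp[32:4] = 0x47455420` capture filter and the `webapps * 60 + 3` rule for PostgreSQL `max_connections`. The sketch below only re-derives those two values. The function names are hypothetical, and the 32-byte TCP header is an assumption read off the filter offset (20 bytes of base header plus the 12 bytes of `nop,nop,TS` options visible in the captured packet).

```python
# Hypothetical helpers; only the constants come from the notes in this patch.


def http_method_filter(method: str) -> str:
    """Build a tcpdump expression matching the first 4 bytes of TCP payload.

    Assumes the payload starts at byte 32 of the TCP segment, i.e. a 20-byte
    header plus 12 bytes of options, as the tcp[32:4] offset in the notes implies.
    """
    token = method.ljust(4)[:4].encode("ascii")  # "GET" becomes b"GET "
    return f"tcp[32:4] = 0x{token.hex()}"


def postgres_max_connections(webapps: int, per_app: int = 60, reserved: int = 3) -> int:
    """Apply the formula quoted in the notes: webapps * per_app + reserved."""
    return webapps * per_app + reserved


if __name__ == "__main__":
    print(http_method_filter("GET"))
    print(postgres_max_connections(3))
```

With the values from the notes (the `GET` method and three web apps) this reproduces the filter expression used in the capture command and the 183-connection limit. Segments carrying a different set of TCP options would put the payload at another offset, so the filter would miss them.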