Add notes for 2017-09-12

Alan Orth 2017-09-12 16:57:19 +03:00
parent bd115f81a5
commit 96b6e63a46
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
34 changed files with 117 additions and 36 deletions

@ -55,3 +55,39 @@ dspace.log.2017-09-10:0
- I remember seeing that Munin shows that the average number of connections is 50 (which is probably mostly from the XMLUI) and we're currently allowing 40 connections per app, so maybe it would be good to bump that value up to 50 or 60 along with the system's PostgreSQL `max_connections` (formula should be: webapps * 60 + 3, or 3 * 60 + 3 = 183 in our case)
- I updated both CGSpace and DSpace Test to use these new settings (60 connections per web app and 183 for system PostgreSQL limit)
- I'm expecting to see 0 connection errors for the next few months
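- A minimal sketch of that sizing arithmetic, for illustration only: the function and parameter names are hypothetical, and reading the `+ 3` as PostgreSQL's default `superuser_reserved_connections` is an assumption rather than something stated in these notes:

```python
def postgres_max_connections(webapps: int, per_app: int = 60, reserved: int = 3) -> int:
    """Size the PostgreSQL connection limit from the per-webapp pool size.

    `reserved` is the extra headroom from the formula (assumed here to be
    the slots PostgreSQL keeps back for superuser connections).
    """
    return webapps * per_app + reserved


# Three web apps with 60 connections each, plus the 3 extra slots
print(postgres_max_connections(3))  # 183
```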
## 2017-09-11
- Lots of work testing the CGIAR Library migration
- Many technical notes and TODOs here: https://gist.github.com/alanorth/3579b74e116ab13418d187ed379abd9c
## 2017-09-12
- I was testing the [METS XSD caching during AIP ingest](https://wiki.duraspace.org/display/DSDOC5x/AIP+Backup+and+Restore#AIPBackupandRestore-AIPConfigurationsToImproveIngestionSpeedwhileValidating) but it doesn't actually seem to help
- The import process takes the same amount of time with and without the caching
- Also, I captured TCP packets destined for port 80 during both imports and each capture contained only ONE packet (an update check from some component in Java):
```
$ sudo tcpdump -i en0 -w without-cached-xsd.dump dst port 80 and 'tcp[32:4] = 0x47455420'
```
- Great tcpdump guide here: https://danielmiessler.com/study/tcpdump
- The last part of that command filters for HTTP GET requests, of which there should have been many to fetch all the XSD files for validation
- I sent a message to the mailing list to see if anyone knows more about this
- Looking at the tcpdump results, I notice that there is an update check to the ehcache server on _every_ iteration of the ingest loop, for example:
```
09:39:36.008956 IP 192.168.8.124.50515 > 157.189.192.67.http: Flags [P.], seq 1736833672:1736834103, ack 147469926, win 4120, options [nop,nop,TS val 1175113331 ecr 550028064], length 431: HTTP: GET /kit/reflector?kitID=ehcache.default&pageID=update.properties&id=2130706433&os-name=Mac+OS+X&jvm-name=Java+HotSpot%28TM%29+64-Bit+Server+VM&jvm-version=1.8.0_144&platform=x86_64&tc-version=UNKNOWN&tc-product=Ehcache+Core+1.7.2&source=Ehcache+Core&uptime-secs=0&patch=UNKNOWN HTTP/1.1
```
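- A small sketch of what the `tcp[32:4] = 0x47455420` filter is matching, assuming a TCP header of 20 bytes plus 12 bytes of options (the `nop,nop,TS` seen in the output above); the helper names and the sample segment are hypothetical:

```python
# 0x47455420 is the four ASCII bytes "GET " packed into one big-endian integer.
GET_MARKER = 0x47455420
assert GET_MARKER.to_bytes(4, "big") == b"GET "


def tcp_payload_offset(segment: bytes) -> int:
    """Return the TCP header length in bytes.

    The data offset is the upper four bits of byte 12, counted in 32-bit words.
    """
    return (segment[12] & 0xF0) >> 2


def is_http_get(segment: bytes) -> bool:
    """Check whether the TCP payload starts with "GET "."""
    offset = tcp_payload_offset(segment)
    return segment[offset:offset + 4] == b"GET "


# Hypothetical segment: 20-byte base header + 12 bytes of options
# (nop, nop, timestamp), so the data offset is 8 words = 32 bytes.
header = bytearray(32)
header[12] = 8 << 4
segment = bytes(header) + b"GET /index.html HTTP/1.1\r\n"

print(tcp_payload_offset(segment))  # 32
print(is_http_get(segment))         # True
```

- The fixed offset of 32 only matches segments whose TCP header is exactly 32 bytes long; a filter that reads the data offset instead, such as `tcp[((tcp[12:1] & 0xf0) >> 2):4] = 0x47455420`, should match GET requests regardless of TCP options, though that variant was not tested here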
- Turns out this is a known issue and Ehcache has refused to make it opt-in: https://jira.terracotta.org/jira/browse/EHC-461
- But we can disable it by adding an `updateCheck="false"` attribute to the main `<ehcache>` tag in `dspace-services/src/main/resources/caching/ehcache-config.xml`
- After re-compiling and re-deploying DSpace I no longer see those update checks during item submission
- I had a Skype call with Bram Luyten from Atmire to discuss various issues related to ORCID in DSpace
- First, ORCID is deprecating their version 1 API (which DSpace uses), and in the version 2 API they have removed the ability to search for users by name
- The logic is that searching by name actually isn't very useful because ORCID is essentially a global phonebook and there are tons of legitimately duplicate and ambiguous names
- Atmire's proposed integration would work by having users look up and add authors to the authority core directly using the ORCID iD itself (this would happen during the item submission process or perhaps as a standalone / batch process, for example to populate the authority core with a list of known ORCIDs)
- Once the association between name and ORCID is made in the authority then it can be autocompleted in the lookup field
- Ideally there could also be a user interface for cleanup and merging of authorities
- He will prepare a quote for us, keeping in mind that this could be useful to contribute back to the community for a 5.x release
- As far as exposing ORCIDs as flat metadata alongside all other metadata, he says this should be possible and will work on a quote for us

@ -51,7 +51,7 @@ $ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep idle | grep -c cgspac
"/>
<meta name="generator" content="Hugo 0.26" />
<meta name="generator" content="Hugo 0.27" />

@ -53,7 +53,7 @@ Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less
"/>
<meta name="generator" content="Hugo 0.26" />
<meta name="generator" content="Hugo 0.27" />

@ -43,7 +43,7 @@ Update GitHub wiki for documentation of maintenance tasks.
"/>
<meta name="generator" content="Hugo 0.26" />
<meta name="generator" content="Hugo 0.27" />

@ -57,7 +57,7 @@ Also, lots of things like &ldquo;COTE D`LVOIRE&rdquo; and &ldquo;COTE D IVOIRE&r
"/>
<meta name="generator" content="Hugo 0.26" />
<meta name="generator" content="Hugo 0.27" />

@ -43,7 +43,7 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
"/>
<meta name="generator" content="Hugo 0.26" />
<meta name="generator" content="Hugo 0.27" />

@ -47,7 +47,7 @@ Also, I noticed the checker log has some errors we should pay attention to:
"/>
<meta name="generator" content="Hugo 0.26" />
<meta name="generator" content="Hugo 0.27" />

@ -51,7 +51,7 @@ There are 3,000 IPs accessing the REST API in a 24-hour period!
"/>
<meta name="generator" content="Hugo 0.26" />
<meta name="generator" content="Hugo 0.27" />

@ -49,7 +49,7 @@ Working on second phase of metadata migration, looks like this will work for mov
"/>
<meta name="generator" content="Hugo 0.26" />
<meta name="generator" content="Hugo 0.27" />

@ -65,7 +65,7 @@ In this case the select query was showing 95 results before the update
"/>
<meta name="generator" content="Hugo 0.26" />
<meta name="generator" content="Hugo 0.27" />

@ -59,7 +59,7 @@ $ git rebase -i dspace-5.5
"/>
<meta name="generator" content="Hugo 0.26" />
<meta name="generator" content="Hugo 0.27" />

@ -51,7 +51,7 @@ $ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b &quot;dc=cgiarad,dc=or
"/>
<meta name="generator" content="Hugo 0.26" />
<meta name="generator" content="Hugo 0.27" />

@ -59,7 +59,7 @@ I exported a random item&rsquo;s metadata as CSV, deleted all columns except id
"/>
<meta name="generator" content="Hugo 0.26" />
<meta name="generator" content="Hugo 0.27" />

@ -43,7 +43,7 @@ Add dc.type to the output options for Atmire&rsquo;s Listings and Reports module
"/>
<meta name="generator" content="Hugo 0.26" />
<meta name="generator" content="Hugo 0.27" />

@ -67,7 +67,7 @@ Another worrying error from dspace.log is:
"/>
<meta name="generator" content="Hugo 0.26" />
<meta name="generator" content="Hugo 0.27" />

@ -43,7 +43,7 @@ I asked on the dspace-tech mailing list because it seems to be broken, and actua
"/>
<meta name="generator" content="Hugo 0.26" />
<meta name="generator" content="Hugo 0.27" />

@ -71,7 +71,7 @@ Looks like we&rsquo;ll be using cg.identifier.ccafsprojectpii as the field name
"/>
<meta name="generator" content="Hugo 0.26" />
<meta name="generator" content="Hugo 0.27" />

@ -75,7 +75,7 @@ $ identify ~/Desktop/alc_contrastes_desafios.jpg
"/>
<meta name="generator" content="Hugo 0.26" />
<meta name="generator" content="Hugo 0.27" />

@ -61,7 +61,7 @@ $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p &quot;ImageMagick PDF Th
"/>
<meta name="generator" content="Hugo 0.26" />
<meta name="generator" content="Hugo 0.27" />

@ -27,7 +27,7 @@
<meta name="twitter:card" content="summary"/><meta name="twitter:title" content="May, 2017"/>
<meta name="twitter:description" content="2017-05-01 ICARDA apparently started working on CG Core on their MEL repository They have done a few cg.* fields, but not very consistent and even copy some of CGSpace items: https://mel.cgiar.org/xmlui/handle/20.500.11766/6911?show=full https://cgspace.cgiar.org/handle/10568/73683 2017-05-02 Atmire got back about the Workflow Statistics issue, and apparently it&rsquo;s a bug in the CUA module so they will send us a pull request 2017-05-04 Sync DSpace Test with database and assetstore from CGSpace Re-deploy DSpace Test with Atmire&rsquo;s CUA patch for workflow statistics, run system updates, and restart the server Now I can see the workflow statistics and am able to select users, but everything returns 0 items Megan says there are still some mapped items are not appearing since last week, so I forced a full index-discovery -b Need to remember to check if the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSPace Test) tomorrow: https://cgspace."/>
<meta name="generator" content="Hugo 0.26" />
<meta name="generator" content="Hugo 0.27" />

@ -27,7 +27,7 @@
<meta name="twitter:card" content="summary"/><meta name="twitter:title" content="June, 2017"/>
<meta name="twitter:description" content="2017-06-01 After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes The cg.identifier.wletheme field will be used for both Phase I and Phase II Research Themes Then we&rsquo;ll create a new sub-community for Phase II and create collections for the research themes there The current &ldquo;Research Themes&rdquo; community will be renamed to &ldquo;WLE Phase I Research Themes&rdquo; Tagged all items in the current Phase I collections with their appropriate themes Create pull request to add Phase II research themes to the submission form: #328 Add cg."/>
<meta name="generator" content="Hugo 0.26" />
<meta name="generator" content="Hugo 0.27" />

@ -55,7 +55,7 @@ We can use PostgreSQL&rsquo;s extended output format (-x) plus sed to format the
"/>
<meta name="generator" content="Hugo 0.26" />
<meta name="generator" content="Hugo 0.27" />

@ -37,7 +37,7 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
<meta property="article:published_time" content="2017-08-01T11:51:52&#43;03:00"/>
<meta property="article:modified_time" content="2017-09-10T18:17:25&#43;03:00"/>
<meta property="article:modified_time" content="2017-09-10T19:18:52&#43;03:00"/>
@ -75,7 +75,7 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
"/>
<meta name="generator" content="Hugo 0.26" />
<meta name="generator" content="Hugo 0.27" />
@ -87,7 +87,7 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
"url": "https://alanorth.github.io/cgspace-notes/2017-08/",
"wordCount": "3542",
"datePublished": "2017-08-01T11:51:52&#43;03:00",
"dateModified": "2017-09-10T18:17:25&#43;03:00",
"dateModified": "2017-09-10T19:18:52&#43;03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"

@ -39,7 +39,7 @@ Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two
"/>
<meta name="generator" content="Hugo 0.26" />
<meta name="generator" content="Hugo 0.27" />
@ -49,7 +49,7 @@ Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two
"@type": "BlogPosting",
"headline": "September, 2017",
"url": "https://alanorth.github.io/cgspace-notes/2017-09/",
"wordCount": "455",
"wordCount": "903",
"datePublished": "2017-09-07T16:54:52&#43;07:00",
"dateModified": "2017-09-10T18:21:38&#43;03:00",
"author": {
@ -173,6 +173,51 @@ dspace.log.2017-09-10:0
<li>I&rsquo;m expecting to see 0 connection errors for the next few months</li>
</ul>
<h2 id="2017-09-11">2017-09-11</h2>
<ul>
<li>Lots of work testing the CGIAR Library migration</li>
<li>Many technical notes and TODOs here: <a href="https://gist.github.com/alanorth/3579b74e116ab13418d187ed379abd9c">https://gist.github.com/alanorth/3579b74e116ab13418d187ed379abd9c</a></li>
</ul>
<h2 id="2017-09-12">2017-09-12</h2>
<ul>
<li>I was testing the <a href="https://wiki.duraspace.org/display/DSDOC5x/AIP+Backup+and+Restore#AIPBackupandRestore-AIPConfigurationsToImproveIngestionSpeedwhileValidating">METS XSD caching during AIP ingest</a> but it doesn&rsquo;t actually seem to help</li>
<li>The import process takes the same amount of time with and without the caching</li>
<li>Also, I captured TCP packets destined for port 80 during both imports and each capture contained only ONE packet (an update check from some component in Java):</li>
</ul>
<pre><code>$ sudo tcpdump -i en0 -w without-cached-xsd.dump dst port 80 and 'tcp[32:4] = 0x47455420'
</code></pre>
<ul>
<li>Great tcpdump guide here: <a href="https://danielmiessler.com/study/tcpdump">https://danielmiessler.com/study/tcpdump</a></li>
<li>The last part of that command filters for HTTP GET requests, of which there should have been many to fetch all the XSD files for validation</li>
<li>I sent a message to the mailing list to see if anyone knows more about this</li>
<li>Looking at the tcpdump results, I notice that there is an update check to the ehcache server on <em>every</em> iteration of the ingest loop, for example:</li>
</ul>
<pre><code>09:39:36.008956 IP 192.168.8.124.50515 &gt; 157.189.192.67.http: Flags [P.], seq 1736833672:1736834103, ack 147469926, win 4120, options [nop,nop,TS val 1175113331 ecr 550028064], length 431: HTTP: GET /kit/reflector?kitID=ehcache.default&amp;pageID=update.properties&amp;id=2130706433&amp;os-name=Mac+OS+X&amp;jvm-name=Java+HotSpot%28TM%29+64-Bit+Server+VM&amp;jvm-version=1.8.0_144&amp;platform=x86_64&amp;tc-version=UNKNOWN&amp;tc-product=Ehcache+Core+1.7.2&amp;source=Ehcache+Core&amp;uptime-secs=0&amp;patch=UNKNOWN HTTP/1.1
</code></pre>
<ul>
<li>Turns out this is a known issue and Ehcache has refused to make it opt-in: <a href="https://jira.terracotta.org/jira/browse/EHC-461">https://jira.terracotta.org/jira/browse/EHC-461</a></li>
<li>But we can disable it by adding an <code>updateCheck=&quot;false&quot;</code> attribute to the main <code>&lt;ehcache&gt;</code> tag in <code>dspace-services/src/main/resources/caching/ehcache-config.xml</code></li>
<li>After re-compiling and re-deploying DSpace I no longer see those update checks during item submission</li>
<li>I had a Skype call with Bram Luyten from Atmire to discuss various issues related to ORCID in DSpace
<ul>
<li>First, ORCID is deprecating their version 1 API (which DSpace uses), and in the version 2 API they have removed the ability to search for users by name</li>
<li>The logic is that searching by name actually isn&rsquo;t very useful because ORCID is essentially a global phonebook and there are tons of legitimately duplicate and ambiguous names</li>
<li>Atmire&rsquo;s proposed integration would work by having users look up and add authors to the authority core directly using the ORCID iD itself (this would happen during the item submission process or perhaps as a standalone / batch process, for example to populate the authority core with a list of known ORCIDs)</li>
<li>Once the association between name and ORCID is made in the authority then it can be autocompleted in the lookup field</li>
<li>Ideally there could also be a user interface for cleanup and merging of authorities</li>
<li>He will prepare a quote for us, keeping in mind that this could be useful to contribute back to the community for a 5.x release</li>
<li>As far as exposing ORCIDs as flat metadata alongside all other metadata, he says this should be possible and will work on a quote for us</li>
</ul></li>
</ul>

@ -25,7 +25,7 @@
<meta name="twitter:card" content="summary"/><meta name="twitter:title" content="CGSpace Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.26" />
<meta name="generator" content="Hugo 0.27" />

@ -25,7 +25,7 @@
<meta name="twitter:card" content="summary"/><meta name="twitter:title" content="CGSpace Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.26" />
<meta name="generator" content="Hugo 0.27" />

@ -25,7 +25,7 @@
<meta name="twitter:card" content="summary"/><meta name="twitter:title" content="CGSpace Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.26" />
<meta name="generator" content="Hugo 0.27" />

@ -25,7 +25,7 @@
<meta name="twitter:card" content="summary"/><meta name="twitter:title" content="Posts"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.26" />
<meta name="generator" content="Hugo 0.27" />

@ -25,7 +25,7 @@
<meta name="twitter:card" content="summary"/><meta name="twitter:title" content="Posts"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.26" />
<meta name="generator" content="Hugo 0.27" />

@ -25,7 +25,7 @@
<meta name="twitter:card" content="summary"/><meta name="twitter:title" content="Posts"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.26" />
<meta name="generator" content="Hugo 0.27" />

@ -9,7 +9,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/2017-08/</loc>
<lastmod>2017-09-10T18:17:25+03:00</lastmod>
<lastmod>2017-09-10T19:18:52+03:00</lastmod>
</url>
<url>

@ -25,7 +25,7 @@
<meta name="twitter:card" content="summary"/><meta name="twitter:title" content="Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.26" />
<meta name="generator" content="Hugo 0.27" />

@ -25,7 +25,7 @@
<meta name="twitter:card" content="summary"/><meta name="twitter:title" content="Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.26" />
<meta name="generator" content="Hugo 0.27" />

@ -25,7 +25,7 @@
<meta name="twitter:card" content="summary"/><meta name="twitter:title" content="Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.26" />
<meta name="generator" content="Hugo 0.27" />