Add notes for 2019-11-28

2019-11-28 17:30:45 +02:00
parent 1f2be05583
commit 6bae7849e6
90 changed files with 14955 additions and 21478 deletions

@ -444,7 +444,7 @@ Buck/2.2; (+https://app.hypefactors.com/media-monitoring/about.html)
## 2019-11-26 ## 2019-11-26
- Visit CodeObie to discuss future of OpenRXV and AReS - Visit CodeObia to discuss future of OpenRXV and AReS
- I started working on categorizing and validating the feedback that Jane collated into a spreadsheet last week - I started working on categorizing and validating the feedback that Jane collated into a spreadsheet last week
- I added GitHub issues for eight of the items so far, tagging them by "bug", "search", "feature", "graphics", "low-priority", etc - I added GitHub issues for eight of the items so far, tagging them by "bug", "search", "feature", "graphics", "low-priority", etc
- I moved AReS v2 to be available on CGSpace - I moved AReS v2 to be available on CGSpace
@ -465,4 +465,12 @@ Buck/2.2; (+https://app.hypefactors.com/media-monitoring/about.html)
- I need to ask Marie-Angelique about the `cg.peer-reviewed` field - I need to ask Marie-Angelique about the `cg.peer-reviewed` field
- We currently use `dc.description.version` with values like "Internal Review" and "Peer Review", and CG Core v2 currently recommends using "True" if the item is peer reviewed - We currently use `dc.description.version` with values like "Internal Review" and "Peer Review", and CG Core v2 currently recommends using "True" if the item is peer reviewed
## 2019-11-28
- File an issue with CG Core v2 project to ask Marie-Angelique about expanding the scope of `cg.peer-reviewed` to include other types of review, and possibly to change the field name to something more generic like `cg.review-status` ([#14](https://github.com/AgriculturalSemantics/cg-core/issues/14))
- More review of AReS feedback
- I clarified some of the feedback
- I added status of "Issue Filed", "Duplicate" and "No Action Required" to several items
- I filed a handful more GitHub issues in AReS and OpenRXV GitHub trackers
<!-- vim: set sw=2 ts=2: --> <!-- vim: set sw=2 ts=2: -->

@ -8,15 +8,12 @@
<meta property="og:title" content="November, 2015" /> <meta property="og:title" content="November, 2015" />
<meta property="og:description" content="2015-11-22 <meta property="og:description" content="2015-11-22
CGSpace went down CGSpace went down
Looks like DSpace exhausted its PostgreSQL connection pool Looks like DSpace exhausted its PostgreSQL connection pool
Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections: Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:
$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep idle | grep -c cgspace $ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep idle | grep -c cgspace
78 78
" /> " />
<meta property="og:type" content="article" /> <meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2015-11/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2015-11/" />
@ -27,17 +24,14 @@ $ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep idle | grep -c cgspac
<meta name="twitter:title" content="November, 2015"/> <meta name="twitter:title" content="November, 2015"/>
<meta name="twitter:description" content="2015-11-22 <meta name="twitter:description" content="2015-11-22
CGSpace went down CGSpace went down
Looks like DSpace exhausted its PostgreSQL connection pool Looks like DSpace exhausted its PostgreSQL connection pool
Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections: Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:
$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep idle | grep -c cgspace $ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep idle | grep -c cgspace
78 78
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -118,147 +112,107 @@ $ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep idle | grep -c cgspac
</p> </p>
</header> </header>
<h2 id="2015-11-22">2015-11-22</h2> <h2 id="20151122">2015-11-22</h2>
<ul> <ul>
<li>CGSpace went down</li> <li>CGSpace went down</li>
<li>Looks like DSpace exhausted its PostgreSQL connection pool</li> <li>Looks like DSpace exhausted its PostgreSQL connection pool</li>
<li>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</li>
<li><p>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</p> </ul>
<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace <pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
78 78
</code></pre></li> </code></pre><ul>
</ul>
<ul>
<li>For now I have increased the limit from 60 to 90, run updates, and rebooted the server</li> <li>For now I have increased the limit from 60 to 90, run updates, and rebooted the server</li>
</ul> </ul>
<h2 id="20151124">2015-11-24</h2>
<h2 id="2015-11-24">2015-11-24</h2>
<ul> <ul>
<li>CGSpace went down again</li> <li>CGSpace went down again</li>
<li>Getting emails from uptimeRobot and uptimeButler that it&rsquo;s down, and Google Webmaster Tools is sending emails that there is an increase in crawl errors</li> <li>Getting emails from uptimeRobot and uptimeButler that it's down, and Google Webmaster Tools is sending emails that there is an increase in crawl errors</li>
<li>Looks like there are still a bunch of idle PostgreSQL connections:</li>
<li><p>Looks like there are still a bunch of idle PostgreSQL connections:</p> </ul>
<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace <pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
96 96
</code></pre></li> </code></pre><ul>
<li>For some reason the number of idle connections is very high since we upgraded to DSpace 5</li>
<li><p>For some reason the number of idle connections is very high since we upgraded to DSpace 5</p></li>
</ul> </ul>
<h2 id="20151125">2015-11-25</h2>
<h2 id="2015-11-25">2015-11-25</h2>
<ul> <ul>
<li>Troubleshoot the DSpace 5 OAI breakage caused by nginx routing config</li> <li>Troubleshoot the DSpace 5 OAI breakage caused by nginx routing config</li>
<li>The OAI application requests stylesheets and javascript files with the path <code>/oai/static/css</code>, which gets matched here:</li>
<li><p>The OAI application requests stylesheets and javascript files with the path <code>/oai/static/css</code>, which gets matched here:</p> </ul>
<pre><code># static assets we can load from the file system directly with nginx <pre><code># static assets we can load from the file system directly with nginx
location ~ /(themes|static|aspects/ReportingSuite) { location ~ /(themes|static|aspects/ReportingSuite) {
try_files $uri @tomcat; try_files $uri @tomcat;
... ...
</code></pre></li> </code></pre><ul>
<li>The document root is relative to the xmlui app, so this gets a 404—I'm not sure why it doesn't pass to <code>@tomcat</code></li>
<li><p>The document root is relative to the xmlui app, so this gets a 404—I&rsquo;m not sure why it doesn&rsquo;t pass to <code>@tomcat</code></p></li> <li>Anyways, I can't find any URIs with path <code>/static</code>, and the more important point is to handle all the static theme assets, so we can just remove <code>static</code> from the regex for now (who cares if we can't use nginx to send Etags for OAI CSS!)</li>
<li>Also, I noticed we aren't setting CSP headers on the static assets, because in nginx headers are inherited in child blocks, but if you use <code>add_header</code> in a child block it doesn't inherit the others</li>
<li><p>Anyways, I can&rsquo;t find any URIs with path <code>/static</code>, and the more important point is to handle all the static theme assets, so we can just remove <code>static</code> from the regex for now (who cares if we can&rsquo;t use nginx to send Etags for OAI CSS!)</p></li> <li>We simply need to add <code>include extra-security.conf;</code> to the above location block (but research and test first)</li>
<li>We should add WOFF assets to the list of things to set expires for:</li>
<li><p>Also, I noticed we aren&rsquo;t setting CSP headers on the static assets, because in nginx headers are inherited in child blocks, but if you use <code>add_header</code> in a child block it doesn&rsquo;t inherit the others</p></li> </ul>
<li><p>We simply need to add <code>include extra-security.conf;</code> to the above location block (but research and test first)</p></li>
<li><p>We should add WOFF assets to the list of things to set expires for:</p>
<pre><code>location ~* \.(?:ico|css|js|gif|jpe?g|png|woff)$ { <pre><code>location ~* \.(?:ico|css|js|gif|jpe?g|png|woff)$ {
</code></pre></li> </code></pre><ul>
<li>We should also add <code>aspects/Statistics</code> to the location block for static assets (minus <code>static</code> from above):</li>
<li><p>We should also add <code>aspects/Statistics</code> to the location block for static assets (minus <code>static</code> from above):</p> </ul>
<pre><code>location ~ /(themes|aspects/ReportingSuite|aspects/Statistics) { <pre><code>location ~ /(themes|aspects/ReportingSuite|aspects/Statistics) {
</code></pre></li> </code></pre><ul>
<li>Need to check <code>/about</code> on CGSpace, as it's blank on my local test server and we might need to add something there</li>
<li><p>Need to check <code>/about</code> on CGSpace, as it&rsquo;s blank on my local test server and we might need to add something there</p></li> <li>CGSpace has been up and down all day due to PostgreSQL idle connections (current DSpace pool is 90):</li>
</ul>
<li><p>CGSpace has been up and down all day due to PostgreSQL idle connections (current DSpace pool is 90):</p>
<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace <pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
93 93
</code></pre></li> </code></pre><ul>
<li>I looked closer at the idle connections and saw that many have been idle for hours (current time on server is <code>2015-11-25T20:20:42+0000</code>):</li>
<li><p>I looked closer at the idle connections and saw that many have been idle for hours (current time on server is <code>2015-11-25T20:20:42+0000</code>):</p> </ul>
<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | less -S <pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | less -S
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start |
-------+----------+-------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+--- -------+----------+-------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+---
20951 | cgspace | 10966 | 18205 | cgspace | | 127.0.0.1 | | 37731 | 2015-11-25 13:13:02.837624+00 | | 20 20951 | cgspace | 10966 | 18205 | cgspace | | 127.0.0.1 | | 37731 | 2015-11-25 13:13:02.837624+00 | | 20
20951 | cgspace | 10967 | 18205 | cgspace | | 127.0.0.1 | | 37737 | 2015-11-25 13:13:03.069421+00 | | 20 20951 | cgspace | 10967 | 18205 | cgspace | | 127.0.0.1 | | 37737 | 2015-11-25 13:13:03.069421+00 | | 20
... ...
</code></pre></li> </code></pre><ul>
<li>There is a relevant Jira issue about this: <a href="https://jira.duraspace.org/browse/DS-1458">https://jira.duraspace.org/browse/DS-1458</a></li>
<li><p>There is a relevant Jira issue about this: <a href="https://jira.duraspace.org/browse/DS-1458">https://jira.duraspace.org/browse/DS-1458</a></p></li> <li>It seems there is some sense changing DSpace's default <code>db.maxidle</code> from unlimited (-1) to something like 8 (Tomcat default) or 10 (Confluence default)</li>
<li>Change <code>db.maxidle</code> from -1 to 10, reduce <code>db.maxconnections</code> from 90 to 50, and restart postgres and tomcat7</li>
<li><p>It seems there is some sense changing DSpace&rsquo;s default <code>db.maxidle</code> from unlimited (-1) to something like 8 (Tomcat default) or 10 (Confluence default)</p></li> <li>Also redeploy DSpace Test with a clean sync of CGSpace and mirror these database settings there as well</li>
<li>Also deploy the nginx fixes for the <code>try_files</code> location block as well as the expires block</li>
<li><p>Change <code>db.maxidle</code> from -1 to 10, reduce <code>db.maxconnections</code> from 90 to 50, and restart postgres and tomcat7</p></li>
<li><p>Also redeploy DSpace Test with a clean sync of CGSpace and mirror these database settings there as well</p></li>
<li><p>Also deploy the nginx fixes for the <code>try_files</code> location block as well as the expires block</p></li>
</ul> </ul>
<h2 id="20151126">2015-11-26</h2>
<h2 id="2015-11-26">2015-11-26</h2>
<ul> <ul>
<li>CGSpace behaving much better since changing <code>db.maxidle</code> yesterday, but still two up/down notices from monitoring this morning (better than 50!)</li> <li>CGSpace behaving much better since changing <code>db.maxidle</code> yesterday, but still two up/down notices from monitoring this morning (better than 50!)</li>
<li>CCAFS colleagues mentioned that the REST API is very slow, 24 seconds for one item</li> <li>CCAFS colleagues mentioned that the REST API is very slow, 24 seconds for one item</li>
<li>Not as bad for me, but still unsustainable if you have to get many:</li>
<li><p>Not as bad for me, but still unsustainable if you have to get many:</p> </ul>
<pre><code>$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all <pre><code>$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
8.415 8.415
</code></pre></li> </code></pre><ul>
<li>Monitoring e-mailed in the evening to say CGSpace was down</li>
<li><p>Monitoring e-mailed in the evening to say CGSpace was down</p></li> <li>Idle connections in PostgreSQL again:</li>
</ul>
<li><p>Idle connections in PostgreSQL again:</p>
<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep cgspace | grep -c idle <pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep cgspace | grep -c idle
66 66
</code></pre></li> </code></pre><ul>
<li>At the time, the current DSpace pool size was 50&hellip;</li>
<li><p>At the time, the current DSpace pool size was 50&hellip;</p></li> <li>I reduced the pool back to the default of 30, and reduced the <code>db.maxidle</code> settings from 10 to 8</li>
<li><p>I reduced the pool back to the default of 30, and reduced the <code>db.maxidle</code> settings from 10 to 8</p></li>
</ul> </ul>
<h2 id="20151129">2015-11-29</h2>
<h2 id="2015-11-29">2015-11-29</h2>
<ul> <ul>
<li>Still more alerts that CGSpace has been up and down all day</li> <li>Still more alerts that CGSpace has been up and down all day</li>
<li>Current database settings for DSpace:</li>
<li><p>Current database settings for DSpace:</p> </ul>
<pre><code>db.maxconnections = 30 <pre><code>db.maxconnections = 30
db.maxwait = 5000 db.maxwait = 5000
db.maxidle = 8 db.maxidle = 8
db.statementpool = true db.statementpool = true
</code></pre></li> </code></pre><ul>
<li>And idle connections:</li>
<li><p>And idle connections:</p> </ul>
<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep cgspace | grep -c idle <pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep cgspace | grep -c idle
49 49
</code></pre></li> </code></pre><ul>
<li>Perhaps I need to start drastically increasing the connection limits—like to 300—to see if DSpace's thirst can ever be quenched</li>
<li><p>Perhaps I need to start drastically increasing the connection limits—like to 300—to see if DSpace&rsquo;s thirst can ever be quenched</p></li> <li>On another note, SUNScholar's notes suggest adjusting some other postgres variables: <a href="http://wiki.lib.sun.ac.za/index.php/SUNScholar/Optimisations/Database">http://wiki.lib.sun.ac.za/index.php/SUNScholar/Optimisations/Database</a></li>
<li>This might help with REST API speed (which I mentioned above and still need to do real tests)</li>
<li><p>On another note, SUNScholar&rsquo;s notes suggest adjusting some other postgres variables: <a href="http://wiki.lib.sun.ac.za/index.php/SUNScholar/Optimisations/Database">http://wiki.lib.sun.ac.za/index.php/SUNScholar/Optimisations/Database</a></p></li>
<li><p>This might help with REST API speed (which I mentioned above and still need to do real tests)</p></li>
</ul> </ul>
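The repeated `psql ... | grep idle | grep -c cgspace` checks in these notes count every output line that contains both strings, so a session reported as `idle in transaction` is counted as well. Below is a minimal sketch of a per-state count. It assumes PostgreSQL 9.2 or later (where `pg_stat_activity` has a `state` column) and psql's unaligned, tuples-only output (`psql -At`). The function name and the sample rows are illustrative and do not come from the notes.

```python
from collections import Counter

# Query to run as, for example: psql -At -c "<QUERY>"
# Assumes PostgreSQL >= 9.2, where pg_stat_activity exposes a "state" column.
QUERY = "SELECT datname, state FROM pg_stat_activity;"


def count_states(psql_output, database):
    """Count connections per state for one database.

    psql_output is the text produced by `psql -At`: one row per line,
    fields separated by "|", with no header or footer.
    """
    counts = Counter()
    for line in psql_output.splitlines():
        if not line.strip():
            continue
        datname, _, state = line.partition("|")
        if datname == database:
            counts[state or "unknown"] += 1
    return counts


# Hypothetical sample output, for illustration only.
sample = "\n".join([
    "cgspace|idle",
    "cgspace|idle",
    "cgspace|idle in transaction",
    "cgspace|active",
    "postgres|active",
])

states = count_states(sample, "cgspace")
print(states["idle"], states["idle in transaction"], states["active"])
```

Separating `idle` from `idle in transaction` may help distinguish pooled connections that are merely parked from sessions that are holding a transaction open.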

@ -8,7 +8,6 @@
<meta property="og:title" content="December, 2015" /> <meta property="og:title" content="December, 2015" />
<meta property="og:description" content="2015-12-02 <meta property="og:description" content="2015-12-02
Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less space: Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less space:
# cd /home/dspacetest.cgiar.org/log # cd /home/dspacetest.cgiar.org/log
@ -16,7 +15,6 @@ Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18 -rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo -rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz -rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
" /> " />
<meta property="og:type" content="article" /> <meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2015-12/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2015-12/" />
@ -27,7 +25,6 @@ Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less
<meta name="twitter:title" content="December, 2015"/> <meta name="twitter:title" content="December, 2015"/>
<meta name="twitter:description" content="2015-12-02 <meta name="twitter:description" content="2015-12-02
Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less space: Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less space:
# cd /home/dspacetest.cgiar.org/log # cd /home/dspacetest.cgiar.org/log
@ -35,9 +32,8 @@ Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18 -rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo -rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz -rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -118,42 +114,34 @@ Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less
</p> </p>
</header> </header>
<h2 id="2015-12-02">2015-12-02</h2> <h2 id="20151202">2015-12-02</h2>
<ul> <ul>
<li><p>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</p> <li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li>
</ul>
<pre><code># cd /home/dspacetest.cgiar.org/log <pre><code># cd /home/dspacetest.cgiar.org/log
# ls -lh dspace.log.2015-11-18* # ls -lh dspace.log.2015-11-18*
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18 -rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo -rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz -rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
</code></pre></li> </code></pre><ul>
</ul>
<ul>
<li>I had used lrzip once, but it needs more memory and is harder to use as it requires the lrztar wrapper</li> <li>I had used lrzip once, but it needs more memory and is harder to use as it requires the lrztar wrapper</li>
<li>Need to remember to go check if everything is ok in a few days and then change CGSpace</li> <li>Need to remember to go check if everything is ok in a few days and then change CGSpace</li>
<li>CGSpace went down again (due to PostgreSQL idle connections of course)</li> <li>CGSpace went down again (due to PostgreSQL idle connections of course)</li>
<li>Current database settings for DSpace are <code>db.maxconnections = 30</code> and <code>db.maxidle = 8</code>, yet idle connections are exceeding this:</li>
<li><p>Current database settings for DSpace are <code>db.maxconnections = 30</code> and <code>db.maxidle = 8</code>, yet idle connections are exceeding this:</p> </ul>
<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep cgspace | grep -c idle <pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep cgspace | grep -c idle
39 39
</code></pre></li> </code></pre><ul>
<li>I restarted PostgreSQL and Tomcat and it's back</li>
<li><p>I restarted PostgreSQL and Tomcat and it&rsquo;s back</p></li> <li>On a related note of why CGSpace is so slow, I decided to finally try the <code>pgtune</code> script to tune the postgres settings:</li>
</ul>
<li><p>On a related note of why CGSpace is so slow, I decided to finally try the <code>pgtune</code> script to tune the postgres settings:</p>
<pre><code># apt-get install pgtune <pre><code># apt-get install pgtune
# pgtune -i /etc/postgresql/9.3/main/postgresql.conf -o postgresql.conf-pgtune # pgtune -i /etc/postgresql/9.3/main/postgresql.conf -o postgresql.conf-pgtune
# mv /etc/postgresql/9.3/main/postgresql.conf /etc/postgresql/9.3/main/postgresql.conf.orig # mv /etc/postgresql/9.3/main/postgresql.conf /etc/postgresql/9.3/main/postgresql.conf.orig
# mv postgresql.conf-pgtune /etc/postgresql/9.3/main/postgresql.conf # mv postgresql.conf-pgtune /etc/postgresql/9.3/main/postgresql.conf
</code></pre></li> </code></pre><ul>
<li>It introduced the following new settings:</li>
<li><p>It introduced the following new settings:</p> </ul>
<pre><code>default_statistics_target = 50 <pre><code>default_statistics_target = 50
maintenance_work_mem = 480MB maintenance_work_mem = 480MB
constraint_exclusion = on constraint_exclusion = on
@ -164,12 +152,10 @@ wal_buffers = 8MB
checkpoint_segments = 16 checkpoint_segments = 16
shared_buffers = 1920MB shared_buffers = 1920MB
max_connections = 80 max_connections = 80
</code></pre></li> </code></pre><ul>
<li>Now I need to go read PostgreSQL docs about these options, and watch memory settings in munin etc</li>
<li><p>Now I need to go read PostgreSQL docs about these options, and watch memory settings in munin etc</p></li> <li>For what it's worth, now the REST API should be faster (because of these PostgreSQL tweaks):</li>
</ul>
<li><p>For what it&rsquo;s worth, now the REST API should be faster (because of these PostgreSQL tweaks):</p>
<pre><code>$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all <pre><code>$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
1.474 1.474
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
@ -180,40 +166,29 @@ $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle
1.995 1.995
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
1.786 1.786
</code></pre></li> </code></pre><ul>
<li>Last week it was an average of 8 seconds&hellip; now this is 1/4 of that</li>
<li><p>Last week it was an average of 8 seconds&hellip; now this is <sup>1</sup>&frasl;<sub>4</sub> of that</p></li> <li>CCAFS noticed that one of their items displays only the Atmire statlets: <a href="https://cgspace.cgiar.org/handle/10568/42445">https://cgspace.cgiar.org/handle/10568/42445</a></li>
<li><p>CCAFS noticed that one of their items displays only the Atmire statlets: <a href="https://cgspace.cgiar.org/handle/10568/42445">https://cgspace.cgiar.org/handle/10568/42445</a></p></li>
</ul> </ul>
<p><img src="/cgspace-notes/2015/12/ccafs-item-no-metadata.png" alt="CCAFS item"></p>
<p><img src="/cgspace-notes/2015/12/ccafs-item-no-metadata.png" alt="CCAFS item" /></p>
<ul> <ul>
<li>The authorizations for the item are all public READ, and I don&rsquo;t see any errors in dspace.log when browsing that item</li> <li>The authorizations for the item are all public READ, and I don't see any errors in dspace.log when browsing that item</li>
<li>I filed a ticket on Atmire&rsquo;s issue tracker</li> <li>I filed a ticket on Atmire's issue tracker</li>
<li>I also filed a ticket on Atmire&rsquo;s issue tracker for the PostgreSQL stuff</li> <li>I also filed a ticket on Atmire's issue tracker for the PostgreSQL stuff</li>
</ul> </ul>
<h2 id="20151203">2015-12-03</h2>
<h2 id="2015-12-03">2015-12-03</h2>
<ul> <ul>
<li>CGSpace very slow, and monitoring emailing me to say it is down, even though I can load the page (very slowly)</li> <li>CGSpace very slow, and monitoring emailing me to say it is down, even though I can load the page (very slowly)</li>
<li>Idle postgres connections look like this (with no change in DSpace db settings lately):</li>
<li><p>Idle postgres connections look like this (with no change in DSpace db settings lately):</p> </ul>
<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep cgspace | grep -c idle <pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep cgspace | grep -c idle
29 29
</code></pre></li> </code></pre><ul>
<li>I restarted Tomcat and postgres&hellip;</li>
<li><p>I restarted Tomcat and postgres&hellip;</p></li> <li>Atmire commented that we should raise the JVM heap size by ~500M, so it is now <code>-Xms3584m -Xmx3584m</code></li>
<li>We weren't out of heap yet, but it's probably fair enough that the DSpace 5 upgrade (and new Atmire modules) requires more memory so it's ok</li>
<li><p>Atmire commented that we should raise the JVM heap size by ~500M, so it is now <code>-Xms3584m -Xmx3584m</code></p></li> <li>A possible side effect is that I see that the REST API is twice as fast for the request above now:</li>
</ul>
<li><p>We weren&rsquo;t out of heap yet, but it&rsquo;s probably fair enough that the DSpace 5 upgrade (and new Atmire modules) requires more memory so it&rsquo;s ok</p></li>
<li><p>A possible side effect is that I see that the REST API is twice as fast for the request above now:</p>
<pre><code>$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all <pre><code>$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
1.368 1.368
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
@ -226,37 +201,26 @@ $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle
0.806 0.806
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.854 0.854
</code></pre></li> </code></pre><h2 id="20151205">2015-12-05</h2>
</ul>
<h2 id="2015-12-05">2015-12-05</h2>
<ul> <ul>
<li>CGSpace has been up and down all day and REST API is completely unresponsive</li> <li>CGSpace has been up and down all day and REST API is completely unresponsive</li>
<li>PostgreSQL idle connections are currently:</li>
<li><p>PostgreSQL idle connections are currently:</p> </ul>
<pre><code>postgres@linode01:~$ psql -c 'SELECT * from pg_stat_activity;' | grep cgspace | grep -c idle <pre><code>postgres@linode01:~$ psql -c 'SELECT * from pg_stat_activity;' | grep cgspace | grep -c idle
28 28
</code></pre></li> </code></pre><ul>
<li>I have reverted all the pgtune tweaks from the other day, as they didn't fix the stability issues, so I'd rather not have them introducing more variables into the equation</li>
<li><p>I have reverted all the pgtune tweaks from the other day, as they didn&rsquo;t fix the stability issues, so I&rsquo;d rather not have them introducing more variables into the equation</p></li> <li>The PostgreSQL stats from Munin all point to something database-related with the DSpace 5 upgrade around mid-to-late November</li>
<li><p>The PostgreSQL stats from Munin all point to something database-related with the DSpace 5 upgrade around mid-to-late November</p></li>
</ul> </ul>
<p><img src="/cgspace-notes/2015/12/postgres_bgwriter-year.png" alt="PostgreSQL bgwriter (year)">
<p><img src="/cgspace-notes/2015/12/postgres_bgwriter-year.png" alt="PostgreSQL bgwriter (year)" /> <img src="/cgspace-notes/2015/12/postgres_cache_cgspace-year.png" alt="PostgreSQL cache (year)">
<img src="/cgspace-notes/2015/12/postgres_cache_cgspace-year.png" alt="PostgreSQL cache (year)" /> <img src="/cgspace-notes/2015/12/postgres_locks_cgspace-year.png" alt="PostgreSQL locks (year)">
<img src="/cgspace-notes/2015/12/postgres_locks_cgspace-year.png" alt="PostgreSQL locks (year)" /> <img src="/cgspace-notes/2015/12/postgres_scans_cgspace-year.png" alt="PostgreSQL scans (year)"></p>
<img src="/cgspace-notes/2015/12/postgres_scans_cgspace-year.png" alt="PostgreSQL scans (year)" /></p> <h2 id="20151207">2015-12-07</h2>
<h2 id="2015-12-07">2015-12-07</h2>
<ul> <ul>
<li>Atmire sent <a href="https://github.com/ilri/DSpace/pull/161">some fixes</a> to DSpace&rsquo;s REST API code that was leaving contexts open (causing the slow performance and database issues)</li> <li>Atmire sent <a href="https://github.com/ilri/DSpace/pull/161">some fixes</a> to DSpace's REST API code that was leaving contexts open (causing the slow performance and database issues)</li>
<li>After deploying the fix to CGSpace the REST API is consistently faster:</li>
<li><p>After deploying the fix to CGSpace the REST API is consistently faster:</p> </ul>
<pre><code>$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all <pre><code>$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.675 0.675
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
@ -267,14 +231,10 @@ $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle
0.566 0.566
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.497 0.497
</code></pre></li> </code></pre><h2 id="20151208">2015-12-08</h2>
</ul>
<h2 id="2015-12-08">2015-12-08</h2>
<ul> <ul>
<li>Switch CGSpace log compression cron jobs from using lzop to xz—the compression isn&rsquo;t as good, but it&rsquo;s much faster and causes less IO/CPU load</li> <li>Switch CGSpace log compression cron jobs from using lzop to xz—the compression isn't as good, but it's much faster and causes less IO/CPU load</li>
<li>Since we figured out (and fixed) the cause of the performance issue, I reverted Google Bot&rsquo;s crawl rate to the &ldquo;Let Google optimize&rdquo; setting</li> <li>Since we figured out (and fixed) the cause of the performance issue, I reverted Google Bot's crawl rate to the &ldquo;Let Google optimize&rdquo; setting</li>
</ul> </ul>


@ -8,7 +8,6 @@
<meta property="og:title" content="January, 2016" /> <meta property="og:title" content="January, 2016" />
<meta property="og:description" content="2016-01-13 <meta property="og:description" content="2016-01-13
Move ILRI collection 10568/12503 from 10568/27869 to 10568/27629 using the move_collections.sh script I wrote last year. Move ILRI collection 10568/12503 from 10568/27869 to 10568/27629 using the move_collections.sh script I wrote last year.
I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated. I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.
Update GitHub wiki for documentation of maintenance tasks. Update GitHub wiki for documentation of maintenance tasks.
@ -22,12 +21,11 @@ Update GitHub wiki for documentation of maintenance tasks.
<meta name="twitter:title" content="January, 2016"/> <meta name="twitter:title" content="January, 2016"/>
<meta name="twitter:description" content="2016-01-13 <meta name="twitter:description" content="2016-01-13
Move ILRI collection 10568/12503 from 10568/27869 to 10568/27629 using the move_collections.sh script I wrote last year. Move ILRI collection 10568/12503 from 10568/27869 to 10568/27629 using the move_collections.sh script I wrote last year.
I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated. I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.
Update GitHub wiki for documentation of maintenance tasks. Update GitHub wiki for documentation of maintenance tasks.
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -108,90 +106,72 @@ Update GitHub wiki for documentation of maintenance tasks.
</p> </p>
</header> </header>
<h2 id="2016-01-13">2016-01-13</h2> <h2 id="20160113">2016-01-13</h2>
<ul> <ul>
<li>Move ILRI collection <code>10568/12503</code> from <code>10568/27869</code> to <code>10568/27629</code> using the <a href="https://gist.github.com/alanorth/392c4660e8b022d99dfa">move_collections.sh</a> script I wrote last year.</li> <li>Move ILRI collection <code>10568/12503</code> from <code>10568/27869</code> to <code>10568/27629</code> using the <a href="https://gist.github.com/alanorth/392c4660e8b022d99dfa">move_collections.sh</a> script I wrote last year.</li>
<li>I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.</li> <li>I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.</li>
<li>Update GitHub wiki for documentation of <a href="https://github.com/ilri/DSpace/wiki/Maintenance-Tasks">maintenance tasks</a>.</li> <li>Update GitHub wiki for documentation of <a href="https://github.com/ilri/DSpace/wiki/Maintenance-Tasks">maintenance tasks</a>.</li>
</ul> </ul>
<h2 id="20160114">2016-01-14</h2>
<h2 id="2016-01-14">2016-01-14</h2>
<ul> <ul>
<li>Update CCAFS project identifiers in input-forms.xml</li> <li>Update CCAFS project identifiers in input-forms.xml</li>
<li>Run system updates and restart the server</li> <li>Run system updates and restart the server</li>
</ul> </ul>
<h2 id="20160118">2016-01-18</h2>
<h2 id="2016-01-18">2016-01-18</h2>
<ul> <ul>
<li>Change &ldquo;Extension material&rdquo; to &ldquo;Extension Material&rdquo; in input-forms.xml (a mistake that fell through the cracks when we fixed the others in DSpace 4 era)</li> <li>Change &ldquo;Extension material&rdquo; to &ldquo;Extension Material&rdquo; in input-forms.xml (a mistake that fell through the cracks when we fixed the others in DSpace 4 era)</li>
</ul> </ul>
<h2 id="20160119">2016-01-19</h2>
<h2 id="2016-01-19">2016-01-19</h2>
<ul> <ul>
<li>Work on tweaks and updates for the social sharing icons on item pages: add Delicious and Mendeley (from Academicons), make links open in new windows, and set the icon color to the theme&rsquo;s primary color (<a href="https://github.com/ilri/DSpace/issues/157">#157</a>)</li> <li>Work on tweaks and updates for the social sharing icons on item pages: add Delicious and Mendeley (from Academicons), make links open in new windows, and set the icon color to the theme's primary color (<a href="https://github.com/ilri/DSpace/issues/157">#157</a>)</li>
<li>Tweak date-based facets to show more values in drill-down ranges (<a href="https://github.com/ilri/DSpace/issues/162">#162</a>)</li> <li>Tweak date-based facets to show more values in drill-down ranges (<a href="https://github.com/ilri/DSpace/issues/162">#162</a>)</li>
<li>Need to remember to clear the Cocoon cache after deployment or else you don&rsquo;t see the new ranges immediately</li> <li>Need to remember to clear the Cocoon cache after deployment or else you don't see the new ranges immediately</li>
<li>Set up recipe on IFTTT to tweet new items from the CGSpace Atom feed to my twitter account</li> <li>Set up recipe on IFTTT to tweet new items from the CGSpace Atom feed to my twitter account</li>
<li>Altmetrics&rsquo; support for Handles is kinda weak, so they can&rsquo;t associate our items with DOIs until they are tweeted or blogged, etc first.</li> <li>Altmetrics&rsquo; support for Handles is kinda weak, so they can't associate our items with DOIs until they are tweeted or blogged, etc first.</li>
</ul> </ul>
<h2 id="20160121">2016-01-21</h2>
<h2 id="2016-01-21">2016-01-21</h2>
<ul> <ul>
<li>Still waiting for my IFTTT recipe to fire, two days later</li> <li>Still waiting for my IFTTT recipe to fire, two days later</li>
<li>It looks like the Atom feed on CGSpace hasn&rsquo;t changed in two days, but there have definitely been new items</li> <li>It looks like the Atom feed on CGSpace hasn't changed in two days, but there have definitely been new items</li>
<li>The RSS feed is nearly as old, but has different old items there</li> <li>The RSS feed is nearly as old, but has different old items there</li>
<li>On a hunch I cleared the Cocoon cache and now the feeds are fresh</li> <li>On a hunch I cleared the Cocoon cache and now the feeds are fresh</li>
<li>Looks like there is a configuration option related to this, <code>webui.feed.cache.age</code>, which defaults to 48 hours, though I&rsquo;m not sure what relation it has to the Cocoon cache</li> <li>Looks like there is a configuration option related to this, <code>webui.feed.cache.age</code>, which defaults to 48 hours, though I'm not sure what relation it has to the Cocoon cache</li>
<li>In any case, we should change this cache to be something more like 6 hours, as we publish new items several times per day.</li> <li>In any case, we should change this cache to be something more like 6 hours, as we publish new items several times per day.</li>
<li>Work around a CSS issue with long URLs in the item view (<a href="https://github.com/ilri/DSpace/issues/172">#172</a>)</li> <li>Work around a CSS issue with long URLs in the item view (<a href="https://github.com/ilri/DSpace/issues/172">#172</a>)</li>
</ul> </ul>
<h2 id="20160125">2016-01-25</h2>
<h2 id="2016-01-25">2016-01-25</h2>
<ul> <ul>
<li>Re-deploy CGSpace and DSpace Test with latest <code>5_x-prod</code> branch</li> <li>Re-deploy CGSpace and DSpace Test with latest <code>5_x-prod</code> branch</li>
<li>This included the social icon fixes/updates, date-based facet tweaks, reducing the feed cache age, and fixing a layout issue in XMLUI item view when an item had long URLs</li> <li>This included the social icon fixes/updates, date-based facet tweaks, reducing the feed cache age, and fixing a layout issue in XMLUI item view when an item had long URLs</li>
</ul> </ul>
<h2 id="20160126">2016-01-26</h2>
<h2 id="2016-01-26">2016-01-26</h2>
<ul> <ul>
<li>Run nginx updates on CGSpace and DSpace Test (<a href="http://mailman.nginx.org/pipermail/nginx/2016-January/049700.html">1.8.1 and 1.9.10, respectively</a>)</li> <li>Run nginx updates on CGSpace and DSpace Test (<a href="http://mailman.nginx.org/pipermail/nginx/2016-January/049700.html">1.8.1 and 1.9.10, respectively</a>)</li>
<li>Run updates on DSpace Test and reboot for new Linode kernel <code>Linux 4.4.0-x86_64-linode63</code> (first update in months)</li> <li>Run updates on DSpace Test and reboot for new Linode kernel <code>Linux 4.4.0-x86_64-linode63</code> (first update in months)</li>
</ul> </ul>
<h2 id="20160128">2016-01-28</h2>
<h2 id="2016-01-28">2016-01-28</h2>
<ul> <ul>
<li>Start looking at importing some Bioversity data that had been prepared earlier this week</li> <li>
<p>Start looking at importing some Bioversity data that had been prepared earlier this week</p>
<li><p>While checking the data I noticed something strange, there are 79 items but only 8 unique PDFs:</p> </li>
<li>
<p>While checking the data I noticed something strange, there are 79 items but only 8 unique PDFs:</p>
<p>$ ls SimpleArchiveForBio/ | wc -l <p>$ ls SimpleArchiveForBio/ | wc -l
79 79
$ find SimpleArchiveForBio/ -iname &ldquo;*.pdf&rdquo; -exec basename {} \; | sort -u | wc -l $ find SimpleArchiveForBio/ -iname &ldquo;*.pdf&rdquo; -exec basename {} ; | sort -u | wc -l
8</p></li> 8</p>
</li>
</ul> </ul>
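The item-versus-PDF comparison in the 2016-01-28 entry can also be sketched in Python. This is only an illustration: the function name `count_unique_pdfs` and the throwaway directory layout are assumptions made for the example, not part of the original import.

```python
import tempfile
from pathlib import Path


def count_unique_pdfs(root):
    """Count distinct PDF basenames under root (case-insensitive on the extension)."""
    names = {
        p.name
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    }
    return len(names)


# Hypothetical layout: three item directories, two of which share a PDF name.
with tempfile.TemporaryDirectory() as tmp:
    for item, pdf in [("item_1", "a.pdf"), ("item_2", "a.pdf"), ("item_3", "B.PDF")]:
        item_dir = Path(tmp) / item
        item_dir.mkdir()
        (item_dir / pdf).write_bytes(b"")
    item_count = len(list(Path(tmp).iterdir()))
    unique_pdfs = count_unique_pdfs(tmp)

print(item_count, unique_pdfs)
```

A large gap between the two numbers, as with the 79 items and 8 PDFs above, suggests that many items point at the same file.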
<h2 id="20160129">2016-01-29</h2>
<h2 id="2016-01-29">2016-01-29</h2>
<ul> <ul>
<li>Add five missing center-specific subjects to XMLUI item view (<a href="https://github.com/ilri/DSpace/issues/174">#174</a>)</li> <li>Add five missing center-specific subjects to XMLUI item view (<a href="https://github.com/ilri/DSpace/issues/174">#174</a>)</li>
<li>This <a href="https://cgspace.cgiar.org/handle/10568/67062">CCAFS item</a> Before:</li> <li>This <a href="https://cgspace.cgiar.org/handle/10568/67062">CCAFS item</a> Before:</li>
</ul> </ul>
<p><img src="/cgspace-notes/2016/01/xmlui-subjects-before.png" alt="XMLUI subjects before"></p>
<p><img src="/cgspace-notes/2016/01/xmlui-subjects-before.png" alt="XMLUI subjects before" /></p>
<ul> <ul>
<li>After:</li> <li>After:</li>
</ul> </ul>
<p><img src="/cgspace-notes/2016/01/xmlui-subjects-after.png" alt="XMLUI subjects after"></p>
<p><img src="/cgspace-notes/2016/01/xmlui-subjects-after.png" alt="XMLUI subjects after" /></p>


@ -8,15 +8,12 @@
<meta property="og:title" content="February, 2016" /> <meta property="og:title" content="February, 2016" />
<meta property="og:description" content="2016-02-05 <meta property="og:description" content="2016-02-05
Looking at some DAGRIS data for Abenet Yabowork Looking at some DAGRIS data for Abenet Yabowork
Lots of issues with spaces, newlines, etc causing the import to fail Lots of issues with spaces, newlines, etc causing the import to fail
I noticed we have a very interesting list of countries on CGSpace: I noticed we have a very interesting list of countries on CGSpace:
Not only are there 49,000 countries, we have some blanks (25)&hellip; Not only are there 49,000 countries, we have some blanks (25)&hellip;
Also, lots of things like &ldquo;COTE D`LVOIRE&rdquo; and &ldquo;COTE D IVOIRE&rdquo; Also, lots of things like &ldquo;COTE D`LVOIRE&rdquo; and &ldquo;COTE D IVOIRE&rdquo;
" /> " />
@ -29,19 +26,16 @@ Also, lots of things like &ldquo;COTE D`LVOIRE&rdquo; and &ldquo;COTE D IVOIRE&r
<meta name="twitter:title" content="February, 2016"/> <meta name="twitter:title" content="February, 2016"/>
<meta name="twitter:description" content="2016-02-05 <meta name="twitter:description" content="2016-02-05
Looking at some DAGRIS data for Abenet Yabowork Looking at some DAGRIS data for Abenet Yabowork
Lots of issues with spaces, newlines, etc causing the import to fail Lots of issues with spaces, newlines, etc causing the import to fail
I noticed we have a very interesting list of countries on CGSpace: I noticed we have a very interesting list of countries on CGSpace:
Not only are there 49,000 countries, we have some blanks (25)&hellip; Not only are there 49,000 countries, we have some blanks (25)&hellip;
Also, lots of things like &ldquo;COTE D`LVOIRE&rdquo; and &ldquo;COTE D IVOIRE&rdquo; Also, lots of things like &ldquo;COTE D`LVOIRE&rdquo; and &ldquo;COTE D IVOIRE&rdquo;
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -122,71 +116,53 @@ Also, lots of things like &ldquo;COTE D`LVOIRE&rdquo; and &ldquo;COTE D IVOIRE&r
</p> </p>
</header> </header>
<h2 id="2016-02-05">2016-02-05</h2> <h2 id="20160205">2016-02-05</h2>
<ul> <ul>
<li>Looking at some DAGRIS data for Abenet Yabowork</li> <li>Looking at some DAGRIS data for Abenet Yabowork</li>
<li>Lots of issues with spaces, newlines, etc causing the import to fail</li> <li>Lots of issues with spaces, newlines, etc causing the import to fail</li>
<li>I noticed we have a very <em>interesting</em> list of countries on CGSpace:</li> <li>I noticed we have a very <em>interesting</em> list of countries on CGSpace:</li>
</ul> </ul>
<p><img src="/cgspace-notes/2016/02/cgspace-countries.png" alt="CGSpace country list"></p>
<p><img src="/cgspace-notes/2016/02/cgspace-countries.png" alt="CGSpace country list" /></p>
<ul> <ul>
<li>Not only are there 49,000 countries, we have some blanks (25)&hellip;</li> <li>Not only are there 49,000 countries, we have some blanks (25)&hellip;</li>
<li>Also, lots of things like &ldquo;COTE D`LVOIRE&rdquo; and &ldquo;COTE D IVOIRE&rdquo;</li> <li>Also, lots of things like &ldquo;COTE D`LVOIRE&rdquo; and &ldquo;COTE D IVOIRE&rdquo;</li>
</ul> </ul>
<h2 id="20160206">2016-02-06</h2>
<h2 id="2016-02-06">2016-02-06</h2>
<ul> <ul>
<li>Found a way to get items with null/empty metadata values from SQL</li> <li>Found a way to get items with null/empty metadata values from SQL</li>
<li>First, find the <code>metadata_field_id</code> for the field you want from the <code>metadatafieldregistry</code> table:</li>
<li><p>First, find the <code>metadata_field_id</code> for the field you want from the <code>metadatafieldregistry</code> table:</p> </ul>
<pre><code>dspacetest=# select * from metadatafieldregistry; <pre><code>dspacetest=# select * from metadatafieldregistry;
</code></pre></li> </code></pre><ul>
<li>In this case our country field is 78</li>
<li><p>In this case our country field is 78</p></li> <li>Now find all resources with type 2 (item) that have null/empty values for that field:</li>
</ul>
<li><p>Now find all resources with type 2 (item) that have null/empty values for that field:</p>
<pre><code>dspacetest=# select resource_id from metadatavalue where resource_type_id=2 and metadata_field_id=78 and (text_value='' OR text_value IS NULL); <pre><code>dspacetest=# select resource_id from metadatavalue where resource_type_id=2 and metadata_field_id=78 and (text_value='' OR text_value IS NULL);
</code></pre></li> </code></pre><ul>
<li>Then you can find the handle that owns it from its <code>resource_id</code>:</li>
<li><p>Then you can find the handle that owns it from its <code>resource_id</code>:</p> </ul>
<pre><code>dspacetest=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id = '22678'; <pre><code>dspacetest=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id = '22678';
</code></pre></li> </code></pre><ul>
<li>It's 25 items so editing in the web UI is annoying, let's try SQL!</li>
<li><p>It&rsquo;s 25 items so editing in the web UI is annoying, let&rsquo;s try SQL!</p> </ul>
<pre><code>dspacetest=# delete from metadatavalue where metadata_field_id=78 and text_value=''; <pre><code>dspacetest=# delete from metadatavalue where metadata_field_id=78 and text_value='';
DELETE 25 DELETE 25
</code></pre></li> </code></pre><ul>
<li>After that perhaps a regular <code>dspace index-discovery</code> (no -b) <em>should</em> suffice&hellip;</li>
<li><p>After that perhaps a regular <code>dspace index-discovery</code> (no -b) <em>should</em> suffice&hellip;</p></li> <li>Hmm, I indexed, cleared the Cocoon cache, and restarted Tomcat but the 25 &ldquo;|||&rdquo; countries are still there</li>
<li>Maybe I need to do a full re-index&hellip;</li>
<li><p>Hmm, I indexed, cleared the Cocoon cache, and restarted Tomcat but the 25 &ldquo;|||&rdquo; countries are still there</p></li> <li>Yep! The full re-index seems to work.</li>
<li>Process the empty countries on CGSpace</li>
<li><p>Maybe I need to do a full re-index&hellip;</p></li>
<li><p>Yep! The full re-index seems to work.</p></li>
<li><p>Process the empty countries on CGSpace</p></li>
</ul> </ul>
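The null/empty lookup from the 2016-02-06 entry can be tried safely against a toy table before touching a real database. The sketch below uses an in-memory SQLite database as a stand-in for PostgreSQL; the sample rows and the helper name `find_empty_values` are assumptions for illustration, and only the column names come from the queries above.

```python
import sqlite3


def find_empty_values(conn, field_id):
    """Return resource_ids of items (type 2) whose value for field_id is '' or NULL."""
    rows = conn.execute(
        "SELECT resource_id FROM metadatavalue "
        "WHERE resource_type_id = 2 AND metadata_field_id = ? "
        "AND (text_value = '' OR text_value IS NULL) "
        "ORDER BY resource_id",
        (field_id,),
    ).fetchall()
    return [row[0] for row in rows]


# Toy stand-in for the DSpace metadatavalue table (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadatavalue ("
    "resource_id INTEGER, resource_type_id INTEGER, "
    "metadata_field_id INTEGER, text_value TEXT)"
)
conn.executemany(
    "INSERT INTO metadatavalue VALUES (?, ?, ?, ?)",
    [
        (1, 2, 78, "KENYA"),  # populated country, should be ignored
        (2, 2, 78, ""),       # empty string
        (3, 2, 78, None),     # NULL
        (4, 2, 64, ""),       # empty, but a different field
    ],
)
empty_items = find_empty_values(conn, 78)
print(empty_items)
```

Note that the `delete` in the notes matches only `text_value=''`, so any NULL rows found by the `select` would be left in place.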
<h2 id="20160207">2016-02-07</h2>
<h2 id="2016-02-07">2016-02-07</h2>
<ul> <ul>
<li>Working on cleaning up Abenet&rsquo;s DAGRIS data with OpenRefine</li> <li>Working on cleaning up Abenet's DAGRIS data with OpenRefine</li>
<li>I discovered two really nice functions in OpenRefine: <code>value.trim()</code> and <code>value.escape(&quot;javascript&quot;)</code> which shows whitespace characters like <code>\r\n</code>!</li> <li>I discovered two really nice functions in OpenRefine: <code>value.trim()</code> and <code>value.escape(&quot;javascript&quot;)</code> which shows whitespace characters like <code>\r\n</code>!</li>
<li>For some reason when you import an Excel file into OpenRefine it exports dates like 1949 to 1949.0 in the CSV</li> <li>For some reason when you import an Excel file into OpenRefine it exports dates like 1949 to 1949.0 in the CSV</li>
<li>I re-import the resulting CSV and run a GREL on the date issued column: <code>value.replace(&quot;\.0&quot;, &quot;&quot;)</code></li> <li>I re-import the resulting CSV and run a GREL on the date issued column: <code>value.replace(&quot;\.0&quot;, &quot;&quot;)</code></li>
<li>I need to start running DSpace in Mac OS X instead of a Linux VM</li> <li>I need to start running DSpace in Mac OS X instead of a Linux VM</li>
<li>Install PostgreSQL from homebrew, then configure and import CGSpace database dump:</li>
<li><p>Install PostgreSQL from homebrew, then configure and import CGSpace database dump:</p> </ul>
<pre><code>$ postgres -D /opt/brew/var/postgres <pre><code>$ postgres -D /opt/brew/var/postgres
$ createuser --superuser postgres $ createuser --superuser postgres
$ createuser --pwprompt dspacetest $ createuser --pwprompt dspacetest
@ -200,10 +176,9 @@ postgres=# alter user dspacetest nocreateuser;
postgres=# \q postgres=# \q
$ vacuumdb dspacetest $ vacuumdb dspacetest
$ psql -U dspacetest -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest -h localhost $ psql -U dspacetest -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest -h localhost
</code></pre></li> </code></pre><ul>
<li>After building and running a <code>fresh_install</code> I symlinked the webapps into Tomcat's webapps folder:</li>
<li><p>After building and running a <code>fresh_install</code> I symlinked the webapps into Tomcat&rsquo;s webapps folder:</p> </ul>
<pre><code>$ mv /opt/brew/Cellar/tomcat/8.0.30/libexec/webapps/ROOT /opt/brew/Cellar/tomcat/8.0.30/libexec/webapps/ROOT.orig <pre><code>$ mv /opt/brew/Cellar/tomcat/8.0.30/libexec/webapps/ROOT /opt/brew/Cellar/tomcat/8.0.30/libexec/webapps/ROOT.orig
$ ln -sfv ~/dspace/webapps/xmlui /opt/brew/Cellar/tomcat/8.0.30/libexec/webapps/ROOT $ ln -sfv ~/dspace/webapps/xmlui /opt/brew/Cellar/tomcat/8.0.30/libexec/webapps/ROOT
$ ln -sfv ~/dspace/webapps/rest /opt/brew/Cellar/tomcat/8.0.30/libexec/webapps/rest $ ln -sfv ~/dspace/webapps/rest /opt/brew/Cellar/tomcat/8.0.30/libexec/webapps/rest
@ -211,39 +186,28 @@ $ ln -sfv ~/dspace/webapps/jspui /opt/brew/Cellar/tomcat/8.0.30/libexec/webapps/
$ ln -sfv ~/dspace/webapps/oai /opt/brew/Cellar/tomcat/8.0.30/libexec/webapps/oai $ ln -sfv ~/dspace/webapps/oai /opt/brew/Cellar/tomcat/8.0.30/libexec/webapps/oai
$ ln -sfv ~/dspace/webapps/solr /opt/brew/Cellar/tomcat/8.0.30/libexec/webapps/solr $ ln -sfv ~/dspace/webapps/solr /opt/brew/Cellar/tomcat/8.0.30/libexec/webapps/solr
$ /opt/brew/Cellar/tomcat/8.0.30/bin/catalina start $ /opt/brew/Cellar/tomcat/8.0.30/bin/catalina start
</code></pre></li> </code></pre><ul>
<li>Add CATALINA_OPTS in <code>/opt/brew/Cellar/tomcat/8.0.30/libexec/bin/setenv.sh</code>, as this script is sourced by the <code>catalina</code> startup script</li>
<li><p>Add CATALINA_OPTS in <code>/opt/brew/Cellar/tomcat/8.0.30/libexec/bin/setenv.sh</code>, as this script is sourced by the <code>catalina</code> startup script</p></li> <li>For example:</li>
<li><p>For example:</p>
<pre><code>CATALINA_OPTS=&quot;-Djava.awt.headless=true -Xms2048m -Xmx2048m -XX:MaxPermSize=256m -XX:+UseConcMarkSweepGC -Dfile.encoding=UTF-8&quot;
</code></pre></li>
<li><p>After verifying that the site is working, start a full index:</p>
<pre><code>$ ~/dspace/bin/dspace index-discovery -b
</code></pre></li>
</ul> </ul>
<pre><code>CATALINA_OPTS=&quot;-Djava.awt.headless=true -Xms2048m -Xmx2048m -XX:MaxPermSize=256m -XX:+UseConcMarkSweepGC -Dfile.encoding=UTF-8&quot;
<h2 id="2016-02-08">2016-02-08</h2> </code></pre><ul>
<li>After verifying that the site is working, start a full index:</li>
</ul>
<pre><code>$ ~/dspace/bin/dspace index-discovery -b
</code></pre><h2 id="20160208">2016-02-08</h2>
<ul> <ul>
<li>Finish cleaning up and importing ~400 DAGRIS items into CGSpace</li> <li>Finish cleaning up and importing ~400 DAGRIS items into CGSpace</li>
<li>Whip up some quick CSS to make the button in the submission workflow use the XMLUI theme&rsquo;s brand colors (<a href="https://github.com/ilri/DSpace/issues/154">#154</a>)</li> <li>Whip up some quick CSS to make the button in the submission workflow use the XMLUI theme's brand colors (<a href="https://github.com/ilri/DSpace/issues/154">#154</a>)</li>
</ul> </ul>
<p><img src="/cgspace-notes/2016/02/submit-button-ilri.png" alt="ILRI submission buttons">
<p><img src="/cgspace-notes/2016/02/submit-button-ilri.png" alt="ILRI submission buttons" /> <img src="/cgspace-notes/2016/02/submit-button-drylands.png" alt="Drylands submission buttons"></p>
<img src="/cgspace-notes/2016/02/submit-button-drylands.png" alt="Drylands submission buttons" /></p> <h2 id="20160209">2016-02-09</h2>
<h2 id="2016-02-09">2016-02-09</h2>
<ul> <ul>
<li>Re-sync DSpace Test with CGSpace</li> <li>Re-sync DSpace Test with CGSpace</li>
<li>Help Sisay with OpenRefine</li> <li>Help Sisay with OpenRefine</li>
<li>Enable HTTPS on DSpace Test using Let's Encrypt:</li>
<li><p>Enable HTTPS on DSpace Test using Let&rsquo;s Encrypt:</p> </ul>
<pre><code>$ cd ~/src/git <pre><code>$ cd ~/src/git
$ git clone https://github.com/letsencrypt/letsencrypt $ git clone https://github.com/letsencrypt/letsencrypt
$ cd letsencrypt $ cd letsencrypt
@ -252,51 +216,39 @@ $ sudo service nginx stop
$ ./letsencrypt-auto certonly --standalone -d dspacetest.cgiar.org $ ./letsencrypt-auto certonly --standalone -d dspacetest.cgiar.org
$ sudo service nginx start $ sudo service nginx start
$ ansible-playbook dspace.yml -l linode02 -t nginx,firewall -u aorth --ask-become-pass $ ansible-playbook dspace.yml -l linode02 -t nginx,firewall -u aorth --ask-become-pass
</code></pre></li> </code></pre><ul>
<li>We should install it in /opt/letsencrypt and then script the renewal script, but first we have to wire up some variables and template stuff based on the script here: <a href="https://letsencrypt.org/howitworks/">https://letsencrypt.org/howitworks/</a></li>
<li><p>We should install it in /opt/letsencrypt and then script the renewal script, but first we have to wire up some variables and template stuff based on the script here: <a href="https://letsencrypt.org/howitworks/">https://letsencrypt.org/howitworks/</a></p></li> <li>I had to export some CIAT items that were being cleaned up on the test server and I noticed their <code>dc.contributor.author</code> fields have DSpace 5 authority index UUIDs&hellip;</li>
<li>To clean those up in OpenRefine I used this GREL expression: <code>value.replace(/::\w{8}-\w{4}-\w{4}-\w{4}-\w{12}::600/,&quot;&quot;)</code></li>
<li><p>I had to export some CIAT items that were being cleaned up on the test server and I noticed their <code>dc.contributor.author</code> fields have DSpace 5 authority index UUIDs&hellip;</p></li> <li>Getting more and more hangs on DSpace Test, seemingly random but also during CSV import</li>
<li>Logs don't always show anything right when it fails, but eventually one of these appears:</li>
<li><p>To clean those up in OpenRefine I used this GREL expression: <code>value.replace(/::\w{8}-\w{4}-\w{4}-\w{4}-\w{12}::600/,&quot;&quot;)</code></p></li> </ul>
<li><p>Getting more and more hangs on DSpace Test, seemingly random but also during CSV import</p></li>
<li><p>Logs don&rsquo;t always show anything right when it fails, but eventually one of these appears:</p>
<pre><code>org.dspace.discovery.SearchServiceException: Error while processing facet fields: java.lang.OutOfMemoryError: Java heap space <pre><code>org.dspace.discovery.SearchServiceException: Error while processing facet fields: java.lang.OutOfMemoryError: Java heap space
</code></pre></li> </code></pre><ul>
<li>or</li>
<li><p>or</p> </ul>
<pre><code>Caused by: java.util.NoSuchElementException: Timeout waiting for idle object <pre><code>Caused by: java.util.NoSuchElementException: Timeout waiting for idle object
</code></pre></li> </code></pre><ul>
<li>Right now DSpace Test's Tomcat heap is set to 1536m and we have quite a bit of free RAM:</li>
<li><p>Right now DSpace Test&rsquo;s Tomcat heap is set to 1536m and we have quite a bit of free RAM:</p> </ul>
<pre><code># free -m <pre><code># free -m
total used free shared buffers cached total used free shared buffers cached
Mem: 3950 3902 48 9 37 1311 Mem: 3950 3902 48 9 37 1311
-/+ buffers/cache: 2552 1397 -/+ buffers/cache: 2552 1397
Swap: 255 57 198 Swap: 255 57 198
</code></pre></li> </code></pre><ul>
<li>So I'll bump up the Tomcat heap to 2048 (CGSpace production server is using 3GB)</li>
<li><p>So I&rsquo;ll bump up the Tomcat heap to 2048 (CGSpace production server is using 3GB)</p></li>
</ul> </ul>
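The GREL expression used to strip the authority UUIDs from `dc.contributor.author` has a direct Python equivalent, which may be handy outside OpenRefine. This is a minimal sketch: the sample author names and UUIDs are made up, and it assumes the values follow the `::<uuid>::600` pattern shown in the notes.

```python
import re

# Same pattern as the GREL expression: "::<uuid>::600" appended to each value.
AUTHORITY_SUFFIX = re.compile(r"::\w{8}-\w{4}-\w{4}-\w{4}-\w{12}::600")


def strip_authority(value):
    """Remove every authority UUID/confidence suffix from a metadata value."""
    return AUTHORITY_SUFFIX.sub("", value)


# Hypothetical multi-value cell, using "||" as the value separator.
raw = (
    "Doe, Jane::0b4fcbc1-d930-4319-9b4d-ea1553cca70b::600"
    "||Roe, Richard::1a2b3c4d-aaaa-bbbb-cccc-0123456789ab::600"
)
cleaned = strip_authority(raw)
print(cleaned)
```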
<h2 id="20160211">2016-02-11</h2>
<h2 id="2016-02-11">2016-02-11</h2>
<ul> <ul>
<li>Massaging some CIAT data in OpenRefine</li> <li>Massaging some CIAT data in OpenRefine</li>
<li>There are 1200 records that have PDFs, and will need to be imported into CGSpace</li> <li>There are 1200 records that have PDFs, and will need to be imported into CGSpace</li>
<li>I created a <code>filename</code> column based on the <code>dc.identifier.url</code> column using the following transform:</li>
<li><p>I created a <code>filename</code> column based on the <code>dc.identifier.url</code> column using the following transform:</p> </ul>
<pre><code>value.split('/')[-1] <pre><code>value.split('/')[-1]
</code></pre></li> </code></pre><ul>
<li>Then I wrote a tool called <a href="https://gist.github.com/alanorth/2206f24483fe5f0454fc"><code>generate-thumbnails.py</code></a> to download the PDFs and generate thumbnails for them, for example:</li>
<li><p>Then I wrote a tool called <a href="https://gist.github.com/alanorth/2206f24483fe5f0454fc"><code>generate-thumbnails.py</code></a> to download the PDFs and generate thumbnails for them, for example:</p> </ul>
<pre><code>$ ./generate-thumbnails.py ciat-reports.csv <pre><code>$ ./generate-thumbnails.py ciat-reports.csv
Processing 64661.pdf Processing 64661.pdf
&gt; Downloading 64661.pdf &gt; Downloading 64661.pdf
@ -304,138 +256,99 @@ Processing 64661.pdf
Processing 64195.pdf Processing 64195.pdf
&gt; Downloading 64195.pdf &gt; Downloading 64195.pdf
&gt; Creating thumbnail for 64195.pdf &gt; Creating thumbnail for 64195.pdf
</code></pre></li> </code></pre><h2 id="20160212">2016-02-12</h2>
</ul>
<h2 id="2016-02-12">2016-02-12</h2>
<ul> <ul>
<li>Looking at CIAT&rsquo;s records again, there are some problems with a dozen or so files (out of 1200)</li> <li>Looking at CIAT's records again, there are some problems with a dozen or so files (out of 1200)</li>
<li>A few items are using the same exact PDF</li> <li>A few items are using the same exact PDF</li>
<li>A few items are using HTM or DOC files</li> <li>A few items are using HTM or DOC files</li>
<li>A few items link to PDFs on IFPRI&rsquo;s e-Library or Research Gate</li> <li>A few items link to PDFs on IFPRI's e-Library or Research Gate</li>
<li>A few items have no item</li> <li>A few items have no item</li>
<li>Also, I&rsquo;m not sure if we import these items, will we remove the <code>dc.identifier.url</code> field from the records?</li> <li>Also, I'm not sure if we import these items, will we remove the <code>dc.identifier.url</code> field from the records?</li>
</ul> </ul>
<h2 id="201602121">2016-02-12</h2>
<h2 id="2016-02-12-1">2016-02-12</h2>
<ul> <ul>
<li>Looking at CIAT&rsquo;s records again, there are some files linking to PDFs on Slide Share, Embrapa, UEA UK, and Condesan, so I&rsquo;m not sure if we can use those</li> <li>Looking at CIAT's records again, there are some files linking to PDFs on Slide Share, Embrapa, UEA UK, and Condesan, so I'm not sure if we can use those</li>
<li>265 items have dirty, URL-encoded filenames:</li>
<li><p>265 items have dirty, URL-encoded filenames:</p> </ul>
<pre><code>$ ls | grep -c -E &quot;%&quot; <pre><code>$ ls | grep -c -E &quot;%&quot;
265 265
</code></pre></li> </code></pre><ul>
<li>I suggest that we import ~850 or so of the clean ones first, then do the rest after I can find a clean/reliable way to decode the filenames</li>
<li><p>I suggest that we import ~850 or so of the clean ones first, then do the rest after I can find a clean/reliable way to decode the filenames</p></li> <li>This python2 snippet seems to work in the CLI, but not so well in OpenRefine:</li>
</ul>
<li><p>This python2 snippet seems to work in the CLI, but not so well in OpenRefine:</p>
<pre><code>$ python -c &quot;import urllib, sys; print urllib.unquote(sys.argv[1])&quot; CIAT_COLOMBIA_000169_T%C3%A9cnicas_para_el_aislamiento_y_cultivo_de_protoplastos_de_yuca.pdf <pre><code>$ python -c &quot;import urllib, sys; print urllib.unquote(sys.argv[1])&quot; CIAT_COLOMBIA_000169_T%C3%A9cnicas_para_el_aislamiento_y_cultivo_de_protoplastos_de_yuca.pdf
CIAT_COLOMBIA_000169_Técnicas_para_el_aislamiento_y_cultivo_de_protoplastos_de_yuca.pdf CIAT_COLOMBIA_000169_Técnicas_para_el_aislamiento_y_cultivo_de_protoplastos_de_yuca.pdf
</code></pre></li> </code></pre><ul>
<li>Merge pull requests for submission form theming (<a href="https://github.com/ilri/DSpace/pull/178">#178</a>) and missing center subjects in XMLUI item views (<a href="https://github.com/ilri/DSpace/pull/176">#176</a>)</li>
<li><p>Merge pull requests for submission form theming (<a href="https://github.com/ilri/DSpace/pull/178">#178</a>) and missing center subjects in XMLUI item views (<a href="https://github.com/ilri/DSpace/pull/176">#176</a>)</p></li> <li>They will be deployed on CGSpace the next time I re-deploy</li>
<li><p>They will be deployed on CGSpace the next time I re-deploy</p></li>
</ul> </ul>
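For reference, the python2 one-liner from the 2016-02-12 entry has a Python 3 equivalent, since `unquote` moved to `urllib.parse`. This is a small sketch using the same filename as the notes:

```python
from urllib.parse import unquote

# URL-encoded filename taken from the example in the notes.
encoded = (
    "CIAT_COLOMBIA_000169_T%C3%A9cnicas_para_el_aislamiento_"
    "y_cultivo_de_protoplastos_de_yuca.pdf"
)

# unquote() decodes percent-escapes as UTF-8 by default.
decoded = unquote(encoded)
print(decoded)
```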
<h2 id="20160216">2016-02-16</h2>
<h2 id="2016-02-16">2016-02-16</h2>
<ul> <ul>
<li><p>Turns out OpenRefine has an unescape function!</p> <li>Turns out OpenRefine has an unescape function!</li>
<pre><code>value.unescape(&quot;url&quot;)
</code></pre></li>
<li><p>This turns the URLs into human-readable versions that we can use as proper filenames</p></li>
<li><p>Run web server and system updates on DSpace Test and reboot</p></li>
<li><p>To merge <code>dc.identifier.url</code> and <code>dc.identifier.url[]</code>, rename the second column so it doesn&rsquo;t have the brackets, like <code>dc.identifier.url2</code></p></li>
<li><p>Then you create a facet for blank values on each column, show the rows that have values for one and not the other, then transform each independently to have the contents of the other, with &ldquo;||&rdquo; in between</p></li>
<li><p>Work on Python script for parsing and downloading PDF records from <code>dc.identifier.url</code></p></li>
<li><p>To get filenames from <code>dc.identifier.url</code>, create a new column based on this transform: <code>forEach(value.split('||'), v, v.split('/')[-1]).join('||')</code></p></li>
<li><p>This also works for records that have multiple URLs (separated by &ldquo;||&rdquo;)</p></li>
</ul> </ul>
<pre><code>value.unescape(&quot;url&quot;)
<h2 id="2016-02-17">2016-02-17</h2> </code></pre><ul>
<li>This turns the URLs into human-readable versions that we can use as proper filenames</li>
<li>Run web server and system updates on DSpace Test and reboot</li>
<li>To merge <code>dc.identifier.url</code> and <code>dc.identifier.url[]</code>, rename the second column so it doesn't have the brackets, like <code>dc.identifier.url2</code></li>
<li>Then you create a facet for blank values on each column, show the rows that have values for one and not the other, then transform each independently to have the contents of the other, with &ldquo;||&rdquo; in between</li>
<li>Work on Python script for parsing and downloading PDF records from <code>dc.identifier.url</code></li>
<li>To get filenames from <code>dc.identifier.url</code>, create a new column based on this transform: <code>forEach(value.split('||'), v, v.split('/')[-1]).join('||')</code></li>
<li>This also works for records that have multiple URLs (separated by &ldquo;||&rdquo;)</li>
</ul>
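<p>For the Python download script, the two OpenRefine transforms above (taking the last path segment of each URL, and unescaping it) could be combined roughly as follows. This is a sketch under assumptions: the helper name <code>filenames_from_urls</code> and the example URLs in the comment are hypothetical, and it assumes multiple URLs are separated by &ldquo;||&rdquo; as described above.</p>

```python
from urllib.parse import unquote

SEPARATOR = "||"  # multi-value separator used in the spreadsheet cells


def filenames_from_urls(cell: str) -> str:
    """Return the unescaped last path segment of every URL in a cell.

    Mirrors forEach(value.split('||'), v, v.split('/')[-1]).join('||')
    followed by value.unescape("url"). Hypothetical helper, for illustration.
    """
    names = [unquote(url.rsplit("/", 1)[-1]) for url in cell.split(SEPARATOR)]
    return SEPARATOR.join(names)


# Example with made-up URLs:
# filenames_from_urls("http://example.org/docs/a%20b.pdf||http://example.org/c.pdf")
```

<p>Cells with a single URL pass through the same code path, since splitting on a separator that is not present yields a one-element list.</p>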
<h2 id="20160217">2016-02-17</h2>
<ul> <ul>
<li>Re-deploy CGSpace, run all system updates, and reboot</li> <li>Re-deploy CGSpace, run all system updates, and reboot</li>
<li>More work on CIAT data, cleaning and doing a last metadata-only import into DSpace Test</li> <li>More work on CIAT data, cleaning and doing a last metadata-only import into DSpace Test</li>
<li>SAFBuilder has a bug preventing it from processing filenames containing more than one underscore</li> <li>SAFBuilder has a bug preventing it from processing filenames containing more than one underscore</li>
<li>Need to re-process the filename column to replace multiple underscores with one: <code>value.replace(/_{2,}/, &quot;_&quot;)</code></li> <li>Need to re-process the filename column to replace multiple underscores with one: <code>value.replace(/_{2,}/, &quot;_&quot;)</code></li>
</ul> </ul>
<h2 id="20160220">2016-02-20</h2>
<h2 id="2016-02-20">2016-02-20</h2>
<ul> <ul>
<li>Turns out the &ldquo;bug&rdquo; in SAFBuilder isn&rsquo;t a bug, it&rsquo;s a feature that allows you to encode extra information like the destination bundle in the filename</li> <li>Turns out the &ldquo;bug&rdquo; in SAFBuilder isn't a bug, it's a feature that allows you to encode extra information like the destination bundle in the filename</li>
<li>Also, it seems DSpace's SAF import tool doesn't like importing filenames that have accents in them:</li>
<li><p>Also, it seems DSpace&rsquo;s SAF import tool doesn&rsquo;t like importing filenames that have accents in them:</p>
<pre><code>java.io.FileNotFoundException: /usr/share/tomcat7/SimpleArchiveFormat/item_1021/CIAT_COLOMBIA_000075_Medición_de_palatabilidad_en_forrajes.pdf (No such file or directory)
</code></pre></li>
<li><p>Need to rename files to have no accents or umlauts, etc&hellip;</p></li>
<li><p>Useful custom text facet for URLs ending with &ldquo;.pdf&rdquo;: <code>value.endsWith(&quot;.pdf&quot;)</code></p></li>
</ul> </ul>
<pre><code>java.io.FileNotFoundException: /usr/share/tomcat7/SimpleArchiveFormat/item_1021/CIAT_COLOMBIA_000075_Medición_de_palatabilidad_en_forrajes.pdf (No such file or directory)
<h2 id="2016-02-22">2016-02-22</h2> </code></pre><ul>
<li>Need to rename files to have no accents or umlauts, etc&hellip;</li>
<li>Useful custom text facet for URLs ending with &ldquo;.pdf&rdquo;: <code>value.endsWith(&quot;.pdf&quot;)</code></li>
</ul>
<h2 id="20160222">2016-02-22</h2>
<ul> <ul>
<li><p>To change Spanish accents to ASCII in OpenRefine:</p> <li>To change Spanish accents to ASCII in OpenRefine:</li>
</ul>
<pre><code>value.replace('ó','o').replace('í','i').replace('á','a').replace('é','e').replace('ñ','n') <pre><code>value.replace('ó','o').replace('í','i').replace('á','a').replace('é','e').replace('ñ','n')
</code></pre></li> </code></pre><ul>
<li>But actually, the accents might not be an issue, as I can successfully import files containing Spanish accents on my Mac</li>
<li><p>But actually, the accents might not be an issue, as I can successfully import files containing Spanish accents on my Mac</p></li> <li>On closer inspection, I can import files with the following names on Linux (DSpace Test):</li>
</ul>
<li><p>On closer inspection, I can import files with the following names on Linux (DSpace Test):</p>
<pre><code>Bitstream: tést.pdf <pre><code>Bitstream: tést.pdf
Bitstream: tést señora.pdf Bitstream: tést señora.pdf
Bitstream: tést señora alimentación.pdf Bitstream: tést señora alimentación.pdf
</code></pre></li> </code></pre><ul>
<li>Seems it could be something with the HFS+ filesystem actually, as it's not UTF-8 (<a href="http://www.cio.com/article/2868393/linus-torvalds-apples-hfs-is-probably-the-worst-file-system-ever.html">it's something like UCS-2</a>)</li>
<li><p>Seems it could be something with the HFS+ filesystem actually, as it&rsquo;s not UTF-8 (<a href="http://www.cio.com/article/2868393/linus-torvalds-apples-hfs-is-probably-the-worst-file-system-ever.html">it&rsquo;s something like UCS-2</a>)</p></li> <li>HFS+ stores filenames as a string, and filenames with accents get stored as <a href="https://blog.vrypan.net/2012/11/13/hfsplus-unicode-and-accented-chars/">character+accent</a> whereas Linux's ext4 stores them as an array of bytes</li>
<li>Running the SAFBuilder on Mac OS X works if you're going to import the resulting bundle on Mac OS X, but if your DSpace is running on Linux you need to run the SAFBuilder there where the filesystem's encoding matches</li>
<li><p>HFS+ stores filenames as a string, and filenames with accents get stored as <a href="https://blog.vrypan.net/2012/11/13/hfsplus-unicode-and-accented-chars/">character+accent</a> whereas Linux&rsquo;s ext4 stores them as an array of bytes</p></li>
<li><p>Running the SAFBuilder on Mac OS X works if you&rsquo;re going to import the resulting bundle on Mac OS X, but if your DSpace is running on Linux you need to run the SAFBuilder there where the filesystem&rsquo;s encoding matches</p></li>
</ul> </ul>
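<p>The character+accent storage described above corresponds to Unicode decomposition, which can be reproduced without a Mac. The sketch below uses Python's standard <code>unicodedata</code> module to show that the composed and decomposed spellings of the same filename are different byte sequences, and how to fold accents to ASCII generically. Treat plain NFD as an approximation of what HFS+ does; the helper names are hypothetical.</p>

```python
import unicodedata


def to_nfc(name: str) -> str:
    """Compose characters (e.g. 'o' + combining acute accent becomes 'ó')."""
    return unicodedata.normalize("NFC", name)


def to_nfd(name: str) -> str:
    """Decompose characters (e.g. 'ó' becomes 'o' + combining acute accent)."""
    return unicodedata.normalize("NFD", name)


def strip_accents(name: str) -> str:
    """Drop combining marks after decomposition, leaving the base letters."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    composed = "Medici\u00f3n"      # single code point for ó
    decomposed = "Medicio\u0301n"   # o followed by a combining acute accent
    # Same text on screen, different bytes on disk
    print(composed.encode("utf-8") == decomposed.encode("utf-8"))
    print(to_nfc(decomposed) == composed)
    print(strip_accents("tést señora alimentación.pdf"))
```

<p>Normalizing both the metadata filename and the on-disk name to the same form before comparing them is one way to avoid a mismatch, though running SAFBuilder on the target Linux host as noted above sidesteps the issue entirely.</p>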
<h2 id="20160229">2016-02-29</h2>
<h2 id="2016-02-29">2016-02-29</h2>
<ul> <ul>
<li>Got notified by some CIFOR colleagues that the Google Scholar team had contacted them about CGSpace&rsquo;s incorrect ordering of authors in Google Scholar metadata</li> <li>Got notified by some CIFOR colleagues that the Google Scholar team had contacted them about CGSpace's incorrect ordering of authors in Google Scholar metadata</li>
<li>Turns out there is a patch, and it was merged in DSpace 5.4: <a href="https://jira.duraspace.org/browse/DS-2679">https://jira.duraspace.org/browse/DS-2679</a></li> <li>Turns out there is a patch, and it was merged in DSpace 5.4: <a href="https://jira.duraspace.org/browse/DS-2679">https://jira.duraspace.org/browse/DS-2679</a></li>
<li>I&rsquo;ve merged it into our <code>5_x-prod</code> branch that is currently based on DSpace 5.1</li> <li>I've merged it into our <code>5_x-prod</code> branch that is currently based on DSpace 5.1</li>
<li>We found a bug when a user searches from the homepage, sorts the results, and then tries to click &ldquo;View More&rdquo; in a sidebar facet</li> <li>We found a bug when a user searches from the homepage, sorts the results, and then tries to click &ldquo;View More&rdquo; in a sidebar facet</li>
<li>I am not sure what causes it yet, but I opened an issue for it: <a href="https://github.com/ilri/DSpace/issues/179">https://github.com/ilri/DSpace/issues/179</a></li> <li>I am not sure what causes it yet, but I opened an issue for it: <a href="https://github.com/ilri/DSpace/issues/179">https://github.com/ilri/DSpace/issues/179</a></li>
<li>Have more problems with SAFBuilder on Mac OS X</li> <li>Have more problems with SAFBuilder on Mac OS X</li>
<li>Now it doesn&rsquo;t recognize description hints in the filename column, like: <code>test.pdf__description:Blah</code></li> <li>Now it doesn't recognize description hints in the filename column, like: <code>test.pdf__description:Blah</code></li>
<li>But on Linux it works fine</li> <li>But on Linux it works fine</li>
<li>Trying to test Atmire&rsquo;s series of stats and CUA fixes from January and February, but their branch history is really messy and it&rsquo;s hard to see what&rsquo;s going on</li> <li>Trying to test Atmire's series of stats and CUA fixes from January and February, but their branch history is really messy and it's hard to see what's going on</li>
<li>Rebasing their branch on top of our production branch results in a broken Tomcat, so I&rsquo;m going to tell them to fix their history and make a proper pull request</li> <li>Rebasing their branch on top of our production branch results in a broken Tomcat, so I'm going to tell them to fix their history and make a proper pull request</li>
<li>Looking at the filenames for the CIAT Reports, some have some really ugly characters, like: <code>'</code> or <code>,</code> or <code>=</code> or <code>[</code> or <code>]</code> or <code>(</code> or <code>)</code> or <code>_.pdf</code> or <code>._</code> etc</li> <li>Looking at the filenames for the CIAT Reports, some have some really ugly characters, like: <code>'</code> or <code>,</code> or <code>=</code> or <code>[</code> or <code>]</code> or <code>(</code> or <code>)</code> or <code>_.pdf</code> or <code>._</code> etc</li>
<li>It's tricky to parse those things in some programming languages so I'd rather just get rid of the weird stuff now in OpenRefine:</li>
<li><p>It&rsquo;s tricky to parse those things in some programming languages so I&rsquo;d rather just get rid of the weird stuff now in OpenRefine:</p> </ul>
<pre><code>value.replace(&quot;'&quot;,'').replace('_=_','_').replace(',','').replace('[','').replace(']','').replace('(','').replace(')','').replace('_.pdf','.pdf').replace('._','_') <pre><code>value.replace(&quot;'&quot;,'').replace('_=_','_').replace(',','').replace('[','').replace(']','').replace('(','').replace(')','').replace('_.pdf','.pdf').replace('._','_')
</code></pre></li> </code></pre><ul>
<li>Finally import the 1127 CIAT items into CGSpace: <a href="https://cgspace.cgiar.org/handle/10568/35710">https://cgspace.cgiar.org/handle/10568/35710</a></li>
<li><p>Finally import the 1127 CIAT items into CGSpace: <a href="https://cgspace.cgiar.org/handle/10568/35710">https://cgspace.cgiar.org/handle/10568/35710</a></p></li> <li>Re-deploy CGSpace with the Google Scholar fix, but I'm waiting on the Atmire fixes for now, as the branch history is ugly</li>
<li><p>Re-deploy CGSpace with the Google Scholar fix, but I&rsquo;m waiting on the Atmire fixes for now, as the branch history is ugly</p></li>
</ul> </ul>
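<p>The OpenRefine cleanup chain above can be mirrored in Python if the same renaming has to be applied to the files on disk. This is a sketch that applies the same literal replacements in the same order; the name <code>clean_filename</code> and the sample filename in the comment are made up for illustration.</p>

```python
# Literal (old, new) pairs, in the same order as the OpenRefine expression
REPLACEMENTS = [
    ("'", ""),
    ("_=_", "_"),
    (",", ""),
    ("[", ""),
    ("]", ""),
    ("(", ""),
    (")", ""),
    ("_.pdf", ".pdf"),
    ("._", "_"),
]


def clean_filename(name: str) -> str:
    """Apply each literal replacement in order (hypothetical helper)."""
    for old, new in REPLACEMENTS:
        name = name.replace(old, new)
    return name


# Example with a made-up filename:
# clean_filename("Report_(2015),_[draft]_=_final_.pdf")
```

<p>Order matters here: the trailing <code>_.pdf</code> fix runs after the brackets and commas are removed, so underscores exposed by earlier steps are still caught.</p>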

View File

@ -8,9 +8,8 @@
<meta property="og:title" content="March, 2016" /> <meta property="og:title" content="March, 2016" />
<meta property="og:description" content="2016-03-02 <meta property="og:description" content="2016-03-02
Looking at issues with author authorities on CGSpace Looking at issues with author authorities on CGSpace
For some reason we still have the index-lucene-update cron job active on CGSpace, but I&rsquo;m pretty sure we don&rsquo;t need it as of the latest few versions of Atmire&rsquo;s Listings and Reports module For some reason we still have the index-lucene-update cron job active on CGSpace, but I&#39;m pretty sure we don&#39;t need it as of the latest few versions of Atmire&#39;s Listings and Reports module
Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server
" /> " />
<meta property="og:type" content="article" /> <meta property="og:type" content="article" />
@ -22,12 +21,11 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
<meta name="twitter:title" content="March, 2016"/> <meta name="twitter:title" content="March, 2016"/>
<meta name="twitter:description" content="2016-03-02 <meta name="twitter:description" content="2016-03-02
Looking at issues with author authorities on CGSpace Looking at issues with author authorities on CGSpace
For some reason we still have the index-lucene-update cron job active on CGSpace, but I&rsquo;m pretty sure we don&rsquo;t need it as of the latest few versions of Atmire&rsquo;s Listings and Reports module For some reason we still have the index-lucene-update cron job active on CGSpace, but I&#39;m pretty sure we don&#39;t need it as of the latest few versions of Atmire&#39;s Listings and Reports module
Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -108,112 +106,86 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
</p> </p>
</header> </header>
<h2 id="2016-03-02">2016-03-02</h2> <h2 id="20160302">2016-03-02</h2>
<ul> <ul>
<li>Looking at issues with author authorities on CGSpace</li> <li>Looking at issues with author authorities on CGSpace</li>
<li>For some reason we still have the <code>index-lucene-update</code> cron job active on CGSpace, but I&rsquo;m pretty sure we don&rsquo;t need it as of the latest few versions of Atmire&rsquo;s Listings and Reports module</li> <li>For some reason we still have the <code>index-lucene-update</code> cron job active on CGSpace, but I'm pretty sure we don't need it as of the latest few versions of Atmire's Listings and Reports module</li>
<li>Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server</li> <li>Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server</li>
</ul> </ul>
<h2 id="20160307">2016-03-07</h2>
<h2 id="2016-03-07">2016-03-07</h2>
<ul> <ul>
<li>Troubleshooting the issues with the slew of commits for Atmire modules in <a href="https://github.com/ilri/DSpace/pull/182">#182</a></li> <li>Troubleshooting the issues with the slew of commits for Atmire modules in <a href="https://github.com/ilri/DSpace/pull/182">#182</a></li>
<li>Their changes on <code>5_x-dev</code> branch work, but it is messy as hell with merge commits and old branch base</li> <li>Their changes on <code>5_x-dev</code> branch work, but it is messy as hell with merge commits and old branch base</li>
<li>When I rebase their branch on the latest <code>5_x-prod</code> I get blank white pages</li> <li>When I rebase their branch on the latest <code>5_x-prod</code> I get blank white pages</li>
<li>I identified one commit that causes the issue and let them know</li> <li>I identified one commit that causes the issue and let them know</li>
<li>Restart DSpace Test, as it seems to have crashed after Sisay tried to import some CSV or zip or something:</li>
<li><p>Restart DSpace Test, as it seems to have crashed after Sisay tried to import some CSV or zip or something:</p>
<pre><code>Exception in thread &quot;Lucene Merge Thread #19&quot; org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: No space left on device
</code></pre></li>
</ul> </ul>
<pre><code>Exception in thread &quot;Lucene Merge Thread #19&quot; org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: No space left on device
<h2 id="2016-03-08">2016-03-08</h2> </code></pre><h2 id="20160308">2016-03-08</h2>
<ul> <ul>
<li>Add a few new filters to Atmire&rsquo;s Listings and Reports module (<a href="https://github.com/ilri/DSpace/issues/180">#180</a>)</li> <li>Add a few new filters to Atmire's Listings and Reports module (<a href="https://github.com/ilri/DSpace/issues/180">#180</a>)</li>
<li>We had also wanted to add a few to the Content and Usage module but I have to ask the editors which ones they were</li> <li>We had also wanted to add a few to the Content and Usage module but I have to ask the editors which ones they were</li>
</ul> </ul>
<h2 id="20160310">2016-03-10</h2>
<h2 id="2016-03-10">2016-03-10</h2>
<ul> <ul>
<li>Disable the lucene cron job on CGSpace as it shouldn&rsquo;t be needed anymore</li> <li>Disable the lucene cron job on CGSpace as it shouldn't be needed anymore</li>
<li>Discuss ORCiD and duplicate authors on Yammer</li> <li>Discuss ORCiD and duplicate authors on Yammer</li>
<li>Request new documentation for Atmire CUA and L&amp;R modules, as ours are from 2013</li> <li>Request new documentation for Atmire CUA and L&amp;R modules, as ours are from 2013</li>
<li>Walk Sisay through some data cleaning workflows in OpenRefine</li> <li>Walk Sisay through some data cleaning workflows in OpenRefine</li>
<li>Start cleaning up the configuration for Atmire&rsquo;s CUA module (<a href="https://github.com/ilri/DSpace/issues/185">#184</a>)</li> <li>Start cleaning up the configuration for Atmire's CUA module (<a href="https://github.com/ilri/DSpace/issues/185">#184</a>)</li>
<li>It is very messed up because some labels are incorrect, fields are missing, etc</li> <li>It is very messed up because some labels are incorrect, fields are missing, etc</li>
</ul> </ul>
<p><img src="/cgspace-notes/2016/03/cua-label-mixup.png" alt="Mixed up label in Atmire CUA"></p>
<p><img src="/cgspace-notes/2016/03/cua-label-mixup.png" alt="Mixed up label in Atmire CUA" /></p>
<ul> <ul>
<li>Update documentation for Atmire modules</li> <li>Update documentation for Atmire modules</li>
</ul> </ul>
<h2 id="20160311">2016-03-11</h2>
<h2 id="2016-03-11">2016-03-11</h2>
<ul> <ul>
<li>As I was looking at the CUA config I realized our Discovery config is all messed up and confusing</li> <li>As I was looking at the CUA config I realized our Discovery config is all messed up and confusing</li>
<li>I&rsquo;ve opened an issue to track some of that work (<a href="https://github.com/ilri/DSpace/issues/186">#186</a>)</li> <li>I've opened an issue to track some of that work (<a href="https://github.com/ilri/DSpace/issues/186">#186</a>)</li>
<li>I did some major cleanup work on Discovery and XMLUI stuff related to the <code>dc.type</code> indexes (<a href="https://github.com/ilri/DSpace/pull/187">#187</a>)</li> <li>I did some major cleanup work on Discovery and XMLUI stuff related to the <code>dc.type</code> indexes (<a href="https://github.com/ilri/DSpace/pull/187">#187</a>)</li>
<li>We had been confusing <code>dc.type</code> (a Dublin Core value) with <code>dc.type.output</code> (a value we invented) for a few years and it had permeated all aspects of our data, indexes, item displays, etc.</li> <li>We had been confusing <code>dc.type</code> (a Dublin Core value) with <code>dc.type.output</code> (a value we invented) for a few years and it had permeated all aspects of our data, indexes, item displays, etc.</li>
<li>There is still some more work to be done to remove references to old <code>outputtype</code> and <code>output</code></li> <li>There is still some more work to be done to remove references to old <code>outputtype</code> and <code>output</code></li>
</ul> </ul>
<h2 id="20160314">2016-03-14</h2>
<h2 id="2016-03-14">2016-03-14</h2>
<ul> <ul>
<li>Fix some items that had invalid dates (I noticed them in the log during a re-indexing)</li> <li>Fix some items that had invalid dates (I noticed them in the log during a re-indexing)</li>
<li>Reset <code>search.index.*</code> to the default, as it is only used by Lucene (deprecated by Discovery in DSpace 5.x): <a href="https://github.com/ilri/DSpace/pull/188">#188</a></li> <li>Reset <code>search.index.*</code> to the default, as it is only used by Lucene (deprecated by Discovery in DSpace 5.x): <a href="https://github.com/ilri/DSpace/pull/188">#188</a></li>
<li>Make titles in Discovery and Browse by more consistent (singular, sentence case, etc) (<a href="https://github.com/ilri/DSpace/issues/186">#186</a>)</li> <li>Make titles in Discovery and Browse by more consistent (singular, sentence case, etc) (<a href="https://github.com/ilri/DSpace/issues/186">#186</a>)</li>
<li>Also four or so center-specific subject strings were missing for Discovery</li> <li>Also four or so center-specific subject strings were missing for Discovery</li>
</ul> </ul>
<p><img src="/cgspace-notes/2016/03/missing-xmlui-string.png" alt="Missing XMLUI string"></p>
<p><img src="/cgspace-notes/2016/03/missing-xmlui-string.png" alt="Missing XMLUI string" /></p> <h2 id="20160315">2016-03-15</h2>
<h2 id="2016-03-15">2016-03-15</h2>
<ul> <ul>
<li>Create simple theme for new AVCD community just for a unique Google Tracking ID (<a href="https://github.com/ilri/DSpace/pull/191">#191</a>)</li> <li>Create simple theme for new AVCD community just for a unique Google Tracking ID (<a href="https://github.com/ilri/DSpace/pull/191">#191</a>)</li>
</ul> </ul>
<h2 id="20160316">2016-03-16</h2>
<h2 id="2016-03-16">2016-03-16</h2>
<ul> <ul>
<li>Still having problems deploying Atmire&rsquo;s CUA updates and fixes from January!</li> <li>Still having problems deploying Atmire's CUA updates and fixes from January!</li>
<li>More discussion on the GitHub issue here: <a href="https://github.com/ilri/DSpace/pull/182">https://github.com/ilri/DSpace/pull/182</a></li> <li>More discussion on the GitHub issue here: <a href="https://github.com/ilri/DSpace/pull/182">https://github.com/ilri/DSpace/pull/182</a></li>
<li>Clean up Atmire CUA config (<a href="https://github.com/ilri/DSpace/pull/193">#193</a>)</li> <li>Clean up Atmire CUA config (<a href="https://github.com/ilri/DSpace/pull/193">#193</a>)</li>
<li>Help Sisay with some PostgreSQL queries to clean up the incorrect <code>dc.contributor.corporateauthor</code> field</li> <li>Help Sisay with some PostgreSQL queries to clean up the incorrect <code>dc.contributor.corporateauthor</code> field</li>
<li>I noticed that we have some weird values in <code>dc.language</code>:</li>
<li><p>I noticed that we have some weird values in <code>dc.language</code>:</p>
<pre><code># select * from metadatavalue where metadata_field_id=37;
metadata_value_id | resource_id | metadata_field_id | text_value | text_lang | place | authority | confidence | resource_type_id
-------------------+-------------+-------------------+------------+-----------+-------+-----------+------------+------------------
1942571 | 35342 | 37 | hi | | 1 | | -1 | 2
1942468 | 35345 | 37 | hi | | 1 | | -1 | 2
1942479 | 35337 | 37 | hi | | 1 | | -1 | 2
1942505 | 35336 | 37 | hi | | 1 | | -1 | 2
1942519 | 35338 | 37 | hi | | 1 | | -1 | 2
1942535 | 35340 | 37 | hi | | 1 | | -1 | 2
1942555 | 35341 | 37 | hi | | 1 | | -1 | 2
1942588 | 35343 | 37 | hi | | 1 | | -1 | 2
1942610 | 35346 | 37 | hi | | 1 | | -1 | 2
1942624 | 35347 | 37 | hi | | 1 | | -1 | 2
1942639 | 35339 | 37 | hi | | 1 | | -1 | 2
</code></pre></li>
<li><p>It seems this <code>dc.language</code> field isn&rsquo;t really used, but we should delete these values</p></li>
<li><p>Also, <code>dc.language.iso</code> has some weird values, like &ldquo;En&rdquo; and &ldquo;English&rdquo;</p></li>
</ul> </ul>
<pre><code># select * from metadatavalue where metadata_field_id=37;
<h2 id="2016-03-17">2016-03-17</h2> metadata_value_id | resource_id | metadata_field_id | text_value | text_lang | place | authority | confidence | resource_type_id
-------------------+-------------+-------------------+------------+-----------+-------+-----------+------------+------------------
1942571 | 35342 | 37 | hi | | 1 | | -1 | 2
1942468 | 35345 | 37 | hi | | 1 | | -1 | 2
1942479 | 35337 | 37 | hi | | 1 | | -1 | 2
1942505 | 35336 | 37 | hi | | 1 | | -1 | 2
1942519 | 35338 | 37 | hi | | 1 | | -1 | 2
1942535 | 35340 | 37 | hi | | 1 | | -1 | 2
1942555 | 35341 | 37 | hi | | 1 | | -1 | 2
1942588 | 35343 | 37 | hi | | 1 | | -1 | 2
1942610 | 35346 | 37 | hi | | 1 | | -1 | 2
1942624 | 35347 | 37 | hi | | 1 | | -1 | 2
1942639 | 35339 | 37 | hi | | 1 | | -1 | 2
</code></pre><ul>
<li>It seems this <code>dc.language</code> field isn't really used, but we should delete these values</li>
<li>Also, <code>dc.language.iso</code> has some weird values, like &ldquo;En&rdquo; and &ldquo;English&rdquo;</li>
</ul>
<h2 id="20160317">2016-03-17</h2>
<ul> <ul>
<li>It turns out <code>hi</code> is the ISO 639 language code for Hindi, but these should be in <code>dc.language.iso</code> instead of <code>dc.language</code></li> <li>It turns out <code>hi</code> is the ISO 639 language code for Hindi, but these should be in <code>dc.language.iso</code> instead of <code>dc.language</code></li>
<li>I fixed the eleven items with <code>hi</code> as well as some using the incorrect <code>vn</code> for Vietnamese</li> <li>I fixed the eleven items with <code>hi</code> as well as some using the incorrect <code>vn</code> for Vietnamese</li>
@ -221,108 +193,83 @@ metadata_value_id | resource_id | metadata_field_id | text_value | text_lang | p
<li>Re-sync CGSpace database to DSpace Test for Atmire to do some tests about the problematic CUA patches</li> <li>Re-sync CGSpace database to DSpace Test for Atmire to do some tests about the problematic CUA patches</li>
<li>The patches work fine with a clean database, so the error was caused by some mismatch in CUA versions and the database during my testing</li> <li>The patches work fine with a clean database, so the error was caused by some mismatch in CUA versions and the database during my testing</li>
</ul> </ul>
<h2 id="20160318">2016-03-18</h2>
<h2 id="2016-03-18">2016-03-18</h2>
<ul> <ul>
<li>Merge Atmire fixes into <code>5_x-prod</code></li> <li>Merge Atmire fixes into <code>5_x-prod</code></li>
<li>Discuss thumbnails with Francesca from Bioversity</li> <li>Discuss thumbnails with Francesca from Bioversity</li>
<li>Some of their items end up with thumbnails that have a big white border around them:</li> <li>Some of their items end up with thumbnails that have a big white border around them:</li>
</ul> </ul>
<p><img src="/cgspace-notes/2016/03/bioversity-thumbnail-bad.jpg" alt="Excessive whitespace in thumbnail"></p>
<p><img src="/cgspace-notes/2016/03/bioversity-thumbnail-bad.jpg" alt="Excessive whitespace in thumbnail" /></p>
<ul> <ul>
<li>Turns out we can add <code>-trim</code> to the GraphicsMagick options to trim the whitespace</li> <li>Turns out we can add <code>-trim</code> to the GraphicsMagick options to trim the whitespace</li>
</ul> </ul>
<p><img src="/cgspace-notes/2016/03/bioversity-thumbnail-good.jpg" alt="Trimmed thumbnail"></p>
<p><img src="/cgspace-notes/2016/03/bioversity-thumbnail-good.jpg" alt="Trimmed thumbnail" /></p>
<ul> <ul>
<li><p>Command used:</p> <li>Command used:</li>
<pre><code>$ gm convert -trim -quality 82 -thumbnail x300 -flatten Descriptor\ for\ Butia_EN-2015_2021.pdf\[0\] cover.jpg
</code></pre></li>
<li><p>Also, it looks like adding <code>-sharpen 0x1.0</code> really improves the quality of the image for only a few KB</p></li>
</ul> </ul>
<pre><code>$ gm convert -trim -quality 82 -thumbnail x300 -flatten Descriptor\ for\ Butia_EN-2015_2021.pdf\[0\] cover.jpg
<h2 id="2016-03-21">2016-03-21</h2> </code></pre><ul>
<li>Also, it looks like adding <code>-sharpen 0x1.0</code> really improves the quality of the image for only a few KB</li>
</ul>
<h2 id="20160321">2016-03-21</h2>
<ul> <ul>
<li>Fix 66 site errors in Google&rsquo;s webmaster tools</li> <li>Fix 66 site errors in Google's webmaster tools</li>
<li>I looked at a bunch of them and they were old URLs, weird things linked from non-existent items, etc, so I just marked them all as fixed</li> <li>I looked at a bunch of them and they were old URLs, weird things linked from non-existent items, etc, so I just marked them all as fixed</li>
<li>We also have 1,300 &ldquo;soft 404&rdquo; errors for URLs like: <a href="https://cgspace.cgiar.org/handle/10568/440/browse?type=bioversity">https://cgspace.cgiar.org/handle/10568/440/browse?type=bioversity</a></li> <li>We also have 1,300 &ldquo;soft 404&rdquo; errors for URLs like: <a href="https://cgspace.cgiar.org/handle/10568/440/browse?type=bioversity">https://cgspace.cgiar.org/handle/10568/440/browse?type=bioversity</a></li>
<li>I&rsquo;ve marked them as fixed as well since the ones I tested were working fine</li> <li>I've marked them as fixed as well since the ones I tested were working fine</li>
<li>This raises another question, as many of these pages are linked from Discovery search results and might create a duplicate content problem&hellip;</li> <li>This raises another question, as many of these pages are linked from Discovery search results and might create a duplicate content problem&hellip;</li>
<li>Results pages like this give items that Google already knows from the sitemap: <a href="https://cgspace.cgiar.org/discover?filtertype=author&amp;filter_relational_operator=equals&amp;filter=Orth%2C+A">https://cgspace.cgiar.org/discover?filtertype=author&amp;filter_relational_operator=equals&amp;filter=Orth%2C+A</a>.</li> <li>Results pages like this give items that Google already knows from the sitemap: <a href="https://cgspace.cgiar.org/discover?filtertype=author&amp;filter_relational_operator=equals&amp;filter=Orth%2C+A">https://cgspace.cgiar.org/discover?filtertype=author&amp;filter_relational_operator=equals&amp;filter=Orth%2C+A</a>.</li>
<li>There are some access denied errors on JSPUI links (of course! we forbid them!), but I&rsquo;m not sure why Google is trying to index them&hellip;</li> <li>There are some access denied errors on JSPUI links (of course! we forbid them!), but I'm not sure why Google is trying to index them&hellip;</li>
<li>For example: <li>For example:
<ul> <ul>
<li>This: <a href="https://cgspace.cgiar.org/jspui/bitstream/10568/809/1/main-page.pdf">https://cgspace.cgiar.org/jspui/bitstream/10568/809/1/main-page.pdf</a></li> <li>This: <a href="https://cgspace.cgiar.org/jspui/bitstream/10568/809/1/main-page.pdf">https://cgspace.cgiar.org/jspui/bitstream/10568/809/1/main-page.pdf</a></li>
<li>Linked from: <a href="https://cgspace.cgiar.org/jspui/handle/10568/809">https://cgspace.cgiar.org/jspui/handle/10568/809</a></li> <li>Linked from: <a href="https://cgspace.cgiar.org/jspui/handle/10568/809">https://cgspace.cgiar.org/jspui/handle/10568/809</a></li>
</ul></li> </ul>
</li>
<li>I will mark these errors as resolved because they are returning HTTP 403 on purpose, for a long time!</li> <li>I will mark these errors as resolved because they are returning HTTP 403 on purpose, for a long time!</li>
<li>Google says the first time it saw this particular error was September 29, 2015&hellip; so maybe it accidentally saw it somehow&hellip;</li> <li>Google says the first time it saw this particular error was September 29, 2015&hellip; so maybe it accidentally saw it somehow&hellip;</li>
<li>On a related note, we have 51,000 items indexed from the sitemap, but 500,000 items in the Google index, so we DEFINITELY have a problem with duplicate content</li> <li>On a related note, we have 51,000 items indexed from the sitemap, but 500,000 items in the Google index, so we DEFINITELY have a problem with duplicate content</li>
</ul> </ul>
<p><img src="/cgspace-notes/2016/03/google-index.png" alt="CGSpace pages in Google index"></p>
<p><img src="/cgspace-notes/2016/03/google-index.png" alt="CGSpace pages in Google index" /></p>
<ul> <ul>
<li>Turns out this is a problem with DSpace&rsquo;s <code>robots.txt</code>, and there&rsquo;s a Jira ticket since December, 2015: <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li> <li>Turns out this is a problem with DSpace's <code>robots.txt</code>, and there's a Jira ticket since December, 2015: <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
<li>I am not sure if I want to apply it yet</li> <li>I am not sure if I want to apply it yet</li>
<li>For now I&rsquo;ve just set a bunch of these dynamic pages to not appear in search results by using the URL Parameters tool in Webmaster Tools</li> <li>For now I've just set a bunch of these dynamic pages to not appear in search results by using the URL Parameters tool in Webmaster Tools</li>
</ul> </ul>
<p><img src="/cgspace-notes/2016/03/url-parameters.png" alt="URL parameters cause millions of dynamic pages">
<p><img src="/cgspace-notes/2016/03/url-parameters.png" alt="URL parameters cause millions of dynamic pages" /> <img src="/cgspace-notes/2016/03/url-parameters2.png" alt="Setting pages with the filter_0 param not to show in search results"></p>
<img src="/cgspace-notes/2016/03/url-parameters2.png" alt="Setting pages with the filter_0 param not to show in search results" /></p>
<ul> <ul>
<li>Move AVCD collection to new community and update <code>move_collection.sh</code> script: <a href="https://gist.github.com/alanorth/392c4660e8b022d99dfa">https://gist.github.com/alanorth/392c4660e8b022d99dfa</a></li> <li>Move AVCD collection to new community and update <code>move_collection.sh</code> script: <a href="https://gist.github.com/alanorth/392c4660e8b022d99dfa">https://gist.github.com/alanorth/392c4660e8b022d99dfa</a></li>
<li>It seems Feedburner can do HTTPS now, so we might be able to update our feeds and simplify the nginx configs</li> <li>It seems Feedburner can do HTTPS now, so we might be able to update our feeds and simplify the nginx configs</li>
<li>Re-deploy CGSpace with latest <code>5_x-prod</code> branch</li> <li>Re-deploy CGSpace with latest <code>5_x-prod</code> branch</li>
<li>Run updates on CGSpace and reboot server (new kernel, <code>4.5.0</code>)</li> <li>Run updates on CGSpace and reboot server (new kernel, <code>4.5.0</code>)</li>
<li>Deploy Let&rsquo;s Encrypt certificate for cgspace.cgiar.org, but still need to work it into the ansible playbooks</li> <li>Deploy Let's Encrypt certificate for cgspace.cgiar.org, but still need to work it into the ansible playbooks</li>
</ul> </ul>
<h2 id="20160322">2016-03-22</h2>
<h2 id="2016-03-22">2016-03-22</h2>
<ul> <ul>
<li>Merge robots.txt patch and disallow indexing of browse pages as our sitemap is consumed correctly (<a href="https://github.com/ilri/DSpace/issues/198">#198</a>)</li> <li>Merge robots.txt patch and disallow indexing of browse pages as our sitemap is consumed correctly (<a href="https://github.com/ilri/DSpace/issues/198">#198</a>)</li>
</ul> </ul>
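The effect of a robots.txt change like the one above can be checked offline before deploying. This is a minimal sketch using Python's standard `urllib.robotparser`; the `Disallow` rules and the `is_crawlable` helper are assumptions made up for illustration, not the contents of the actual CGSpace robots.txt or of the DS-2962 patch.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; not the real CGSpace robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /discover
Disallow: /search-filter
Disallow: /browse
"""


def is_crawlable(path, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    """Return True if the given path may be fetched under the rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in (
        "/handle/10568/809",
        "/discover?filtertype=author&filter=Orth%2C+A",
        "/browse?type=subject",
    ):
        print(path, is_crawlable(path))
```

Item pages under `/handle/` stay crawlable in this sketch, while the dynamic Discovery and browse pages that produce duplicate content do not.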
<h2 id="20160323">2016-03-23</h2>
<h2 id="2016-03-23">2016-03-23</h2>
<ul> <ul>
<li><p>Abenet is having problems saving group memberships, and she gets this error: <a href="https://gist.github.com/alanorth/87281c061c2de57b773e">https://gist.github.com/alanorth/87281c061c2de57b773e</a></p> <li>Abenet is having problems saving group memberships, and she gets this error: <a href="https://gist.github.com/alanorth/87281c061c2de57b773e">https://gist.github.com/alanorth/87281c061c2de57b773e</a></li>
<pre><code>Can't find method org.dspace.app.xmlui.aspect.administrative.FlowGroupUtils.processSaveGroup(org.dspace.core.Context,number,string,[Ljava.lang.String;,[Ljava.lang.String;,org.apache.cocoon.environment.wrapper.RequestWrapper). (resource://aspects/Administrative/administrative.js#967)
</code></pre></li>
<li><p>I can reproduce the same error on DSpace Test and on my Mac</p></li>
<li><p>Looks to be an issue with the Atmire modules, I&rsquo;ve submitted a ticket to their tracker.</p></li>
</ul> </ul>
<pre><code>Can't find method org.dspace.app.xmlui.aspect.administrative.FlowGroupUtils.processSaveGroup(org.dspace.core.Context,number,string,[Ljava.lang.String;,[Ljava.lang.String;,org.apache.cocoon.environment.wrapper.RequestWrapper). (resource://aspects/Administrative/administrative.js#967)
<h2 id="2016-03-24">2016-03-24</h2> </code></pre><ul>
<li>I can reproduce the same error on DSpace Test and on my Mac</li>
<li>Looks to be an issue with the Atmire modules, I've submitted a ticket to their tracker.</li>
</ul>
<h2 id="20160324">2016-03-24</h2>
<ul> <ul>
<li>Atmire sent a patch for the group saving issue: <a href="https://github.com/ilri/DSpace/pull/201">https://github.com/ilri/DSpace/pull/201</a></li> <li>Atmire sent a patch for the group saving issue: <a href="https://github.com/ilri/DSpace/pull/201">https://github.com/ilri/DSpace/pull/201</a></li>
<li>I tested it locally and it works, so I merged it to <code>5_x-prod</code> and will deploy on CGSpace this week</li> <li>I tested it locally and it works, so I merged it to <code>5_x-prod</code> and will deploy on CGSpace this week</li>
</ul> </ul>
<h2 id="20160325">2016-03-25</h2>
<h2 id="2016-03-25">2016-03-25</h2>
<ul> <ul>
<li>Having problems with Listings and Reports, seems to be caused by a rogue reference to <code>dc.type.output</code></li> <li>Having problems with Listings and Reports, seems to be caused by a rogue reference to <code>dc.type.output</code></li>
<li>This is the error we get when we proceed to the second page of Listings and Reports: <a href="https://gist.github.com/alanorth/b2d7fb5b82f94898caaf">https://gist.github.com/alanorth/b2d7fb5b82f94898caaf</a></li> <li>This is the error we get when we proceed to the second page of Listings and Reports: <a href="https://gist.github.com/alanorth/b2d7fb5b82f94898caaf">https://gist.github.com/alanorth/b2d7fb5b82f94898caaf</a></li>
<li>Commenting out the line works, but I haven&rsquo;t figured out the proper syntax for referring to <code>dc.type.*</code></li> <li>Commenting out the line works, but I haven't figured out the proper syntax for referring to <code>dc.type.*</code></li>
</ul> </ul>
<h2 id="20160328">2016-03-28</h2>
<h2 id="2016-03-28">2016-03-28</h2>
<ul> <ul>
<li>Look into enabling the embargo during item submission, see: <a href="https://wiki.duraspace.org/display/DSDOC5x/Embargo#Embargo-SubmissionProcess">https://wiki.duraspace.org/display/DSDOC5x/Embargo#Embargo-SubmissionProcess</a></li> <li>Look into enabling the embargo during item submission, see: <a href="https://wiki.duraspace.org/display/DSDOC5x/Embargo#Embargo-SubmissionProcess">https://wiki.duraspace.org/display/DSDOC5x/Embargo#Embargo-SubmissionProcess</a></li>
<li>Seems we only want <code>AccessStep</code> because <code>UploadWithEmbargoStep</code> disables the ability to edit embargos at the item level</li> <li>Seems we only want <code>AccessStep</code> because <code>UploadWithEmbargoStep</code> disables the ability to edit embargos at the item level</li>
@ -334,9 +281,7 @@ metadata_value_id | resource_id | metadata_field_id | text_value | text_lang | p
<li>This pull request simply updates the config for the dc.type.output → dc.type change that was made last week: <a href="https://github.com/ilri/DSpace/pull/204">https://github.com/ilri/DSpace/pull/204</a></li> <li>This pull request simply updates the config for the dc.type.output → dc.type change that was made last week: <a href="https://github.com/ilri/DSpace/pull/204">https://github.com/ilri/DSpace/pull/204</a></li>
<li>Deploy robots.txt fix, embargo for item submissions, and listings and reports fix on CGSpace</li> <li>Deploy robots.txt fix, embargo for item submissions, and listings and reports fix on CGSpace</li>
</ul> </ul>
<h2 id="20160329">2016-03-29</h2>
<h2 id="2016-03-29">2016-03-29</h2>
<ul> <ul>
<li>Skype meeting with Peter and Addis team to discuss metadata changes for Dublin Core, CGcore, and CGSpace-specific fields</li> <li>Skype meeting with Peter and Addis team to discuss metadata changes for Dublin Core, CGcore, and CGSpace-specific fields</li>
<li>We decided to proceed with some deletes first, then identify CGSpace-specific fields to clean/move to <code>cg.*</code>, and then worry about broader changes to DC</li> <li>We decided to proceed with some deletes first, then identify CGSpace-specific fields to clean/move to <code>cg.*</code>, and then worry about broader changes to DC</li>

View File

@ -8,11 +8,10 @@
<meta property="og:title" content="April, 2016" /> <meta property="og:title" content="April, 2016" />
<meta property="og:description" content="2016-04-04 <meta property="og:description" content="2016-04-04
Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit
We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc
After running DSpace for over five years I&rsquo;ve never needed to look in any other log file than dspace.log, let alone one from last year! After running DSpace for over five years I&#39;ve never needed to look in any other log file than dspace.log, let alone one from last year!
This will save us a few gigs of backup space we&rsquo;re paying for on S3 This will save us a few gigs of backup space we&#39;re paying for on S3
Also, I noticed the checker log has some errors we should pay attention to: Also, I noticed the checker log has some errors we should pay attention to:
" /> " />
<meta property="og:type" content="article" /> <meta property="og:type" content="article" />
@ -24,14 +23,13 @@ Also, I noticed the checker log has some errors we should pay attention to:
<meta name="twitter:title" content="April, 2016"/> <meta name="twitter:title" content="April, 2016"/>
<meta name="twitter:description" content="2016-04-04 <meta name="twitter:description" content="2016-04-04
Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit
We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc
After running DSpace for over five years I&rsquo;ve never needed to look in any other log file than dspace.log, let alone one from last year! After running DSpace for over five years I&#39;ve never needed to look in any other log file than dspace.log, let alone one from last year!
This will save us a few gigs of backup space we&rsquo;re paying for on S3 This will save us a few gigs of backup space we&#39;re paying for on S3
Also, I noticed the checker log has some errors we should pay attention to: Also, I noticed the checker log has some errors we should pay attention to:
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -112,16 +110,14 @@ Also, I noticed the checker log has some errors we should pay attention to:
</p> </p>
</header> </header>
<h2 id="2016-04-04">2016-04-04</h2> <h2 id="20160404">2016-04-04</h2>
<ul> <ul>
<li>Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit</li> <li>Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit</li>
<li>We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc</li> <li>We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc</li>
<li>After running DSpace for over five years I&rsquo;ve never needed to look in any other log file than dspace.log, let alone one from last year!</li> <li>After running DSpace for over five years I've never needed to look in any other log file than dspace.log, let alone one from last year!</li>
<li>This will save us a few gigs of backup space we&rsquo;re paying for on S3</li> <li>This will save us a few gigs of backup space we're paying for on S3</li>
<li>Also, I noticed the <code>checker</code> log has some errors we should pay attention to:</li> <li>Also, I noticed the <code>checker</code> log has some errors we should pay attention to:</li>
</ul> </ul>
<pre><code>Run start time: 03/06/2016 04:00:22 <pre><code>Run start time: 03/06/2016 04:00:22
Error retrieving bitstream ID 71274 from asset store. Error retrieving bitstream ID 71274 from asset store.
java.io.FileNotFoundException: /home/cgspace.cgiar.org/assetstore/64/29/06/64290601546459645925328536011917633626 (Too many open files) java.io.FileNotFoundException: /home/cgspace.cgiar.org/assetstore/64/29/06/64290601546459645925328536011917633626 (Too many open files)
@ -144,53 +140,40 @@ java.io.FileNotFoundException: /home/cgspace.cgiar.org/assetstore/64/29/06/64290
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:225) at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:225)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:77) at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:77)
****************************************************** ******************************************************
</code></pre> </code></pre><ul>
<ul>
<li>So this would be the <code>tomcat7</code> Unix user, who seems to have a default limit of 1024 files in its shell</li> <li>So this would be the <code>tomcat7</code> Unix user, who seems to have a default limit of 1024 files in its shell</li>
<li>For what it&rsquo;s worth, we have been setting the actual Tomcat 7 process&rsquo; limit to 16384 for a few years (in <code>/etc/default/tomcat7</code>)</li> <li>For what it's worth, we have been setting the actual Tomcat 7 process&rsquo; limit to 16384 for a few years (in <code>/etc/default/tomcat7</code>)</li>
<li>Looks like cron will read limits from <code>/etc/security/limits.*</code> so we can do something for the tomcat7 user there</li> <li>Looks like cron will read limits from <code>/etc/security/limits.*</code> so we can do something for the tomcat7 user there</li>
<li>Submit pull request for Tomcat 7 limits in Ansible dspace role (<a href="https://github.com/ilri/rmg-ansible-public/pull/30">#30</a>)</li> <li>Submit pull request for Tomcat 7 limits in Ansible dspace role (<a href="https://github.com/ilri/rmg-ansible-public/pull/30">#30</a>)</li>
</ul> </ul>
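To confirm which open-file limit a process actually inherits (for example when launched from cron rather than from the Tomcat init script), the limit can be read from inside the process. This is a minimal, Unix-only sketch using Python's standard `resource` module; the helper names and the 16384 threshold default are illustrative, with the number taken from the Tomcat setting mentioned above.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limits on open files for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def is_below(soft, wanted=16384):
    """True if a finite soft limit is lower than the wanted value."""
    return soft != resource.RLIM_INFINITY and soft < wanted


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print(f"soft={soft} hard={hard} too_low={is_below(soft)}")
```

Running this as the same Unix user and from the same launcher as the failing job shows whether a 1024 default is in effect.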
<h2 id="20160405">2016-04-05</h2>
<h2 id="2016-04-05">2016-04-05</h2>
<ul> <ul>
<li><p>Reduce Amazon S3 storage used for logs from 46 GB to 6 GB by deleting a bunch of logs we don&rsquo;t need!</p> <li>Reduce Amazon S3 storage used for logs from 46 GB to 6 GB by deleting a bunch of logs we don't need!</li>
</ul>
<pre><code># s3cmd ls s3://cgspace.cgiar.org/log/ &gt; /tmp/s3-logs.txt <pre><code># s3cmd ls s3://cgspace.cgiar.org/log/ &gt; /tmp/s3-logs.txt
# grep checker.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del # grep checker.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
# grep cocoon.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del # grep cocoon.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
# grep handle-plugin.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del # grep handle-plugin.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
# grep solr.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del # grep solr.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
</code></pre></li> </code></pre><ul>
<li>Also, adjust the cron jobs for backups so they only back up <code>dspace.log</code> and some stats files (.dat)</li>
<li><p>Also, adjust the cron jobs for backups so they only back up <code>dspace.log</code> and some stats files (.dat)</p></li> <li>Try to do some metadata field migrations using the Atmire batch UI (<code>dc.Species</code> → <code>cg.species</code>) but it took several hours and even missed a few records</li>
<li><p>Try to do some metadata field migrations using the Atmire batch UI (<code>dc.Species</code> → <code>cg.species</code>) but it took several hours and even missed a few records</p></li>
</ul> </ul>
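The four `grep | awk | xargs` pipelines above all select the fourth column of the `s3cmd ls` listing for a given log name. The same selection can be sketched in one pass, which makes it easy to review the list before deleting anything. The `keys_to_delete` helper and the sample listing lines are hypothetical; the sketch only assumes that the object URL is the fourth whitespace-separated field, as the `awk '{print $4}'` calls imply.

```python
# Log names taken from the commands above.
UNWANTED = ("checker.log", "cocoon.log", "handle-plugin.log", "solr.log")


def keys_to_delete(listing_lines, unwanted=UNWANTED):
    """Pick the object URLs (4th field) whose name matches an unwanted log."""
    keys = []
    for line in listing_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        key = fields[3]
        if any(name in key for name in unwanted):
            keys.append(key)
    return keys


if __name__ == "__main__":
    # Made-up sample lines in the shape of an `s3cmd ls` listing.
    sample = [
        "2016-03-06 04:00   1024   s3://cgspace.cgiar.org/log/checker.log.2016-03-06",
        "2016-03-06 04:00   2048   s3://cgspace.cgiar.org/log/dspace.log.2016-03-06",
        "2016-03-06 04:00   4096   s3://cgspace.cgiar.org/log/solr.log.2016-03-06",
    ]
    for key in keys_to_delete(sample):
        print(key)
```

The printed list could then be reviewed and passed to `s3cmd del`, as in the original commands.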
<h2 id="20160406">2016-04-06</h2>
<h2 id="2016-04-06">2016-04-06</h2>
<ul> <ul>
<li><p>A better way to move metadata on this scale is via SQL, for example <code>dc.type.output</code> → <code>dc.type</code> (their IDs in the metadatafieldregistry are 66 and 109, respectively):</p> <li>A better way to move metadata on this scale is via SQL, for example <code>dc.type.output</code> → <code>dc.type</code> (their IDs in the metadatafieldregistry are 66 and 109, respectively):</li>
</ul>
<pre><code>dspacetest=# update metadatavalue set metadata_field_id=109 where metadata_field_id=66; <pre><code>dspacetest=# update metadatavalue set metadata_field_id=109 where metadata_field_id=66;
UPDATE 40852 UPDATE 40852
</code></pre></li> </code></pre><ul>
<li>After that an <code>index-discovery -bf</code> is required</li>
<li><p>After that an <code>index-discovery -bf</code> is required</p></li> <li>Start working on metadata migrations, add 25 or so new metadata fields to CGSpace</li>
<li><p>Start working on metadata migrations, add 25 or so new metadata fields to CGSpace</p></li>
</ul> </ul>
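The field move above is a single `UPDATE` that rewrites `metadata_field_id`. As a rough sketch of the same idea wrapped in a transaction, the following uses an in-memory SQLite database as a stand-in for PostgreSQL; the simplified table layout, the sample rows, and the `migrate_field` and `demo` helpers are assumptions for illustration. Only the IDs 66 and 109 come from the notes.

```python
import sqlite3


def migrate_field(conn, old_id, new_id):
    """Re-point all values from one metadata field ID to another."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE metadatavalue SET metadata_field_id = ? "
            "WHERE metadata_field_id = ?",
            (new_id, old_id),
        )
    return cur.rowcount


def demo():
    """Build a tiny stand-in table and move field 66 to field 109."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metadatavalue ("
        "metadata_value_id INTEGER PRIMARY KEY, "
        "metadata_field_id INTEGER, text_value TEXT)"
    )
    conn.executemany(
        "INSERT INTO metadatavalue (metadata_field_id, text_value) VALUES (?, ?)",
        [(66, "Report"), (66, "Brief"), (109, "Book")],
    )
    moved = migrate_field(conn, 66, 109)
    remaining = conn.execute(
        "SELECT count(*) FROM metadatavalue WHERE metadata_field_id = 66"
    ).fetchone()[0]
    conn.close()
    return moved, remaining


if __name__ == "__main__":
    print(demo())
```

Comparing the number of rows moved with a count taken beforehand is a cheap sanity check before running the Discovery re-index.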
<h2 id="20160407">2016-04-07</h2>
<h2 id="2016-04-07">2016-04-07</h2>
<ul> <ul>
<li>Write shell script to do the migration of fields: <a href="https://gist.github.com/alanorth/72a70aca856d76f24c127a6e67b3342b">https://gist.github.com/alanorth/72a70aca856d76f24c127a6e67b3342b</a></li> <li>Write shell script to do the migration of fields: <a href="https://gist.github.com/alanorth/72a70aca856d76f24c127a6e67b3342b">https://gist.github.com/alanorth/72a70aca856d76f24c127a6e67b3342b</a></li>
<li>Testing with a few fields it seems to work well:</li>
<li><p>Testing with a few fields it seems to work well:</p> </ul>
<pre><code>$ ./migrate-fields.sh <pre><code>$ ./migrate-fields.sh
UPDATE metadatavalue SET metadata_field_id=109 WHERE metadata_field_id=66 UPDATE metadatavalue SET metadata_field_id=109 WHERE metadata_field_id=66
UPDATE 40883 UPDATE 40883
@ -198,106 +181,75 @@ UPDATE metadatavalue SET metadata_field_id=202 WHERE metadata_field_id=72
UPDATE 21420 UPDATE 21420
UPDATE metadatavalue SET metadata_field_id=203 WHERE metadata_field_id=76 UPDATE metadatavalue SET metadata_field_id=203 WHERE metadata_field_id=76
UPDATE 51258 UPDATE 51258
</code></pre></li> </code></pre><h2 id="20160408">2016-04-08</h2>
</ul>
<h2 id="2016-04-08">2016-04-08</h2>
<ul> <ul>
<li>Discuss metadata renaming with Abenet, we decided it&rsquo;s better to start with the center-specific subjects like ILRI, CIFOR, CCAFS, IWMI, and CPWF</li> <li>Discuss metadata renaming with Abenet, we decided it's better to start with the center-specific subjects like ILRI, CIFOR, CCAFS, IWMI, and CPWF</li>
<li>I&rsquo;ve e-mailed CCAFS and CPWF people to ask them how much time it will take for them to update their systems to cope with this change</li> <li>I've e-mailed CCAFS and CPWF people to ask them how much time it will take for them to update their systems to cope with this change</li>
</ul> </ul>
<h2 id="20160410">2016-04-10</h2>
<h2 id="2016-04-10">2016-04-10</h2>
<ul> <ul>
<li>Looking at the DOI issue <a href="https://www.yammer.com/dspacedevelopers/#/Threads/show?threadId=678507860">reported by Leroy from CIAT a few weeks ago</a></li> <li>Looking at the DOI issue <a href="https://www.yammer.com/dspacedevelopers/#/Threads/show?threadId=678507860">reported by Leroy from CIAT a few weeks ago</a></li>
<li>It seems the <code>dx.doi.org</code> URLs are much more common in our repository!</li>
<li><p>It seems the <code>dx.doi.org</code> URLs are much more common in our repository!</p> </ul>
<pre><code>dspacetest=# select count(*) from metadatavalue where metadata_field_id=74 and text_value like 'http://dx.doi.org%'; <pre><code>dspacetest=# select count(*) from metadatavalue where metadata_field_id=74 and text_value like 'http://dx.doi.org%';
count count
------- -------
5638 5638
(1 row) (1 row)
dspacetest=# select count(*) from metadatavalue where metadata_field_id=74 and text_value like 'http://doi.org%'; dspacetest=# select count(*) from metadatavalue where metadata_field_id=74 and text_value like 'http://doi.org%';
count count
------- -------
3 3
</code></pre></li> </code></pre><ul>
<li>I will manually edit the <code>dc.identifier.doi</code> in <a href="https://cgspace.cgiar.org/handle/10568/72509?show=full">10568/72509</a> and tweet the link, then check back in a week to see if the donut gets updated</li>
<li><p>I will manually edit the <code>dc.identifier.doi</code> in <a href="https://cgspace.cgiar.org/handle/10568/72509?show=full"><sup>10568</sup>&frasl;<sub>72509</sub></a> and tweet the link, then check back in a week to see if the donut gets updated</p></li>
</ul> </ul>
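The two `LIKE` queries above compare how many DOI values use each resolver host. If the values are exported, the same tally can be done in one pass. This is a small sketch with a hypothetical `count_doi_hosts` helper; the DOIs in the sample are made up.

```python
from collections import Counter
from urllib.parse import urlparse


def count_doi_hosts(values):
    """Count DOI URLs per resolver host, ignoring blank values."""
    return Counter(
        urlparse(value.strip()).netloc.lower()
        for value in values
        if value.strip()
    )


if __name__ == "__main__":
    # Made-up sample values for illustration.
    sample = [
        "http://dx.doi.org/10.1000/example-1",
        "http://dx.doi.org/10.1000/example-2",
        "http://doi.org/10.1000/example-3",
        "  ",
    ]
    for host, count in count_doi_hosts(sample).most_common():
        print(host, count)
```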
<h2 id="20160411">2016-04-11</h2>
<h2 id="2016-04-11">2016-04-11</h2>
<ul> <ul>
<li>The donut is already updated and shows the correct number now</li> <li>The donut is already updated and shows the correct number now</li>
<li>CCAFS people say it will only take them an hour to update their code for the metadata renames, so I proposed we&rsquo;d do it tentatively on Monday the 18th.</li> <li>CCAFS people say it will only take them an hour to update their code for the metadata renames, so I proposed we'd do it tentatively on Monday the 18th.</li>
</ul> </ul>
<h2 id="20160412">2016-04-12</h2>
<h2 id="2016-04-12">2016-04-12</h2>
<ul> <ul>
<li><p>Looking at quality of WLE data (<code>cg.subject.iwmi</code>) in SQL:</p> <li>Looking at quality of WLE data (<code>cg.subject.iwmi</code>) in SQL:</li>
</ul>
<pre><code>dspacetest=# select text_value, count(*) from metadatavalue where metadata_field_id=217 group by text_value order by count(*) desc; <pre><code>dspacetest=# select text_value, count(*) from metadatavalue where metadata_field_id=217 group by text_value order by count(*) desc;
</code></pre></li> </code></pre><ul>
<li>Listings and Reports is still not returning reliable data for <code>dc.type</code></li>
<li><p>Listings and Reports is still not returning reliable data for <code>dc.type</code></p></li> <li>I think we need to ask Atmire, as their documentation isn't too clear on the format of the filter configs</li>
<li>Alternatively, I want to see if I move all the data from <code>dc.type.output</code> to <code>dc.type</code> and then re-index, if it behaves better</li>
<li><p>I think we need to ask Atmire, as their documentation isn&rsquo;t too clear on the format of the filter configs</p></li> <li>Looking at our <code>input-forms.xml</code> I see we have two sets of ILRI subjects, but one has a few extra subjects</li>
<li>Remove one set of ILRI subjects and remove duplicate <code>VALUE CHAINS</code> from existing list (<a href="https://github.com/ilri/DSpace/pull/216">#216</a>)</li>
<li><p>Alternatively, I want to see if I move all the data from <code>dc.type.output</code> to <code>dc.type</code> and then re-index, if it behaves better</p></li> <li>I decided to keep the set of subjects that had <code>FMD</code> and <code>RANGELANDS</code> added, as they appear to have been added on request, and it might be the newer list</li>
<li>I found 226 blank metadatavalues:</li>
<li><p>Looking at our <code>input-forms.xml</code> I see we have two sets of ILRI subjects, but one has a few extra subjects</p></li> </ul>
<li><p>Remove one set of ILRI subjects and remove duplicate <code>VALUE CHAINS</code> from existing list (<a href="https://github.com/ilri/DSpace/pull/216">#216</a>)</p></li>
<li><p>I decided to keep the set of subjects that had <code>FMD</code> and <code>RANGELANDS</code> added, as they appear to have been added on request, and it might be the newer list</p></li>
<li><p>I found 226 blank metadatavalues:</p>
<pre><code>dspacetest# select * from metadatavalue where resource_type_id=2 and text_value=''; <pre><code>dspacetest# select * from metadatavalue where resource_type_id=2 and text_value='';
</code></pre></li> </code></pre><ul>
<li>I think we should delete them and do a full re-index:</li>
<li><p>I think we should delete them and do a full re-index:</p> </ul>
<pre><code>dspacetest=# delete from metadatavalue where resource_type_id=2 and text_value=''; <pre><code>dspacetest=# delete from metadatavalue where resource_type_id=2 and text_value='';
DELETE 226 DELETE 226
</code></pre></li> </code></pre><ul>
<li>I deleted them on CGSpace but I'll wait to do the re-index as we're going to be doing one in a few days for the metadata changes anyways</li>
<li><p>I deleted them on CGSpace but I&rsquo;ll wait to do the re-index as we&rsquo;re going to be doing one in a few days for the metadata changes anyways</p></li> <li>In other news, moving the <code>dc.type.output</code> to <code>dc.type</code> and re-indexing seems to have fixed the Listings and Reports issue from above</li>
<li>Unfortunately this isn't a very good solution, because Listings and Reports config should allow us to filter on <code>dc.type.*</code> but the documentation isn't very clear and I couldn't reach Atmire today</li>
<li><p>In other news, moving the <code>dc.type.output</code> to <code>dc.type</code> and re-indexing seems to have fixed the Listings and Reports issue from above</p></li> <li>We want to do the <code>dc.type.output</code> move on CGSpace anyways, but we should wait as it might affect other external people!</li>
<li><p>Unfortunately this isn&rsquo;t a very good solution, because Listings and Reports config should allow us to filter on <code>dc.type.*</code> but the documentation isn&rsquo;t very clear and I couldn&rsquo;t reach Atmire today</p></li>
<li><p>We want to do the <code>dc.type.output</code> move on CGSpace anyways, but we should wait as it might affect other external people!</p></li>
</ul> </ul>
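For the blank-value cleanup above, a delete on production is safer if it only commits when it removes the number of rows seen in the earlier `SELECT` (226 in the notes). This is a minimal sketch of that guard, again using in-memory SQLite as a stand-in for PostgreSQL; the reduced schema and the `make_demo_db` and `delete_blank_values` helpers are hypothetical.

```python
import sqlite3


def make_demo_db():
    """Create a tiny stand-in table with two blank item-level values."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metadatavalue ("
        "metadata_value_id INTEGER PRIMARY KEY, "
        "resource_type_id INTEGER, text_value TEXT)"
    )
    conn.executemany(
        "INSERT INTO metadatavalue (resource_type_id, text_value) VALUES (?, ?)",
        [(2, ""), (2, ""), (2, "LIVESTOCK")],
    )
    conn.commit()
    return conn


def delete_blank_values(conn, expected):
    """Delete blank item values, committing only if the count matches."""
    cur = conn.execute(
        "DELETE FROM metadatavalue WHERE resource_type_id = 2 AND text_value = ''"
    )
    if cur.rowcount != expected:
        conn.rollback()  # undo the delete; the table is left unchanged
        raise RuntimeError(
            f"expected to delete {expected} rows, matched {cur.rowcount}"
        )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = make_demo_db()
    print(delete_blank_values(conn, expected=2))
```

If the count differs from what the earlier query reported, nothing is deleted and the mismatch can be investigated first.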
<h2 id="20160414">2016-04-14</h2>
<h2 id="2016-04-14">2016-04-14</h2>
<ul> <ul>
<li>Communicate with Macaroni Bros again about <code>dc.type</code></li> <li>Communicate with Macaroni Bros again about <code>dc.type</code></li>
<li>Help Sisay with some rsync and Linux stuff</li> <li>Help Sisay with some rsync and Linux stuff</li>
<li>Notify CIAT people of metadata changes (I had forgotten them last week)</li> <li>Notify CIAT people of metadata changes (I had forgotten them last week)</li>
</ul> </ul>
<h2 id="20160415">2016-04-15</h2>
<h2 id="2016-04-15">2016-04-15</h2>
<ul> <ul>
<li>DSpace Test had crashed, so I ran all system updates, rebooted, and re-deployed DSpace code</li> <li>DSpace Test had crashed, so I ran all system updates, rebooted, and re-deployed DSpace code</li>
</ul> </ul>
<h2 id="20160418">2016-04-18</h2>
<h2 id="2016-04-18">2016-04-18</h2>
<ul> <ul>
<li>Talk to CIAT people about their portal again</li> <li>Talk to CIAT people about their portal again</li>
<li>Start looking more at the fields we want to delete</li> <li>Start looking more at the fields we want to delete</li>
<li>The following metadata fields have 0 items using them, so we can just remove them from the registry and any references in XMLUI, input forms, etc: <li>The following metadata fields have 0 items using them, so we can just remove them from the registry and any references in XMLUI, input forms, etc:
<ul> <ul>
<li>dc.description.abstractother</li> <li>dc.description.abstractother</li>
<li>dc.whatwasknown</li> <li>dc.whatwasknown</li>
@ -305,10 +257,10 @@ DELETE 226
<li>dc.description.nationalpartners</li> <li>dc.description.nationalpartners</li>
<li>dc.peerreviewprocess</li> <li>dc.peerreviewprocess</li>
<li>cg.species.animal</li> <li>cg.species.animal</li>
</ul></li> </ul>
</li>
<li>Deleted!</li> <li>Deleted!</li>
<li>The following fields have some items using them and I have to decide what to do with them (delete or move): <li>The following fields have some items using them and I have to decide what to do with them (delete or move):
<ul> <ul>
<li>dc.icsubject.icrafsubject: 6 items, mostly in CPWF collections</li> <li>dc.icsubject.icrafsubject: 6 items, mostly in CPWF collections</li>
<li>dc.type.journal: 11 items, mostly in ILRI collections</li> <li>dc.type.journal: 11 items, mostly in ILRI collections</li>
@ -317,10 +269,10 @@ DELETE 226
<li>dc.Species.animal: 6 items, in ILRI and AnGR</li> <li>dc.Species.animal: 6 items, in ILRI and AnGR</li>
<li>cg.livestock.agegroup: 9 items, in ILRI collections</li> <li>cg.livestock.agegroup: 9 items, in ILRI collections</li>
<li>cg.livestock.function: 20 items, mostly in EADD</li> <li>cg.livestock.function: 20 items, mostly in EADD</li>
</ul></li> </ul>
</li>
<li><p>Test metadata migration on local instance again:</p> <li>Test metadata migration on local instance again:</li>
</ul>
<pre><code>$ ./migrate-fields.sh <pre><code>$ ./migrate-fields.sh
UPDATE metadatavalue SET metadata_field_id=109 WHERE metadata_field_id=66 UPDATE metadatavalue SET metadata_field_id=109 WHERE metadata_field_id=66
UPDATE 40885 UPDATE 40885
@ -335,98 +287,80 @@ UPDATE 3872
UPDATE metadatavalue SET metadata_field_id=217 WHERE metadata_field_id=108 UPDATE metadatavalue SET metadata_field_id=217 WHERE metadata_field_id=108
UPDATE 46075 UPDATE 46075
$ JAVA_OPTS=&quot;-Xms512m -Xmx512m -Dfile.encoding=UTF-8&quot; ~/dspace/bin/dspace index-discovery -bf $ JAVA_OPTS=&quot;-Xms512m -Xmx512m -Dfile.encoding=UTF-8&quot; ~/dspace/bin/dspace index-discovery -bf
</code></pre></li> </code></pre><ul>
<li>CGSpace was down but I'm not sure why, this was in <code>catalina.out</code>:</li>
<li><p>CGSpace was down but I&rsquo;m not sure why, this was in <code>catalina.out</code>:</p> </ul>
<pre><code>Apr 18, 2016 7:32:26 PM com.sun.jersey.spi.container.ContainerResponse logException <pre><code>Apr 18, 2016 7:32:26 PM com.sun.jersey.spi.container.ContainerResponse logException
SEVERE: Mapped exception to response: 500 (Internal Server Error) SEVERE: Mapped exception to response: 500 (Internal Server Error)
javax.ws.rs.WebApplicationException javax.ws.rs.WebApplicationException
at org.dspace.rest.Resource.processFinally(Resource.java:163) at org.dspace.rest.Resource.processFinally(Resource.java:163)
at org.dspace.rest.HandleResource.getObject(HandleResource.java:81) at org.dspace.rest.HandleResource.getObject(HandleResource.java:81)
at sun.reflect.GeneratedMethodAccessor198.invoke(Unknown Source) at sun.reflect.GeneratedMethodAccessor198.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606) at java.lang.reflect.Method.invoke(Method.java:606)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185) at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302) at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1511) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1511)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1442) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1442)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1391) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1391)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1381) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1381)
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416) at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
... ...
</code></pre></li> </code></pre><ul>
<li>Everything else in the system looked normal (50GB disk space available, nothing weird in dmesg, etc)</li>
<li><p>Everything else in the system looked normal (50GB disk space available, nothing weird in dmesg, etc)</p></li> <li>After restarting Tomcat a few more of these errors were logged but the application was up</li>
<li><p>After restarting Tomcat a few more of these errors were logged but the application was up</p></li>
</ul> </ul>
<h2 id="20160419">2016-04-19</h2>
<h2 id="2016-04-19">2016-04-19</h2>
<ul> <ul>
<li><p>Get handles for items that are using a given metadata field, ie <code>dc.Species.animal</code> (105):</p> <li>Get handles for items that are using a given metadata field, ie <code>dc.Species.animal</code> (105):</li>
</ul>
<pre><code># select handle from item, handle where handle.resource_id = item.item_id AND item.item_id in (select resource_id from metadatavalue where resource_type_id=2 and metadata_field_id=105); <pre><code># select handle from item, handle where handle.resource_id = item.item_id AND item.item_id in (select resource_id from metadatavalue where resource_type_id=2 and metadata_field_id=105);
handle handle
------------- -------------
10568/10298 10568/10298
10568/16413 10568/16413
10568/16774 10568/16774
10568/34487 10568/34487
</code></pre></li> </code></pre><ul>
<li>Delete metadata values for <code>dc.GRP</code> and <code>dc.icsubject.icrafsubject</code>:</li>
<li><p>Delete metadata values for <code>dc.GRP</code> and <code>dc.icsubject.icrafsubject</code>:</p> </ul>
<pre><code># delete from metadatavalue where resource_type_id=2 and metadata_field_id=96; <pre><code># delete from metadatavalue where resource_type_id=2 and metadata_field_id=96;
# delete from metadatavalue where resource_type_id=2 and metadata_field_id=83; # delete from metadatavalue where resource_type_id=2 and metadata_field_id=83;
</code></pre></li> </code></pre><ul>
<li>They are old ICRAF fields and we haven't used them since 2011 or so</li>
<li><p>They are old ICRAF fields and we haven&rsquo;t used them since 2011 or so</p></li> <li>Also delete them from the metadata registry</li>
<li>CGSpace went down again, <code>dspace.log</code> had this:</li>
<li><p>Also delete them from the metadata registry</p></li> </ul>
<li><p>CGSpace went down again, <code>dspace.log</code> had this:</p>
<pre><code>2016-04-19 15:02:17,025 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error - <pre><code>2016-04-19 15:02:17,025 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
</code></pre></li> </code></pre><ul>
<li>I restarted Tomcat and PostgreSQL and now it's back up</li>
<li><p>I restarted Tomcat and PostgreSQL and now it&rsquo;s back up</p></li> <li>I bet this is the same crash as yesterday, but I only saw the errors in <code>catalina.out</code></li>
<li>Looks to be related to this, from <code>dspace.log</code>:</li>
<li><p>I bet this is the same crash as yesterday, but I only saw the errors in <code>catalina.out</code></p></li> </ul>
<li><p>Looks to be related to this, from <code>dspace.log</code>:</p>
<pre><code>2016-04-19 15:16:34,670 ERROR org.dspace.rest.Resource @ Something get wrong. Aborting context in finally statement. <pre><code>2016-04-19 15:16:34,670 ERROR org.dspace.rest.Resource @ Something get wrong. Aborting context in finally statement.
</code></pre></li> </code></pre><ul>
<li>We have 18,000 of these errors right now&hellip;</li>
<li><p>We have 18,000 of these errors right now&hellip;</p></li> <li>Delete a few more old metadata values: <code>dc.Species.animal</code>, <code>dc.type.journal</code>, and <code>dc.publicationcategory</code>:</li>
</ul>
<li><p>Delete a few more old metadata values: <code>dc.Species.animal</code>, <code>dc.type.journal</code>, and <code>dc.publicationcategory</code>:</p>
<pre><code># delete from metadatavalue where resource_type_id=2 and metadata_field_id=105; <pre><code># delete from metadatavalue where resource_type_id=2 and metadata_field_id=105;
# delete from metadatavalue where resource_type_id=2 and metadata_field_id=85; # delete from metadatavalue where resource_type_id=2 and metadata_field_id=85;
# delete from metadatavalue where resource_type_id=2 and metadata_field_id=95; # delete from metadatavalue where resource_type_id=2 and metadata_field_id=95;
</code></pre></li> </code></pre><ul>
<li>And then remove them from the metadata registry</li>
<li><p>And then remove them from the metadata registry</p></li>
</ul> </ul>
<h2 id="20160420">2016-04-20</h2>
<h2 id="2016-04-20">2016-04-20</h2>
<ul> <ul>
<li>Re-deploy DSpace Test with the new subject and type fields, run all system updates, and reboot the server</li> <li>Re-deploy DSpace Test with the new subject and type fields, run all system updates, and reboot the server</li>
<li>Migrate fields and re-deploy CGSpace with the new subject and type fields, run all system updates, and reboot the server</li> <li>Migrate fields and re-deploy CGSpace with the new subject and type fields, run all system updates, and reboot the server</li>
<li>Field migration went well:</li>
<li><p>Field migration went well:</p> </ul>
<pre><code>$ ./migrate-fields.sh <pre><code>$ ./migrate-fields.sh
UPDATE metadatavalue SET metadata_field_id=109 WHERE metadata_field_id=66 UPDATE metadatavalue SET metadata_field_id=109 WHERE metadata_field_id=66
UPDATE 40909 UPDATE 40909
@ -440,62 +374,47 @@ UPDATE metadatavalue SET metadata_field_id=215 WHERE metadata_field_id=106
UPDATE 3872 UPDATE 3872
UPDATE metadatavalue SET metadata_field_id=217 WHERE metadata_field_id=108 UPDATE metadatavalue SET metadata_field_id=217 WHERE metadata_field_id=108
UPDATE 46075 UPDATE 46075
</code></pre></li> </code></pre><ul>
<li>Also, I migrated CGSpace to using the PGDG PostgreSQL repo as the infrastructure playbooks had been using it for a while and it seemed to be working well</li>
<li><p>Also, I migrated CGSpace to using the PGDG PostgreSQL repo as the infrastructure playbooks had been using it for a while and it seemed to be working well</p></li> <li>Basically, this gives us the ability to use the latest upstream stable 9.3.x release (currently 9.3.12)</li>
<li>Looking into the REST API errors again, it looks like these started appearing a few days ago in the tens of thousands:</li>
<li><p>Basically, this gives us the ability to use the latest upstream stable 9.3.x release (currently 9.3.12)</p></li> </ul>
<li><p>Looking into the REST API errors again, it looks like these started appearing a few days ago in the tens of thousands:</p>
<pre><code>$ grep -c &quot;Aborting context in finally statement&quot; dspace.log.2016-04-20 <pre><code>$ grep -c &quot;Aborting context in finally statement&quot; dspace.log.2016-04-20
21252 21252
</code></pre></li> </code></pre><ul>
<li>I found a recent discussion on the DSpace mailing list and I've asked for advice there</li>
<li><p>I found a recent discussion on the DSpace mailing list and I&rsquo;ve asked for advice there</p></li> <li>Looks like this issue was noted and fixed in DSpace 5.5 (we're on 5.1): <a href="https://jira.duraspace.org/browse/DS-2936">https://jira.duraspace.org/browse/DS-2936</a></li>
<li>I've sent a message to Atmire asking about compatibility with DSpace 5.5</li>
<li><p>Looks like this issue was noted and fixed in DSpace 5.5 (we&rsquo;re on 5.1): <a href="https://jira.duraspace.org/browse/DS-2936">https://jira.duraspace.org/browse/DS-2936</a></p></li>
<li><p>I&rsquo;ve sent a message to Atmire asking about compatibility with DSpace 5.5</p></li>
</ul> </ul>
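<p>The <code>UPDATE</code> statements printed by <code>migrate-fields.sh</code> above all follow one pattern, so they could be generated from a table of old and new field IDs. A minimal sketch, assuming that pattern: the function name and the mapping are hypothetical, and this is not the actual contents of the script.</p>

```python
def build_migration_sql(field_map):
    """Build one UPDATE statement per (old_id, new_id) pair."""
    statements = []
    for old_id, new_id in field_map.items():
        # Cast to int so that only numeric IDs end up in the SQL text.
        statements.append(
            "UPDATE metadatavalue SET metadata_field_id=%d WHERE metadata_field_id=%d;"
            % (int(new_id), int(old_id))
        )
    return statements


# ID pairs copied from the output shown above; the real script presumably covers more fields.
FIELD_MAP = {66: 109, 106: 215, 108: 217}

for statement in build_migration_sql(FIELD_MAP):
    print(statement)
```

<p>Printing the statements first, rather than executing them directly, makes it possible to review them before piping them to <code>psql</code>.</p>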
<h2 id="20160421">2016-04-21</h2>
<h2 id="2016-04-21">2016-04-21</h2>
<ul> <ul>
<li>Fix a bunch of metadata consistency issues with IITA Journal Articles (Peer review, Formally published, messed up DOIs, etc)</li> <li>Fix a bunch of metadata consistency issues with IITA Journal Articles (Peer review, Formally published, messed up DOIs, etc)</li>
<li>Atmire responded with DSpace 5.5 compatible versions for their modules, so I&rsquo;ll start testing those in a few weeks</li> <li>Atmire responded with DSpace 5.5 compatible versions for their modules, so I'll start testing those in a few weeks</li>
</ul> </ul>
<h2 id="20160422">2016-04-22</h2>
<h2 id="2016-04-22">2016-04-22</h2>
<ul> <ul>
<li>Import 95 records into <a href="https://cgspace.cgiar.org/handle/10568/42219">CTA&rsquo;s Agrodok collection</a></li> <li>Import 95 records into <a href="https://cgspace.cgiar.org/handle/10568/42219">CTA's Agrodok collection</a></li>
</ul> </ul>
<h2 id="20160426">2016-04-26</h2>
<h2 id="2016-04-26">2016-04-26</h2>
<ul> <ul>
<li>Test embargo during item upload</li> <li>Test embargo during item upload</li>
<li>Seems to be working but the help text is misleading as to the date format</li> <li>Seems to be working but the help text is misleading as to the date format</li>
<li>It turns out the <code>robots.txt</code> issue we thought we solved last month isn&rsquo;t solved because you can&rsquo;t use wildcards in URL patterns: <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li> <li>It turns out the <code>robots.txt</code> issue we thought we solved last month isn't solved because you can't use wildcards in URL patterns: <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
<li>Write some nginx rules to add <code>X-Robots-Tag</code> HTTP headers to the dynamic requests from <code>robots.txt</code> instead</li> <li>Write some nginx rules to add <code>X-Robots-Tag</code> HTTP headers to the dynamic requests from <code>robots.txt</code> instead</li>
<li>A few URLs to test with: <li>A few URLs to test with:
<ul> <ul>
<li><a href="https://dspacetest.cgiar.org/handle/10568/440/browse?type=bioversity">https://dspacetest.cgiar.org/handle/10568/440/browse?type=bioversity</a></li> <li><a href="https://dspacetest.cgiar.org/handle/10568/440/browse?type=bioversity">https://dspacetest.cgiar.org/handle/10568/440/browse?type=bioversity</a></li>
<li><a href="https://dspacetest.cgiar.org/handle/10568/913/discover">https://dspacetest.cgiar.org/handle/10568/913/discover</a></li> <li><a href="https://dspacetest.cgiar.org/handle/10568/913/discover">https://dspacetest.cgiar.org/handle/10568/913/discover</a></li>
<li><a href="https://dspacetest.cgiar.org/handle/10568/1/search-filter?filtertype_0=country&amp;filter_0=VIETNAM&amp;filter_relational_operator_0=equals&amp;field=country">https://dspacetest.cgiar.org/handle/10568/1/search-filter?filtertype_0=country&amp;filter_0=VIETNAM&amp;filter_relational_operator_0=equals&amp;field=country</a></li> <li><a href="https://dspacetest.cgiar.org/handle/10568/1/search-filter?filtertype_0=country&amp;filter_0=VIETNAM&amp;filter_relational_operator_0=equals&amp;field=country">https://dspacetest.cgiar.org/handle/10568/1/search-filter?filtertype_0=country&amp;filter_0=VIETNAM&amp;filter_relational_operator_0=equals&amp;field=country</a></li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2016-04-27">2016-04-27</h2> </ul>
<h2 id="20160427">2016-04-27</h2>
<ul> <ul>
<li>I woke up to ten or fifteen &ldquo;up&rdquo; and &ldquo;down&rdquo; emails from the monitoring website</li> <li>I woke up to ten or fifteen &ldquo;up&rdquo; and &ldquo;down&rdquo; emails from the monitoring website</li>
<li>Looks like the last one was &ldquo;down&rdquo; from about four hours ago</li> <li>Looks like the last one was &ldquo;down&rdquo; from about four hours ago</li>
<li>I think there must be something with this REST stuff:</li>
<li><p>I think there must be something with this REST stuff:</p> </ul>
<pre><code># grep -c &quot;Aborting context in finally statement&quot; dspace.log.2016-04-* <pre><code># grep -c &quot;Aborting context in finally statement&quot; dspace.log.2016-04-*
dspace.log.2016-04-01:0 dspace.log.2016-04-01:0
dspace.log.2016-04-02:0 dspace.log.2016-04-02:0
@ -524,40 +443,29 @@ dspace.log.2016-04-24:28775
dspace.log.2016-04-25:28626 dspace.log.2016-04-25:28626
dspace.log.2016-04-26:28655 dspace.log.2016-04-26:28655
dspace.log.2016-04-27:7271 dspace.log.2016-04-27:7271
</code></pre></li> </code></pre><ul>
<li>I restarted tomcat and it is back up</li>
<li><p>I restarted tomcat and it is back up</p></li> <li>Add Spanish XMLUI strings so those users see &ldquo;CGSpace&rdquo; instead of &ldquo;DSpace&rdquo; in the user interface (<a href="https://github.com/ilri/DSpace/pull/222">#222</a>)</li>
<li>Submit patch to upstream DSpace for the misleading help text in the embargo step of the item submission: <a href="https://jira.duraspace.org/browse/DS-3172">https://jira.duraspace.org/browse/DS-3172</a></li>
<li><p>Add Spanish XMLUI strings so those users see &ldquo;CGSpace&rdquo; instead of &ldquo;DSpace&rdquo; in the user interface (<a href="https://github.com/ilri/DSpace/pull/222">#222</a>)</p></li> <li>Update infrastructure playbooks for nginx 1.10.x (stable) release: <a href="https://github.com/ilri/rmg-ansible-public/issues/32">https://github.com/ilri/rmg-ansible-public/issues/32</a></li>
<li>Currently running on DSpace Test, we'll give it a few days before we adjust CGSpace</li>
<li><p>Submit patch to upstream DSpace for the misleading help text in the embargo step of the item submission: <a href="https://jira.duraspace.org/browse/DS-3172">https://jira.duraspace.org/browse/DS-3172</a></p></li> <li>CGSpace down, restarted tomcat and it's back up</li>
<li><p>Update infrastructure playbooks for nginx 1.10.x (stable) release: <a href="https://github.com/ilri/rmg-ansible-public/issues/32">https://github.com/ilri/rmg-ansible-public/issues/32</a></p></li>
<li><p>Currently running on DSpace Test, we&rsquo;ll give it a few days before we adjust CGSpace</p></li>
<li><p>CGSpace down, restarted tomcat and it&rsquo;s back up</p></li>
</ul> </ul>
<h2 id="20160428">2016-04-28</h2>
<h2 id="2016-04-28">2016-04-28</h2>
<ul> <ul>
<li>Problems with stability again. I&rsquo;ve blocked access to <code>/rest</code> for now to see if the number of errors in the log files drops</li> <li>Problems with stability again. I've blocked access to <code>/rest</code> for now to see if the number of errors in the log files drops</li>
<li>Later we could maybe start logging access to <code>/rest</code> and perhaps whitelist some IPs&hellip;</li> <li>Later we could maybe start logging access to <code>/rest</code> and perhaps whitelist some IPs&hellip;</li>
</ul> </ul>
<h2 id="20160430">2016-04-30</h2>
<h2 id="2016-04-30">2016-04-30</h2>
<ul> <ul>
<li><p>Logs for today and yesterday have zero references to this REST error, so I&rsquo;m going to open back up the REST API but log all requests</p> <li>Logs for today and yesterday have zero references to this REST error, so I'm going to open back up the REST API but log all requests</li>
</ul>
<pre><code>location /rest { <pre><code>location /rest {
access_log /var/log/nginx/rest.log; access_log /var/log/nginx/rest.log;
proxy_pass http://127.0.0.1:8443; proxy_pass http://127.0.0.1:8443;
} }
</code></pre></li> </code></pre><ul>
<li>I will check the logs again in a few days to look for patterns, see who is accessing it, etc</li>
<li><p>I will check the logs again in a few days to look for patterns, see who is accessing it, etc</p></li>
</ul> </ul>

@ -8,15 +8,12 @@
<meta property="og:title" content="May, 2016" /> <meta property="og:title" content="May, 2016" />
<meta property="og:description" content="2016-05-01 <meta property="og:description" content="2016-05-01
Since yesterday there have been 10,000 REST errors and the site has been unstable again Since yesterday there have been 10,000 REST errors and the site has been unstable again
I have blocked access to the API now I have blocked access to the API now
There are 3,000 IPs accessing the REST API in a 24-hour period! There are 3,000 IPs accessing the REST API in a 24-hour period!
# awk &#39;{print $1}&#39; /var/log/nginx/rest.log | uniq | wc -l # awk &#39;{print $1}&#39; /var/log/nginx/rest.log | uniq | wc -l
3168 3168
" /> " />
<meta property="og:type" content="article" /> <meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2016-05/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2016-05/" />
@ -27,17 +24,14 @@ There are 3,000 IPs accessing the REST API in a 24-hour period!
<meta name="twitter:title" content="May, 2016"/> <meta name="twitter:title" content="May, 2016"/>
<meta name="twitter:description" content="2016-05-01 <meta name="twitter:description" content="2016-05-01
Since yesterday there have been 10,000 REST errors and the site has been unstable again Since yesterday there have been 10,000 REST errors and the site has been unstable again
I have blocked access to the API now I have blocked access to the API now
There are 3,000 IPs accessing the REST API in a 24-hour period! There are 3,000 IPs accessing the REST API in a 24-hour period!
# awk &#39;{print $1}&#39; /var/log/nginx/rest.log | uniq | wc -l # awk &#39;{print $1}&#39; /var/log/nginx/rest.log | uniq | wc -l
3168 3168
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -118,52 +112,38 @@ There are 3,000 IPs accessing the REST API in a 24-hour period!
</p> </p>
</header> </header>
<h2 id="2016-05-01">2016-05-01</h2> <h2 id="20160501">2016-05-01</h2>
<ul> <ul>
<li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li> <li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li>
<li>I have blocked access to the API now</li> <li>I have blocked access to the API now</li>
<li>There are 3,000 IPs accessing the REST API in a 24-hour period!</li>
<li><p>There are 3,000 IPs accessing the REST API in a 24-hour period!</p> </ul>
<pre><code># awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l <pre><code># awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l
3168 3168
</code></pre></li> </code></pre><ul>
</ul>
<ul>
<li>The two most frequent requesters are in Ethiopia and Colombia: 213.55.99.121 and 181.118.144.29</li> <li>The two most frequent requesters are in Ethiopia and Colombia: 213.55.99.121 and 181.118.144.29</li>
<li>100% of the requests coming from Ethiopia are like this and result in an HTTP 500:</li>
<li><p>100% of the requests coming from Ethiopia are like this and result in an HTTP 500:</p>
<pre><code>GET /rest/handle/10568/NaN?expand=parentCommunityList,metadata HTTP/1.1
</code></pre></li>
<li><p>For now I&rsquo;ll block just the Ethiopian IP</p></li>
<li><p>The owner of that application has said that the <code>NaN</code> (not a number) is an error in his code and he&rsquo;ll fix it</p></li>
</ul> </ul>
<pre><code>GET /rest/handle/10568/NaN?expand=parentCommunityList,metadata HTTP/1.1
<h2 id="2016-05-03">2016-05-03</h2> </code></pre><ul>
<li>For now I'll block just the Ethiopian IP</li>
<li>The owner of that application has said that the <code>NaN</code> (not a number) is an error in his code and he'll fix it</li>
</ul>
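<p>One caveat about the count above: <code>uniq</code> only collapses adjacent duplicate lines, so without a preceding <code>sort</code> the figure may overstate the number of distinct IPs. A minimal sketch of counting distinct clients and ranking the most frequent requesters, assuming the client IP is the first whitespace-separated field of each log line; the sample lines and function name are made up for illustration.</p>

```python
from collections import Counter


def count_requesters(log_lines):
    """Return a Counter mapping each client IP (first field) to its request count."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts


# Hypothetical sample using documentation-only addresses; the real input
# would be the lines of /var/log/nginx/rest.log.
SAMPLE = [
    '192.0.2.1 - - [01/May/2016:00:00:01 +0000] "GET /rest/handle/10568/NaN HTTP/1.1" 500 0',
    '198.51.100.7 - - [01/May/2016:00:00:02 +0000] "GET /rest/items HTTP/1.1" 200 512',
    '192.0.2.1 - - [01/May/2016:00:00:03 +0000] "GET /rest/handle/10568/NaN HTTP/1.1" 500 0',
]

requesters = count_requesters(SAMPLE)
print(len(requesters))             # number of distinct client IPs
print(requesters.most_common(2))   # most frequent requesters first
```

<p>To run it on the real log, pass an open file object instead of <code>SAMPLE</code>, since iterating over a file yields its lines.</p>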
<h2 id="20160503">2016-05-03</h2>
<ul> <ul>
<li>Update nginx to 1.10.x branch on CGSpace</li> <li>Update nginx to 1.10.x branch on CGSpace</li>
<li>Fix a reference to <code>dc.type.output</code> in Discovery that I had missed when we migrated to <code>dc.type</code> last month (<a href="https://github.com/ilri/DSpace/pull/223">#223</a>)</li> <li>Fix a reference to <code>dc.type.output</code> in Discovery that I had missed when we migrated to <code>dc.type</code> last month (<a href="https://github.com/ilri/DSpace/pull/223">#223</a>)</li>
</ul> </ul>
<p><img src="/cgspace-notes/2016/05/discovery-types.png" alt="Item type in Discovery results"></p>
<p><img src="/cgspace-notes/2016/05/discovery-types.png" alt="Item type in Discovery results" /></p> <h2 id="20160506">2016-05-06</h2>
<h2 id="2016-05-06">2016-05-06</h2>
<ul> <ul>
<li>DSpace Test is down, <code>catalina.out</code> has lots of messages about heap space from some time yesterday (!)</li> <li>DSpace Test is down, <code>catalina.out</code> has lots of messages about heap space from some time yesterday (!)</li>
<li>It looks like Sisay was doing some batch imports</li> <li>It looks like Sisay was doing some batch imports</li>
<li>Hmm, also disk space is full</li> <li>Hmm, also disk space is full</li>
<li>I decided to blow away the solr indexes, since they are 50GB and we don&rsquo;t really need all the Atmire stuff there right now</li> <li>I decided to blow away the solr indexes, since they are 50GB and we don't really need all the Atmire stuff there right now</li>
<li>I will re-generate the Discovery indexes after re-deploying</li> <li>I will re-generate the Discovery indexes after re-deploying</li>
<li>Testing <code>renew-letsencrypt.sh</code> script for nginx</li>
<li><p>Testing <code>renew-letsencrypt.sh</code> script for nginx</p> </ul>
<pre><code>#!/usr/bin/env bash <pre><code>#!/usr/bin/env bash
readonly SERVICE_BIN=/usr/sbin/service readonly SERVICE_BIN=/usr/sbin/service
@ -179,37 +159,32 @@ LE_RESULT=$?
$SERVICE_BIN nginx start $SERVICE_BIN nginx start
if [[ &quot;$LE_RESULT&quot; != 0 ]]; then if [[ &quot;$LE_RESULT&quot; != 0 ]]; then
echo 'Automated renewal failed:' echo 'Automated renewal failed:'
cat /var/log/letsencrypt/renew.log cat /var/log/letsencrypt/renew.log
exit 1 exit 1
fi fi
</code></pre></li> </code></pre><ul>
<li>Seems to work well</li>
<li><p>Seems to work well</p></li>
</ul> </ul>
<h2 id="20160510">2016-05-10</h2>
<h2 id="2016-05-10">2016-05-10</h2>
<ul> <ul>
<li>Start looking at more metadata migrations</li> <li>Start looking at more metadata migrations</li>
<li>There are lots of fields in <code>dcterms</code> namespace that look interesting, like: <li>There are lots of fields in <code>dcterms</code> namespace that look interesting, like:
<ul> <ul>
<li>dcterms.type</li> <li>dcterms.type</li>
<li>dcterms.spatial</li> <li>dcterms.spatial</li>
</ul></li> </ul>
</li>
<li>Not sure what <code>dcterms</code> is&hellip;</li> <li>Not sure what <code>dcterms</code> is&hellip;</li>
<li>Looks like these were <a href="https://wiki.duraspace.org/display/DSDOC5x/Metadata+and+Bitstream+Format+Registries#MetadataandBitstreamFormatRegistries-DublinCoreTermsRegistry(DCTERMS)">added in DSpace 4</a> to allow for future work to make DSpace more flexible</li> <li>Looks like these were <a href="https://wiki.duraspace.org/display/DSDOC5x/Metadata+and+Bitstream+Format+Registries#MetadataandBitstreamFormatRegistries-DublinCoreTermsRegistry(DCTERMS)">added in DSpace 4</a> to allow for future work to make DSpace more flexible</li>
<li>CGSpace&rsquo;s <code>dc</code> registry has 96 items, and the default DSpace one has 73.</li> <li>CGSpace's <code>dc</code> registry has 96 items, and the default DSpace one has 73.</li>
</ul> </ul>
<h2 id="20160511">2016-05-11</h2>
<h2 id="2016-05-11">2016-05-11</h2>
<ul> <ul>
<li><p>Identify and propose the next phase of CGSpace fields to migrate:</p> <li>
<p>Identify and propose the next phase of CGSpace fields to migrate:</p>
<ul> <ul>
<li>dc.title.jtitle → cg.title.journal</li> <li>dc.title.jtitle → cg.title.journal</li>
<li>dc.identifier.status → cg.identifier.status</li> <li>dc.identifier.status → cg.identifier.status</li>
@ -219,182 +194,138 @@ fi
<li>dc.fulltextstatus → cg.fulltextstatus</li> <li>dc.fulltextstatus → cg.fulltextstatus</li>
<li>dc.editon → cg.edition</li> <li>dc.editon → cg.edition</li>
<li>dc.isijournal → cg.isijournal</li> <li>dc.isijournal → cg.isijournal</li>
</ul></li>
<li><p>Start a test rebase of the <code>5_x-prod</code> branch on top of the <code>dspace-5.5</code> tag</p></li>
<li><p>There were a handful of conflicts that I didn&rsquo;t understand</p></li>
<li><p>After completing the rebase I tried to build with the module versions Atmire had indicated as being 5.5 ready but I got this error:</p>
<pre><code>[ERROR] Failed to execute goal on project additions: Could not resolve dependencies for project org.dspace.modules:additions:jar:5.5: Could not find artifact com.atmire:atmire-metadata-quality-api:jar:5.5-2.10.1-0 in sonatype-releases (https://oss.sonatype.org/content/repositories/releases/) -&gt; [Help 1]
</code></pre></li>
<li><p>I&rsquo;ve sent them a question about it</p></li>
<li><p>A user mentioned having problems with uploading a 33 MB PDF</p></li>
<li><p>I told her I would increase the limit temporarily tomorrow morning</p></li>
<li><p>Turns out she was able to decrease the size of the PDF so we didn&rsquo;t have to do anything</p></li>
</ul> </ul>
</li>
<h2 id="2016-05-12">2016-05-12</h2> <li>
<p>Start a test rebase of the <code>5_x-prod</code> branch on top of the <code>dspace-5.5</code> tag</p>
</li>
<li>
<p>There were a handful of conflicts that I didn't understand</p>
</li>
<li>
<p>After completing the rebase I tried to build with the module versions Atmire had indicated as being 5.5 ready but I got this error:</p>
</li>
</ul>
<pre><code>[ERROR] Failed to execute goal on project additions: Could not resolve dependencies for project org.dspace.modules:additions:jar:5.5: Could not find artifact com.atmire:atmire-metadata-quality-api:jar:5.5-2.10.1-0 in sonatype-releases (https://oss.sonatype.org/content/repositories/releases/) -&gt; [Help 1]
</code></pre><ul>
<li>I've sent them a question about it</li>
<li>A user mentioned having problems with uploading a 33 MB PDF</li>
<li>I told her I would increase the limit temporarily tomorrow morning</li>
<li>Turns out she was able to decrease the size of the PDF so we didn't have to do anything</li>
</ul>
<h2 id="20160512">2016-05-12</h2>
<ul> <ul>
<li>Looks like the issue that Abenet was having a few days ago with &ldquo;Connection Reset&rdquo; in Firefox might be due to a Firefox 46 issue: <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1268775">https://bugzilla.mozilla.org/show_bug.cgi?id=1268775</a></li> <li>Looks like the issue that Abenet was having a few days ago with &ldquo;Connection Reset&rdquo; in Firefox might be due to a Firefox 46 issue: <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1268775">https://bugzilla.mozilla.org/show_bug.cgi?id=1268775</a></li>
<li>I finally found a copy of the latest CG Core metadata guidelines and it looks like we can add a few more fields to our next migration: <li>I finally found a copy of the latest CG Core metadata guidelines and it looks like we can add a few more fields to our next migration:
<ul> <ul>
<li>dc.rplace.region → cg.coverage.region</li> <li>dc.rplace.region → cg.coverage.region</li>
<li>dc.cplace.country → cg.coverage.country</li> <li>dc.cplace.country → cg.coverage.country</li>
</ul></li> </ul>
</li>
<li>Questions for CG people: <li>Questions for CG people:
<ul> <ul>
<li>Our <code>dc.place</code> and <code>dc.srplace.subregion</code> could both map to <code>cg.coverage.admin-unit</code>?</li> <li>Our <code>dc.place</code> and <code>dc.srplace.subregion</code> could both map to <code>cg.coverage.admin-unit</code>?</li>
<li>Should we use <code>dc.contributor.crp</code> or <code>cg.contributor.crp</code> for the CRP (ours is <code>dc.crsubject.crpsubject</code>)?</li> <li>Should we use <code>dc.contributor.crp</code> or <code>cg.contributor.crp</code> for the CRP (ours is <code>dc.crsubject.crpsubject</code>)?</li>
<li>Our <code>dc.contributor.affiliation</code> and <code>dc.contributor.corporate</code> could both map to <code>dc.contributor</code> and possibly <code>dc.contributor.center</code> depending on if it&rsquo;s a CG center or not</li> <li>Our <code>dc.contributor.affiliation</code> and <code>dc.contributor.corporate</code> could both map to <code>dc.contributor</code> and possibly <code>dc.contributor.center</code> depending on if it's a CG center or not</li>
<li><code>dc.title.jtitle</code> could either map to <code>dc.publisher</code> or <code>dc.source</code> depending on how you read things</li> <li><code>dc.title.jtitle</code> could either map to <code>dc.publisher</code> or <code>dc.source</code> depending on how you read things</li>
</ul></li>
<li><p>Found ~200 messed up CIAT values in <code>dc.publisher</code>:</p>
<pre><code># select text_value from metadatavalue where resource_type_id=2 and metadata_field_id=39 and text_value similar to &quot;% %&quot;;
</code></pre></li>
</ul> </ul>
</li>
<h2 id="2016-05-13">2016-05-13</h2> <li>Found ~200 messed up CIAT values in <code>dc.publisher</code>:</li>
</ul>
<pre><code># select text_value from metadatavalue where resource_type_id=2 and metadata_field_id=39 and text_value similar to &quot;% %&quot;;
</code></pre><h2 id="20160513">2016-05-13</h2>
<ul> <ul>
<li>More theorizing about CGcore</li> <li>More theorizing about CGcore</li>
<li>Add two new fields: <li>Add two new fields:
<ul> <ul>
<li>dc.srplace.subregion → cg.coverage.admin-unit</li> <li>dc.srplace.subregion → cg.coverage.admin-unit</li>
<li>dc.place → cg.place</li> <li>dc.place → cg.place</li>
</ul></li>
<li><code>dc.place</code> is our own field, so it&rsquo;s easy to move</li>
<li>I&rsquo;ve removed <code>dc.title.jtitle</code> from the list for now because there&rsquo;s no use moving it out of DC until we know where it will go (see discussion yesterday)</li>
</ul> </ul>
</li>
<h2 id="2016-05-18">2016-05-18</h2> <li><code>dc.place</code> is our own field, so it's easy to move</li>
<li>I've removed <code>dc.title.jtitle</code> from the list for now because there's no use moving it out of DC until we know where it will go (see discussion yesterday)</li>
</ul>
<h2 id="20160518">2016-05-18</h2>
<ul> <ul>
<li>Work on 707 CCAFS records</li> <li>Work on 707 CCAFS records</li>
<li>They have thumbnails on Flickr and elsewhere</li> <li>They have thumbnails on Flickr and elsewhere</li>
<li>In OpenRefine I created a new <code>filename</code> column based on the <code>thumbnail</code> column with the following GREL:</li>
<li><p>In OpenRefine I created a new <code>filename</code> column based on the <code>thumbnail</code> column with the following GREL:</p> </ul>
<pre><code>if(cells['thumbnails'].value.contains('hqdefault'), cells['thumbnails'].value.split('/')[-2] + '.jpg', cells['thumbnails'].value.split('/')[-1]) <pre><code>if(cells['thumbnails'].value.contains('hqdefault'), cells['thumbnails'].value.split('/')[-2] + '.jpg', cells['thumbnails'].value.split('/')[-1])
</code></pre></li> </code></pre><ul>
<li>Because ~400 records had the same filename on Flickr (hqdefault.jpg) but different UUIDs in the URL</li>
<li><p>Because ~400 records had the same filename on Flickr (hqdefault.jpg) but different UUIDs in the URL</p></li> <li>So for the <code>hqdefault.jpg</code> ones I just take the UUID (-2) and use it as the filename</li>
<li>Before importing with SAFBuilder I tested adding &ldquo;__bundle:THUMBNAIL&rdquo; to the <code>filename</code> column and it works fine</li>
<li><p>So for the <code>hqdefault.jpg</code> ones I just take the UUID (-2) and use it as the filename</p></li>
<li><p>Before importing with SAFBuilder I tested adding &ldquo;__bundle:THUMBNAIL&rdquo; to the <code>filename</code> column and it works fine</p></li>
</ul> </ul>
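<p>The filename rule from the GREL expression above can be sketched in Python as a rough equivalent. This is a minimal sketch, not the OpenRefine implementation: the function name and the example URLs are hypothetical, and it assumes the ID is always the second-to-last path segment for the <code>hqdefault</code> thumbnails.</p>

```python
def thumbnail_filename(url: str) -> str:
    """Derive a unique local filename from a thumbnail URL."""
    parts = url.split("/")
    if "hqdefault" in url:
        # Many records share the name hqdefault.jpg, so use the
        # second-to-last path segment (the per-item ID) instead.
        return parts[-2] + ".jpg"
    # Otherwise the last path segment is already a usable filename.
    return parts[-1]


# Hypothetical URLs, for illustration only
print(thumbnail_filename("https://example.org/vi/abc123/hqdefault.jpg"))
print(thumbnail_filename("https://example.org/photos/field-visit.jpg"))
```

<p>Taking the second-to-last segment keeps the filenames unique even though the last segment is identical for those ~400 records.</p>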
<h2 id="20160519">2016-05-19</h2>
<h2 id="2016-05-19">2016-05-19</h2>
<ul> <ul>
<li><p>More quality control on <code>filename</code> field of CCAFS records to make processing in shell and SAFBuilder more reliable:</p> <li>More quality control on <code>filename</code> field of CCAFS records to make processing in shell and SAFBuilder more reliable:</li>
</ul>
<pre><code>value.replace('_','').replace('-','') <pre><code>value.replace('_','').replace('-','')
</code></pre></li> </code></pre><ul>
<li>We need to hold off on moving <code>dc.Species</code> to <code>cg.species</code> because it is only used for plants, and might be better to move it to something like <code>cg.species.plant</code></li>
<li><p>We need to hold off on moving <code>dc.Species</code> to <code>cg.species</code> because it is only used for plants, and might be better to move it to something like <code>cg.species.plant</code></p></li> <li>And <code>dc.identifier.fund</code> is MOSTLY used for CPWF project identifier but has some other sponsorship things
<li><p>And <code>dc.identifier.fund</code> is MOSTLY used for CPWF project identifier but has some other sponsorship things</p>
<ul> <ul>
<li>We should move PN<em>, SG</em>, CBA, IA, and PHASE* values to <code>cg.identifier.cpwfproject</code></li> <li>We should move PN*, SG*, CBA, IA, and PHASE* values to <code>cg.identifier.cpwfproject</code></li>
<li>The rest, like BMGF and USAID etc, might have to go to either <code>dc.description.sponsorship</code> or <code>cg.identifier.fund</code> (not sure yet)</li> <li>The rest, like BMGF and USAID etc, might have to go to either <code>dc.description.sponsorship</code> or <code>cg.identifier.fund</code> (not sure yet)</li>
<li>There are also some mistakes in CPWF&rsquo;s things, like &ldquo;PN 47&rdquo;</li> <li>There are also some mistakes in CPWF's things, like &ldquo;PN 47&rdquo;</li>
<li>This ought to catch all the CPWF values (there don't appear to be any SG* values):</li>
<li><p>This ought to catch all the CPWF values (there don&rsquo;t appear to be any SG* values):</p>
<pre><code># select text_value from metadatavalue where resource_type_id=2 and metadata_field_id=75 and (text_value like 'PN%' or text_value like 'PHASE%' or text_value = 'CBA' or text_value = 'IA');
</code></pre></li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2016-05-20">2016-05-20</h2> </ul>
<pre><code># select text_value from metadatavalue where resource_type_id=2 and metadata_field_id=75 and (text_value like 'PN%' or text_value like 'PHASE%' or text_value = 'CBA' or text_value = 'IA');
</code></pre><h2 id="20160520">2016-05-20</h2>
<ul> <ul>
<li>More work on CCAFS Video and Images records</li> <li>More work on CCAFS Video and Images records</li>
<li>For SAFBuilder we need to modify filename column to have the thumbnail bundle:</li>
<li><p>For SAFBuilder we need to modify filename column to have the thumbnail bundle:</p>
<pre><code>value + &quot;__bundle:THUMBNAIL&quot;
</code></pre></li>
<li><p>Also, I fixed some weird characters using OpenRefine&rsquo;s transform with the following GREL:</p>
<pre><code>value.replace(/\u0081/,'')
</code></pre></li>
<li><p>Write shell script to resize thumbnails with height larger than 400: <a href="https://gist.github.com/alanorth/131401dcd39d00e0ce12e1be3ed13256">https://gist.github.com/alanorth/131401dcd39d00e0ce12e1be3ed13256</a></p></li>
<li><p>Upload 707 CCAFS records to DSpace Test</p></li>
<li><p>A few miscellaneous fixes for XMLUI display niggles (spaces in item lists and link target <code>_black</code>): <a href="https://github.com/ilri/DSpace/pull/224">#224</a></p></li>
<li><p>Work on configuration changes for Phase 2 metadata migrations</p></li>
</ul> </ul>
<pre><code>value + &quot;__bundle:THUMBNAIL&quot;
<h2 id="2016-05-23">2016-05-23</h2> </code></pre><ul>
<li>Also, I fixed some weird characters using OpenRefine's transform with the following GREL:</li>
</ul>
<pre><code>value.replace(/\u0081/,'')
</code></pre><ul>
<li>Write shell script to resize thumbnails with height larger than 400: <a href="https://gist.github.com/alanorth/131401dcd39d00e0ce12e1be3ed13256">https://gist.github.com/alanorth/131401dcd39d00e0ce12e1be3ed13256</a></li>
<li>Upload 707 CCAFS records to DSpace Test</li>
<li>A few miscellaneous fixes for XMLUI display niggles (spaces in item lists and link target <code>_black</code>): <a href="https://github.com/ilri/DSpace/pull/224">#224</a></li>
<li>Work on configuration changes for Phase 2 metadata migrations</li>
</ul>
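<p>As a rough Python equivalent of the two GREL transforms above, the sketch below strips the stray U+0081 control character and appends the thumbnail bundle suffix. Applying both steps to the same value in one function is an assumption made for illustration (the notes apply them as separate transforms), and the function name is hypothetical.</p>

```python
def safbuilder_filename(value: str) -> str:
    """Clean a filename and mark it for the THUMBNAIL bundle."""
    # Remove the stray U+0081 control character wherever it occurs
    cleaned = value.replace("\u0081", "")
    # SAFBuilder reads the bundle name from this suffix
    return cleaned + "__bundle:THUMBNAIL"


# Hypothetical filename containing the unwanted character
print(safbuilder_filename("abc\u0081123.jpg"))
```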
<h2 id="20160523">2016-05-23</h2>
<ul> <ul>
<li>Try to import the CCAFS Images and Videos to CGSpace but had some issues with LibreOffice and OpenRefine</li> <li>Try to import the CCAFS Images and Videos to CGSpace but had some issues with LibreOffice and OpenRefine</li>
<li>LibreOffice excludes empty cells when it exports and all the fields shift over to the left and cause URLs to go to Subjects, etc.</li> <li>LibreOffice excludes empty cells when it exports and all the fields shift over to the left and cause URLs to go to Subjects, etc.</li>
<li>Google Docs does this better, but somehow reorders the rows and when I paste the thumbnail/filename row in they don&rsquo;t match!</li> <li>Google Docs does this better, but somehow reorders the rows and when I paste the thumbnail/filename row in they don't match!</li>
<li>I will have to try later</li> <li>I will have to try later</li>
</ul> </ul>
<h2 id="20160530">2016-05-30</h2>
<h2 id="2016-05-30">2016-05-30</h2>
<ul> <ul>
<li><p>Export CCAFS video and image records from DSpace Test using the migrate option (<code>-m</code>):</p> <li>Export CCAFS video and image records from DSpace Test using the migrate option (<code>-m</code>):</li>
</ul>
<pre><code>$ mkdir ~/ccafs-images <pre><code>$ mkdir ~/ccafs-images
$ /home/dspacetest.cgiar.org/bin/dspace export -t COLLECTION -i 10568/79355 -d ~/ccafs-images -n 0 -m $ /home/dspacetest.cgiar.org/bin/dspace export -t COLLECTION -i 10568/79355 -d ~/ccafs-images -n 0 -m
</code></pre></li> </code></pre><ul>
<li>And then import to CGSpace:</li>
<li><p>And then import to CGSpace:</p> </ul>
<pre><code>$ JAVA_OPTS=&quot;-Xmx512m -Dfile.encoding=UTF-8&quot; /home/cgspace.cgiar.org/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/70974 --source /tmp/ccafs-images --mapfile=/tmp/ccafs-images-may30.map &amp;&gt; /tmp/ccafs-images-may30.log <pre><code>$ JAVA_OPTS=&quot;-Xmx512m -Dfile.encoding=UTF-8&quot; /home/cgspace.cgiar.org/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/70974 --source /tmp/ccafs-images --mapfile=/tmp/ccafs-images-may30.map &amp;&gt; /tmp/ccafs-images-may30.log
</code></pre></li> </code></pre><ul>
<li>But now we have double authors for &ldquo;CGIAR Research Program on Climate Change, Agriculture and Food Security&rdquo; in the authority</li>
<li><p>But now we have double authors for &ldquo;CGIAR Research Program on Climate Change, Agriculture and Food Security&rdquo; in the authority</p></li> <li>I'm trying to do a Discovery index before messing with the authority index</li>
<li>Looks like we are missing the <code>index-authority</code> cron job, so who knows what's up with our authority index</li>
<li><p>I&rsquo;m trying to do a Discovery index before messing with the authority index</p></li> <li>Run system updates on DSpace Test, re-deploy code, and reboot the server</li>
<li>Clean up and import ~200 CTA records to CGSpace via CSV like:</li>
<li><p>Looks like we are missing the <code>index-authority</code> cron job, so who knows what&rsquo;s up with our authority index</p></li> </ul>
<li><p>Run system updates on DSpace Test, re-deploy code, and reboot the server</p></li>
<li><p>Clean up and import ~200 CTA records to CGSpace via CSV like:</p>
<pre><code>$ export JAVA_OPTS=&quot;-Xmx512m -Dfile.encoding=UTF-8&quot; <pre><code>$ export JAVA_OPTS=&quot;-Xmx512m -Dfile.encoding=UTF-8&quot;
$ /home/cgspace.cgiar.org/bin/dspace metadata-import -e aorth@mjanja.ch -f ~/CTA-May30/CTA-42229.csv &amp;&gt; ~/CTA-May30/CTA-42229.log $ /home/cgspace.cgiar.org/bin/dspace metadata-import -e aorth@mjanja.ch -f ~/CTA-May30/CTA-42229.csv &amp;&gt; ~/CTA-May30/CTA-42229.log
</code></pre></li> </code></pre><ul>
<li>Discovery indexing took a few hours for some reason, and after that I started the <code>index-authority</code> script</li>
<li><p>Discovery indexing took a few hours for some reason, and after that I started the <code>index-authority</code> script</p>
<pre><code>$ JAVA_OPTS=&quot;-Xmx1024m -Dfile.encoding=UTF-8&quot; /home/cgspace.cgiar.org/bin/dspace index-authority
</code></pre></li>
</ul> </ul>
<pre><code>$ JAVA_OPTS=&quot;-Xmx1024m -Dfile.encoding=UTF-8&quot; /home/cgspace.cgiar.org/bin/dspace index-authority
<h2 id="2016-05-31">2016-05-31</h2> </code></pre><h2 id="20160531">2016-05-31</h2>
<ul> <ul>
<li>The <code>index-authority</code> script ran overnight and was finished in the morning</li> <li>The <code>index-authority</code> script ran overnight and was finished in the morning</li>
<li>Hopefully this was because we haven&rsquo;t been running it regularly and it will speed up next time</li> <li>Hopefully this was because we haven't been running it regularly and it will speed up next time</li>
<li>I am running it again with a timer to see:</li>
<li><p>I am running it again with a timer to see:</p> </ul>
<pre><code>$ time /home/cgspace.cgiar.org/bin/dspace index-authority <pre><code>$ time /home/cgspace.cgiar.org/bin/dspace index-authority
Retrieving all data Retrieving all data
Initialize org.dspace.authority.indexer.DSpaceAuthorityIndexer Initialize org.dspace.authority.indexer.DSpaceAuthorityIndexer
@ -405,17 +336,12 @@ All done !
real 37m26.538s real 37m26.538s
user 2m24.627s user 2m24.627s
sys 0m20.540s sys 0m20.540s
</code></pre></li> </code></pre><ul>
<li>Update <code>tomcat7</code> crontab on CGSpace and DSpace Test to have the <code>index-authority</code> script that we were missing</li>
<li><p>Update <code>tomcat7</code> crontab on CGSpace and DSpace Test to have the <code>index-authority</code> script that we were missing</p></li> <li>Add new ILRI subject and CCAFS project tags to <code>input-forms.xml</code> (<a href="https://github.com/ilri/DSpace/pull/226">#226</a>, <a href="https://github.com/ilri/DSpace/pull/225">#225</a>)</li>
<li>Manually mapped the authors of a few old CCAFS records to the new CCAFS authority UUID and re-indexed authority indexes to see if it helps correct those items.</li>
<li><p>Add new ILRI subject and CCAFS project tags to <code>input-forms.xml</code> (<a href="https://github.com/ilri/DSpace/pull/226">#226</a>, <a href="https://github.com/ilri/DSpace/pull/225">#225</a>)</p></li> <li>Re-sync DSpace Test data with CGSpace</li>
<li>Clean up and import ~65 more CTA items into CGSpace</li>
<li><p>Manually mapped the authors of a few old CCAFS records to the new CCAFS authority UUID and re-indexed authority indexes to see if it helps correct those items.</p></li>
<li><p>Re-sync DSpace Test data with CGSpace</p></li>
<li><p>Clean up and import ~65 more CTA items into CGSpace</p></li>
</ul> </ul>


@ -8,9 +8,8 @@
<meta property="og:title" content="June, 2016" /> <meta property="og:title" content="June, 2016" />
<meta property="og:description" content="2016-06-01 <meta property="og:description" content="2016-06-01
Experimenting with IFPRI OAI (we want to harvest their publications) Experimenting with IFPRI OAI (we want to harvest their publications)
After reading the ContentDM documentation I found IFPRI&rsquo;s OAI endpoint: http://ebrary.ifpri.org/oai/oai.php After reading the ContentDM documentation I found IFPRI&#39;s OAI endpoint: http://ebrary.ifpri.org/oai/oai.php
After reading the OAI documentation and testing with an OAI validator I found out how to get their publications After reading the OAI documentation and testing with an OAI validator I found out how to get their publications
This is their publications set: http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc This is their publications set: http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc
You can see the others by using the OAI ListSets verb: http://ebrary.ifpri.org/oai/oai.php?verb=ListSets You can see the others by using the OAI ListSets verb: http://ebrary.ifpri.org/oai/oai.php?verb=ListSets
@ -25,15 +24,14 @@ Working on second phase of metadata migration, looks like this will work for mov
<meta name="twitter:title" content="June, 2016"/> <meta name="twitter:title" content="June, 2016"/>
<meta name="twitter:description" content="2016-06-01 <meta name="twitter:description" content="2016-06-01
Experimenting with IFPRI OAI (we want to harvest their publications) Experimenting with IFPRI OAI (we want to harvest their publications)
After reading the ContentDM documentation I found IFPRI&rsquo;s OAI endpoint: http://ebrary.ifpri.org/oai/oai.php After reading the ContentDM documentation I found IFPRI&#39;s OAI endpoint: http://ebrary.ifpri.org/oai/oai.php
After reading the OAI documentation and testing with an OAI validator I found out how to get their publications After reading the OAI documentation and testing with an OAI validator I found out how to get their publications
This is their publications set: http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc This is their publications set: http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc
You can see the others by using the OAI ListSets verb: http://ebrary.ifpri.org/oai/oai.php?verb=ListSets You can see the others by using the OAI ListSets verb: http://ebrary.ifpri.org/oai/oai.php?verb=ListSets
Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in dc.identifier.fund to cg.identifier.cpwfproject and then the rest to dc.description.sponsorship Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in dc.identifier.fund to cg.identifier.cpwfproject and then the rest to dc.description.sponsorship
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -114,300 +112,240 @@ Working on second phase of metadata migration, looks like this will work for mov
</p> </p>
</header> </header>
<h2 id="2016-06-01">2016-06-01</h2> <h2 id="20160601">2016-06-01</h2>
<ul> <ul>
<li>Experimenting with IFPRI OAI (we want to harvest their publications)</li> <li>Experimenting with IFPRI OAI (we want to harvest their publications)</li>
<li>After reading the <a href="https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html">ContentDM documentation</a> I found IFPRI&rsquo;s OAI endpoint: <a href="http://ebrary.ifpri.org/oai/oai.php">http://ebrary.ifpri.org/oai/oai.php</a></li> <li>After reading the <a href="https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html">ContentDM documentation</a> I found IFPRI's OAI endpoint: <a href="http://ebrary.ifpri.org/oai/oai.php">http://ebrary.ifpri.org/oai/oai.php</a></li>
<li>After reading the <a href="https://www.openarchives.org/OAI/openarchivesprotocol.html">OAI documentation</a> and testing with an <a href="http://validator.oaipmh.com/">OAI validator</a> I found out how to get their publications</li> <li>After reading the <a href="https://www.openarchives.org/OAI/openarchivesprotocol.html">OAI documentation</a> and testing with an <a href="http://validator.oaipmh.com/">OAI validator</a> I found out how to get their publications</li>
<li>This is their publications set: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc">http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc</a></li> <li>This is their publications set: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc">http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc</a></li>
<li>You can see the others by using the OAI <code>ListSets</code> verb: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListSets">http://ebrary.ifpri.org/oai/oai.php?verb=ListSets</a></li> <li>You can see the others by using the OAI <code>ListSets</code> verb: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListSets">http://ebrary.ifpri.org/oai/oai.php?verb=ListSets</a></li>
<li>Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in <code>dc.identifier.fund</code> to <code>cg.identifier.cpwfproject</code> and then the rest to <code>dc.description.sponsorship</code></li> <li>Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in <code>dc.identifier.fund</code> to <code>cg.identifier.cpwfproject</code> and then the rest to <code>dc.description.sponsorship</code></li>
</ul> </ul>
<pre><code>dspacetest=# update metadatavalue set metadata_field_id=130 where metadata_field_id=75 and (text_value like 'PN%' or text_value like 'PHASE%' or text_value = 'CBA' or text_value = 'IA'); <pre><code>dspacetest=# update metadatavalue set metadata_field_id=130 where metadata_field_id=75 and (text_value like 'PN%' or text_value like 'PHASE%' or text_value = 'CBA' or text_value = 'IA');
UPDATE 497 UPDATE 497
dspacetest=# update metadatavalue set metadata_field_id=29 where metadata_field_id=75; dspacetest=# update metadatavalue set metadata_field_id=29 where metadata_field_id=75;
UPDATE 14 UPDATE 14
</code></pre> </code></pre><ul>
<ul>
<li>Fix a few minor miscellaneous issues in <code>dspace.cfg</code> (<a href="https://github.com/ilri/DSpace/pull/227">#227</a>)</li> <li>Fix a few minor miscellaneous issues in <code>dspace.cfg</code> (<a href="https://github.com/ilri/DSpace/pull/227">#227</a>)</li>
</ul> </ul>
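<p>The split performed by the two <code>UPDATE</code> statements above can be previewed without touching the database. The following is a minimal sketch that mirrors the same matching rules in Python; the function name is hypothetical, and the mapping of field IDs 130 and 29 to <code>cg.identifier.cpwfproject</code> and <code>dc.description.sponsorship</code> is inferred from the surrounding notes rather than checked against the metadata registry.</p>

```python
def target_field(text_value: str) -> str:
    """Return the field a dc.identifier.fund value would be moved to."""
    # Mirrors: text_value LIKE 'PN%' OR text_value LIKE 'PHASE%'
    #          OR text_value = 'CBA' OR text_value = 'IA'
    if text_value.startswith(("PN", "PHASE")) or text_value in ("CBA", "IA"):
        return "cg.identifier.cpwfproject"
    # Everything else falls through to the second UPDATE
    return "dc.description.sponsorship"


# Sample values taken from the notes, plus one malformed CPWF value
for value in ["PN 47", "PHASE 2", "CBA", "IA", "BMGF", "USAID"]:
    print(value, "->", target_field(value))
```

<p>Like the SQL, the prefix match is case-sensitive and also catches malformed values such as "PN 47", so those would still need cleaning up afterwards.</p>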
<h2 id="20160602">2016-06-02</h2>
<h2 id="2016-06-02">2016-06-02</h2>
<ul> <ul>
<li>Testing the configuration and theme changes for the upcoming metadata migration and I found some issues with <code>cg.coverage.admin-unit</code></li> <li>Testing the configuration and theme changes for the upcoming metadata migration and I found some issues with <code>cg.coverage.admin-unit</code></li>
<li>Seems that the Browse configuration in <code>dspace.cfg</code> can't handle the &lsquo;-&rsquo; in the field name:</li>
<li><p>Seems that the Browse configuration in <code>dspace.cfg</code> can&rsquo;t handle the &lsquo;-&rsquo; in the field name:</p>
<pre><code>webui.browse.index.12 = subregion:metadata:cg.coverage.admin-unit:text
</code></pre></li>
<li><p>But actually, I think since DSpace 4 or 5 (we are 5.1) the Browse indexes come from Discovery (defined in discovery.xml) so this is really just a parsing error</p></li>
<li><p>I&rsquo;ve sent a message to the DSpace mailing list to ask about the Browse index definition</p></li>
<li><p>A user was having problems with submission and from the stacktrace it looks like a Sherpa/Romeo issue</p></li>
<li><p>I found a thread on the mailing list talking about it and there is a bug report and a patch: <a href="https://jira.duraspace.org/browse/DS-2740">https://jira.duraspace.org/browse/DS-2740</a></p></li>
<li><p>The patch applies successfully on DSpace 5.1 so I will try it later</p></li>
</ul> </ul>
<pre><code>webui.browse.index.12 = subregion:metadata:cg.coverage.admin-unit:text
<h2 id="2016-06-03">2016-06-03</h2> </code></pre><ul>
<li>But actually, I think since DSpace 4 or 5 (we are 5.1) the Browse indexes come from Discovery (defined in discovery.xml) so this is really just a parsing error</li>
<li>I've sent a message to the DSpace mailing list to ask about the Browse index definition</li>
<li>A user was having problems with submission and from the stacktrace it looks like a Sherpa/Romeo issue</li>
<li>I found a thread on the mailing list talking about it and there is a bug report and a patch: <a href="https://jira.duraspace.org/browse/DS-2740">https://jira.duraspace.org/browse/DS-2740</a></li>
<li>The patch applies successfully on DSpace 5.1 so I will try it later</li>
</ul>
<h2 id="20160603">2016-06-03</h2>
<ul> <ul>
<li>Investigating the CCAFS authority issue, I exported the metadata for the Videos collection</li> <li>Investigating the CCAFS authority issue, I exported the metadata for the Videos collection</li>
<li>The top two authors are:</li>
<li><p>The top two authors are:</p> </ul>
<pre><code>CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::500 <pre><code>CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::500
CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::600 CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::600
</code></pre></li> </code></pre><ul>
<li>So the only difference is the &ldquo;confidence&rdquo;</li>
<li><p>So the only difference is the &ldquo;confidence&rdquo;</p></li> <li>Ok, well THAT is interesting:</li>
</ul>
<li><p>Ok, well THAT is interesting:</p>
<pre><code>dspacetest=# select text_value, authority, confidence from metadatavalue where metadata_field_id=3 and text_value like '%Orth, %'; <pre><code>dspacetest=# select text_value, authority, confidence from metadatavalue where metadata_field_id=3 and text_value like '%Orth, %';
text_value | authority | confidence text_value | authority | confidence
------------+--------------------------------------+------------ ------------+--------------------------------------+------------
Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1 Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1
Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1 Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1
Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1 Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1
Orth, Alan | | -1 Orth, Alan | | -1
Orth, Alan | | -1 Orth, Alan | | -1
Orth, Alan | | -1 Orth, Alan | | -1
Orth, Alan | | -1 Orth, Alan | | -1
Orth, A. | 05c2c622-d252-4efb-b9ed-95a07d3adf11 | -1 Orth, A. | 05c2c622-d252-4efb-b9ed-95a07d3adf11 | -1
Orth, A. | 05c2c622-d252-4efb-b9ed-95a07d3adf11 | -1 Orth, A. | 05c2c622-d252-4efb-b9ed-95a07d3adf11 | -1
Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1 Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1
Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1 Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1
Orth, Alan | ad281dbf-ef81-4007-96c3-a7f5d2eaa6d9 | 600 Orth, Alan | ad281dbf-ef81-4007-96c3-a7f5d2eaa6d9 | 600
Orth, Alan | ad281dbf-ef81-4007-96c3-a7f5d2eaa6d9 | 600 Orth, Alan | ad281dbf-ef81-4007-96c3-a7f5d2eaa6d9 | 600
(13 rows) (13 rows)
</code></pre></li> </code></pre><ul>
<li>And now an actually relevant example:</li>
<li><p>And now an actually relevant example:</p> </ul>
<pre><code>dspacetest=# select count(*) from metadatavalue where metadata_field_id=3 and text_value like 'CGIAR Research Program on Climate Change, Agriculture and Food Security' and confidence = 500; <pre><code>dspacetest=# select count(*) from metadatavalue where metadata_field_id=3 and text_value like 'CGIAR Research Program on Climate Change, Agriculture and Food Security' and confidence = 500;
count count
------- -------
707 707
(1 row) (1 row)
dspacetest=# select count(*) from metadatavalue where metadata_field_id=3 and text_value like 'CGIAR Research Program on Climate Change, Agriculture and Food Security' and confidence != 500; dspacetest=# select count(*) from metadatavalue where metadata_field_id=3 and text_value like 'CGIAR Research Program on Climate Change, Agriculture and Food Security' and confidence != 500;
count count
------- -------
253 253
(1 row) (1 row)
</code></pre></li> </code></pre><ul>
<li>Trying something experimental:</li>
<li><p>Trying something experimental:</p> </ul>
<pre><code>dspacetest=# update metadatavalue set confidence=500 where metadata_field_id=3 and text_value like 'CGIAR Research Program on Climate Change, Agriculture and Food Security'; <pre><code>dspacetest=# update metadatavalue set confidence=500 where metadata_field_id=3 and text_value like 'CGIAR Research Program on Climate Change, Agriculture and Food Security';
UPDATE 960 UPDATE 960
</code></pre></li> </code></pre><ul>
<li>And then re-indexing authority and Discovery&hellip;?</li>
<li><p>And then re-indexing authority and Discovery&hellip;?</p></li> <li>After Discovery reindex the CCAFS authors are all together in the Authors sidebar facet</li>
<li>The docs for the ORCiD and Authority stuff for DSpace 5 mention changing the browse indexes to use the Authority as well:</li>
<li><p>After Discovery reindex the CCAFS authors are all together in the Authors sidebar facet</p></li>
<li><p>The docs for the ORCiD and Authority stuff for DSpace 5 mention changing the browse indexes to use the Authority as well:</p>
<pre><code>webui.browse.index.2 = author:metadataAuthority:dc.contributor.author:authority
</code></pre></li>
<li><p>That would only be for the &ldquo;Browse by&rdquo; function&hellip; so we&rsquo;ll have to see what effect that has later</p></li>
</ul> </ul>
<pre><code>webui.browse.index.2 = author:metadataAuthority:dc.contributor.author:authority
<h2 id="2016-06-04">2016-06-04</h2> </code></pre><ul>
<li>That would only be for the &ldquo;Browse by&rdquo; function&hellip; so we'll have to see what effect that has later</li>
</ul>
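<p>The effect observed above, where one author appears twice until every row shares the same confidence, can be modelled with a small Python sketch. This only illustrates the observation; it is not how Discovery builds its facets internally. The sample rows and function names are hypothetical, while the author name and authority UUID are the ones from the exported metadata.</p>

```python
from collections import Counter

NAME = "CGIAR Research Program on Climate Change, Agriculture and Food Security"
UUID = "acd00765-02f1-4b5b-92fa-bfa3877229ce"

# Hypothetical sample rows: (text_value, authority, confidence)
rows = [
    (NAME, UUID, 500),
    (NAME, UUID, 500),
    (NAME, UUID, 600),
]


def facet_groups(rows):
    """Count rows per distinct (text_value, authority, confidence) triple."""
    return Counter(rows)


def normalize_confidence(rows, confidence=500):
    """Set every row to the same confidence, as the UPDATE above does."""
    return [(text, authority, confidence) for text, authority, _ in rows]


before = facet_groups(rows)
after = facet_groups(normalize_confidence(rows))
print(len(before), "groups before,", len(after), "group after")
```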
<h2 id="20160604">2016-06-04</h2>
<ul> <ul>
<li>Re-sync DSpace Test with CGSpace and perform test of metadata migration again</li> <li>Re-sync DSpace Test with CGSpace and perform test of metadata migration again</li>
<li>Run phase two of metadata migrations on CGSpace (see the <a href="https://gist.github.com/alanorth/1a730bec5ac9457a8fb0e3e72c98d09c">migration notes</a>)</li> <li>Run phase two of metadata migrations on CGSpace (see the <a href="https://gist.github.com/alanorth/1a730bec5ac9457a8fb0e3e72c98d09c">migration notes</a>)</li>
<li>Run all system updates and reboot CGSpace server</li> <li>Run all system updates and reboot CGSpace server</li>
</ul> </ul>
<h2 id="20160607">2016-06-07</h2>
<h2 id="2016-06-07">2016-06-07</h2>
<ul> <ul>
<li><p>Figured out how to export a list of the unique values from a metadata field ordered by count:</p> <li>Figured out how to export a list of the unique values from a metadata field ordered by count:</li>
<pre><code>dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=29 group by text_value order by count desc) to /tmp/sponsorship.csv with csv;
</code></pre></li>
<li><p>Identified the next round of fields to migrate:</p>
<ul>
<li>dc.title.jtitle → dc.source</li>
<li>dc.crsubject.crpsubject → cg.contributor.crp</li>
<li>dc.contributor.affiliation → cg.contributor.affiliation</li>
<li>dc.Species → cg.species</li>
<li>dc.contributor.corporate → dc.contributor</li>
<li>dc.identifier.url → cg.identifier.url</li>
<li>dc.identifier.doi → cg.identifier.doi</li>
<li>dc.identifier.googleurl → cg.identifier.googleurl</li>
<li>dc.identifier.dataurl → cg.identifier.dataurl</li>
</ul></li>
<li><p>Discuss pulling data from IFPRI&rsquo;s ContentDM with Ryan Miller</p></li>
<li><p>Looks like OAI is kinda obtuse for this, and if we use ContentDM&rsquo;s API we&rsquo;ll be able to access their internal field names (rather than trying to figure out how they stuffed them into various, repeated Dublin Core fields)</p></li>
</ul> </ul>
<pre><code>dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=29 group by text_value order by count desc) to /tmp/sponsorship.csv with csv;
<h2 id="2016-06-08">2016-06-08</h2> </code></pre><ul>
<li>
<p>Identified the next round of fields to migrate:</p>
<ul>
<li>dc.title.jtitle → dc.source</li>
<li>dc.crsubject.crpsubject → cg.contributor.crp</li>
<li>dc.contributor.affiliation → cg.contributor.affiliation</li>
<li>dc.Species → cg.species</li>
<li>dc.contributor.corporate → dc.contributor</li>
<li>dc.identifier.url → cg.identifier.url</li>
<li>dc.identifier.doi → cg.identifier.doi</li>
<li>dc.identifier.googleurl → cg.identifier.googleurl</li>
<li>dc.identifier.dataurl → cg.identifier.dataurl</li>
</ul>
</li>
<li>
<p>Discuss pulling data from IFPRI's ContentDM with Ryan Miller</p>
</li>
<li>
<p>Looks like OAI is kinda obtuse for this, and if we use ContentDM's API we'll be able to access their internal field names (rather than trying to figure out how they stuffed them into various, repeated Dublin Core fields)</p>
</li>
</ul>
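<p>For values that have already been exported, the same "unique values ordered by count" listing as the <code>\copy</code> query above can be produced in Python. This is a minimal sketch assuming the values are available as a plain list; the function name and the sample values are hypothetical.</p>

```python
import csv
import io
from collections import Counter


def value_counts_csv(values):
    """Return CSV text of unique values and counts, most frequent first."""
    counts = Counter(values)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # most_common() sorts by count in descending order
    for text_value, count in counts.most_common():
        writer.writerow([text_value, count])
    return buffer.getvalue()


# Hypothetical sponsorship values, for illustration only
sample = ["USAID", "BMGF", "USAID", "USAID", "BMGF", "IFAD"]
print(value_counts_csv(sample))
```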
<h2 id="20160608">2016-06-08</h2>
<ul> <ul>
<li>Discuss controlled vocabularies for ~28 fields</li> <li>Discuss controlled vocabularies for ~28 fields</li>
<li>Looks like this is all we need: <a href="https://wiki.duraspace.org/display/DSDOC5x/Submission+User+Interface#SubmissionUserInterface-ConfiguringControlledVocabularies">https://wiki.duraspace.org/display/DSDOC5x/Submission+User+Interface#SubmissionUserInterface-ConfiguringControlledVocabularies</a></li> <li>Looks like this is all we need: <a href="https://wiki.duraspace.org/display/DSDOC5x/Submission+User+Interface#SubmissionUserInterface-ConfiguringControlledVocabularies">https://wiki.duraspace.org/display/DSDOC5x/Submission+User+Interface#SubmissionUserInterface-ConfiguringControlledVocabularies</a></li>
<li>I wrote an XPath expression to extract the ILRI subjects from <code>input-forms.xml</code> (uses xmlstarlet):</li>
<li><p>I wrote an XPath expression to extract the ILRI subjects from <code>input-forms.xml</code> (uses xmlstarlet):</p>
<pre><code>$ xml sel -t -m '//value-pairs[@value-pairs-name=&quot;ilrisubject&quot;]/pair/displayed-value/text()' -c '.' -n dspace/config/input-forms.xml
</code></pre></li>
<li><p>Write to Atmire about the use of <code>atmire.orcid.id</code> to see if we can change it</p></li>
<li><p>Seems to be a virtual field that is queried from the authority cache&hellip; hmm</p></li>
<li><p>In other news, I found out that the About page that we haven&rsquo;t been using lives in <code>dspace/config/about.xml</code>, so now we can update the text</p></li>
<li><p>File bug about <code>closed=&quot;true&quot;</code> attribute of controlled vocabularies not working: <a href="https://jira.duraspace.org/browse/DS-3238">https://jira.duraspace.org/browse/DS-3238</a></p></li>
</ul> </ul>
<pre><code>$ xml sel -t -m '//value-pairs[@value-pairs-name=&quot;ilrisubject&quot;]/pair/displayed-value/text()' -c '.' -n dspace/config/input-forms.xml
<h2 id="2016-06-09">2016-06-09</h2> </code></pre><ul>
<li>Write to Atmire about the use of <code>atmire.orcid.id</code> to see if we can change it</li>
<li>Seems to be a virtual field that is queried from the authority cache&hellip; hmm</li>
<li>In other news, I found out that the About page that we haven't been using lives in <code>dspace/config/about.xml</code>, so now we can update the text</li>
<li>File bug about <code>closed=&quot;true&quot;</code> attribute of controlled vocabularies not working: <a href="https://jira.duraspace.org/browse/DS-3238">https://jira.duraspace.org/browse/DS-3238</a></li>
</ul>
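<p>The same XPath extraction can be sketched in Python with the standard library's ElementTree, in case xmlstarlet is not available. The inline sample XML and the function name <code>extract_displayed_values</code> are made up for illustration; the real <code>input-forms.xml</code> is much larger and would be read from disk instead.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for dspace/config/input-forms.xml
SAMPLE = """<input-forms>
  <form-value-pairs>
    <value-pairs value-pairs-name="ilrisubject" dc-term="ilrisubject">
      <pair><displayed-value>AGRICULTURE</displayed-value><stored-value>AGRICULTURE</stored-value></pair>
      <pair><displayed-value>ANIMAL BREEDING</displayed-value><stored-value>ANIMAL BREEDING</stored-value></pair>
    </value-pairs>
    <value-pairs value-pairs-name="other" dc-term="other">
      <pair><displayed-value>IGNORED</displayed-value><stored-value>IGNORED</stored-value></pair>
    </value-pairs>
  </form-value-pairs>
</input-forms>"""


def extract_displayed_values(xml_text, name):
    """Return the displayed-value text of every pair in the named value-pairs list."""
    root = ET.fromstring(xml_text)
    # ElementTree supports the [@attribute='value'] predicate used in the xmlstarlet query
    path = f".//value-pairs[@value-pairs-name='{name}']/pair/displayed-value"
    return [el.text for el in root.findall(path)]


if __name__ == "__main__":
    for value in extract_displayed_values(SAMPLE, "ilrisubject"):
        print(value)
```

<p>For the real file, <code>ET.parse(path).getroot()</code> could replace <code>ET.fromstring</code>; the path expression stays the same.</p>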
<h2 id="20160609">2016-06-09</h2>
<ul> <ul>
<li>Atmire explained that the <code>atmire.orcid.id</code> field doesn&rsquo;t exist in the schema, as it actually comes from the authority cache during XMLUI run time</li> <li>Atmire explained that the <code>atmire.orcid.id</code> field doesn't exist in the schema, as it actually comes from the authority cache during XMLUI run time</li>
<li>This means we don&rsquo;t see it when harvesting via OAI or REST, for example</li> <li>This means we don't see it when harvesting via OAI or REST, for example</li>
<li>They opened a feature ticket on the DSpace tracker to ask for support of this: <a href="https://jira.duraspace.org/browse/DS-3239">https://jira.duraspace.org/browse/DS-3239</a></li> <li>They opened a feature ticket on the DSpace tracker to ask for support of this: <a href="https://jira.duraspace.org/browse/DS-3239">https://jira.duraspace.org/browse/DS-3239</a></li>
</ul> </ul>
<h2 id="20160610">2016-06-10</h2>
<h2 id="2016-06-10">2016-06-10</h2>
<ul> <ul>
<li>Investigating authority confidences</li> <li>Investigating authority confidences</li>
<li>It looks like the values are documented in <code>Choices.java</code></li> <li>It looks like the values are documented in <code>Choices.java</code></li>
<li>Experiment with setting all 960 CCAFS author values to be 500:</li>
<li><p>Experiment with setting all 960 CCAFS author values to be 500:</p> </ul>
<pre><code>dspacetest=# SELECT authority, confidence FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=3 AND text_value = 'CGIAR Research Program on Climate Change, Agriculture and Food Security'; <pre><code>dspacetest=# SELECT authority, confidence FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=3 AND text_value = 'CGIAR Research Program on Climate Change, Agriculture and Food Security';
dspacetest=# UPDATE metadatavalue set confidence = 500 where resource_type_id=2 AND metadata_field_id=3 AND text_value = 'CGIAR Research Program on Climate Change, Agriculture and Food Security'; dspacetest=# UPDATE metadatavalue set confidence = 500 where resource_type_id=2 AND metadata_field_id=3 AND text_value = 'CGIAR Research Program on Climate Change, Agriculture and Food Security';
UPDATE 960 UPDATE 960
</code></pre></li> </code></pre><ul>
<li>After the database edit, I did a full Discovery re-index</li>
<li><p>After the database edit, I did a full Discovery re-index</p></li> <li>And now there are exactly 960 items in the authors facet for &lsquo;CGIAR Research Program on Climate Change, Agriculture and Food Security&rsquo;</li>
<li>Now I ran the same on CGSpace</li>
<li><p>And now there are exactly 960 items in the authors facet for &lsquo;CGIAR Research Program on Climate Change, Agriculture and Food Security&rsquo;</p></li> <li>Merge controlled vocabulary functionality for animal breeds to <code>5_x-prod</code> (<a href="https://github.com/ilri/DSpace/pull/236">#236</a>)</li>
<li>Write python script to update metadata values in batch via PostgreSQL: <a href="https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897">fix-metadata-values.py</a></li>
<li><p>Now I ran the same on CGSpace</p></li> <li>We need to use this to correct some pretty ugly values in fields like <code>dc.description.sponsorship</code></li>
<li>Merge item display tweaks from earlier this week (<a href="https://github.com/ilri/DSpace/pull/231">#231</a>)</li>
<li><p>Merge controlled vocabulary functionality for animal breeds to <code>5_x-prod</code> (<a href="https://github.com/ilri/DSpace/pull/236">#236</a>)</p></li> <li>Merge controlled vocabulary functionality for subregions (<a href="https://github.com/ilri/DSpace/pull/238">#238</a>)</li>
<li><p>Write python script to update metadata values in batch via PostgreSQL: <a href="https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897">fix-metadata-values.py</a></p></li>
<li><p>We need to use this to correct some pretty ugly values in fields like <code>dc.description.sponsorship</code></p></li>
<li><p>Merge item display tweaks from earlier this week (<a href="https://github.com/ilri/DSpace/pull/231">#231</a>)</p></li>
<li><p>Merge controlled vocabulary functionality for subregions (<a href="https://github.com/ilri/DSpace/pull/238">#238</a>)</p></li>
</ul> </ul>
<h2 id="20160611">2016-06-11</h2>
<h2 id="2016-06-11">2016-06-11</h2>
<ul> <ul>
<li>Merge controlled vocabulary for sponsorship field (<a href="https://github.com/ilri/DSpace/pull/239">#239</a>)</li> <li>Merge controlled vocabulary for sponsorship field (<a href="https://github.com/ilri/DSpace/pull/239">#239</a>)</li>
<li>Fix character encoding issues for animal breed lookup that I merged yesterday</li> <li>Fix character encoding issues for animal breed lookup that I merged yesterday</li>
</ul> </ul>
<h2 id="20160617">2016-06-17</h2>
<h2 id="2016-06-17">2016-06-17</h2>
<ul> <ul>
<li>Linode has free RAM upgrades for their 13th birthday so I migrated DSpace Test (4→8GB of RAM)</li> <li>Linode has free RAM upgrades for their 13th birthday so I migrated DSpace Test (4→8GB of RAM)</li>
</ul> </ul>
<h2 id="20160618">2016-06-18</h2>
<h2 id="2016-06-18">2016-06-18</h2>
<ul> <ul>
<li>Clean up titles and hints in <code>input-forms.xml</code> to use title/sentence case and a few more consistency things (<a href="https://github.com/ilri/DSpace/pull/241">#241</a>)</li> <li>
<p>Clean up titles and hints in <code>input-forms.xml</code> to use title/sentence case and a few more consistency things (<a href="https://github.com/ilri/DSpace/pull/241">#241</a>)</p>
<li><p>The final list of fields to migrate in the third phase of metadata migrations is:</p> </li>
<li>
<p>The final list of fields to migrate in the third phase of metadata migrations is:</p>
<ul> <ul>
<li>dc.title.jtitle → dc.source</li> <li>dc.title.jtitle → dc.source</li>
<li>dc.crsubject.crpsubject → cg.contributor.crp</li> <li>dc.crsubject.crpsubject → cg.contributor.crp</li>
<li>dc.contributor.affiliation → cg.contributor.affiliation</li> <li>dc.contributor.affiliation → cg.contributor.affiliation</li>
<li>dc.srplace.subregion → cg.coverage.subregion</li> <li>dc.srplace.subregion → cg.coverage.subregion</li>
<li>dc.Species → cg.species</li> <li>dc.Species → cg.species</li>
<li>dc.contributor.corporate → dc.contributor</li> <li>dc.contributor.corporate → dc.contributor</li>
<li>dc.identifier.url → cg.identifier.url</li> <li>dc.identifier.url → cg.identifier.url</li>
<li>dc.identifier.doi → cg.identifier.doi</li> <li>dc.identifier.doi → cg.identifier.doi</li>
<li>dc.identifier.googleurl → cg.identifier.googleurl</li> <li>dc.identifier.googleurl → cg.identifier.googleurl</li>
<li>dc.identifier.dataurl → cg.identifier.dataurl</li> <li>dc.identifier.dataurl → cg.identifier.dataurl</li>
</ul></li>
<li><p>Interesting &ldquo;Sunburst&rdquo; visualization on a Digital Commons page: <a href="http://www.repository.law.indiana.edu/sunburst.html">http://www.repository.law.indiana.edu/sunburst.html</a></p></li>
<li><p>Final testing on metadata fix/delete for <code>dc.description.sponsorship</code> cleanup</p></li>
<li><p>Need to run <code>fix-metadata-values.py</code> and then <code>delete-metadata-values.py</code></p></li>
</ul> </ul>
</li>
<h2 id="2016-06-20">2016-06-20</h2> <li>
<p>Interesting &ldquo;Sunburst&rdquo; visualization on a Digital Commons page: <a href="http://www.repository.law.indiana.edu/sunburst.html">http://www.repository.law.indiana.edu/sunburst.html</a></p>
</li>
<li>
<p>Final testing on metadata fix/delete for <code>dc.description.sponsorship</code> cleanup</p>
</li>
<li>
<p>Need to run <code>fix-metadata-values.py</code> and then <code>delete-metadata-values.py</code></p>
</li>
</ul>
<h2 id="20160620">2016-06-20</h2>
<ul> <ul>
<li><p>CGSpace&rsquo;s HTTPS certificate expired last night and I didn&rsquo;t notice, had to renew:</p> <li>CGSpace's HTTPS certificate expired last night and I didn't notice, had to renew:</li>
</ul>
<pre><code># /opt/letsencrypt/letsencrypt-auto renew --standalone --pre-hook &quot;/usr/bin/service nginx stop&quot; --post-hook &quot;/usr/bin/service nginx start&quot; <pre><code># /opt/letsencrypt/letsencrypt-auto renew --standalone --pre-hook &quot;/usr/bin/service nginx stop&quot; --post-hook &quot;/usr/bin/service nginx start&quot;
</code></pre></li> </code></pre><ul>
<li>I really need to fix that cron job&hellip;</li>
<li><p>I really need to fix that cron job&hellip;</p></li>
</ul> </ul>
<h2 id="20160624">2016-06-24</h2>
<h2 id="2016-06-24">2016-06-24</h2>
<ul> <ul>
<li><p>Run the replacements/deletes for <code>dc.description.sponsorship</code> (investors) on CGSpace:</p> <li>Run the replacements/deletes for <code>dc.description.sponsorship</code> (investors) on CGSpace:</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i investors-not-blank-not-delete-85.csv -f dc.description.sponsorship -t 'correct investor' -m 29 -d cgspace -p 'fuuu' -u cgspace <pre><code>$ ./fix-metadata-values.py -i investors-not-blank-not-delete-85.csv -f dc.description.sponsorship -t 'correct investor' -m 29 -d cgspace -p 'fuuu' -u cgspace
$ ./delete-metadata-values.py -i investors-delete-82.csv -f dc.description.sponsorship -m 29 -d cgspace -p 'fuuu' -u cgspace $ ./delete-metadata-values.py -i investors-delete-82.csv -f dc.description.sponsorship -m 29 -d cgspace -p 'fuuu' -u cgspace
</code></pre></li> </code></pre><ul>
<li>The scripts for this are here:
<li><p>The scripts for this are here:</p>
<ul> <ul>
<li><a href="https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897">fix-metadata-values.py</a></li> <li><a href="https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897">fix-metadata-values.py</a></li>
<li><a href="https://gist.github.com/alanorth/bd7d58c947f686401a2b1fadc78736be">delete-metadata-values.py</a></li> <li><a href="https://gist.github.com/alanorth/bd7d58c947f686401a2b1fadc78736be">delete-metadata-values.py</a></li>
</ul></li>
<li><p>Add new sponsors to controlled vocabulary (<a href="https://github.com/ilri/DSpace/pull/244">#244</a>)</p></li>
<li><p>Refine submission form labels and hints</p></li>
</ul> </ul>
</li>
<h2 id="2016-06-28">2016-06-28</h2> <li>Add new sponsors to controlled vocabulary (<a href="https://github.com/ilri/DSpace/pull/244">#244</a>)</li>
<li>Refine submission form labels and hints</li>
</ul>
<h2 id="20160628">2016-06-28</h2>
<ul> <ul>
<li>Testing the cleanup of <code>dc.contributor.corporate</code> with 13 deletions and 121 replacements</li> <li>Testing the cleanup of <code>dc.contributor.corporate</code> with 13 deletions and 121 replacements</li>
<li>There are still ~97 fields that weren&rsquo;t indicated to do anything</li> <li>There are still ~97 fields that weren't indicated to do anything</li>
<li>After the above deletions and replacements I regenerated a CSV and sent it to Peter <em>et al</em> to have a look</li>
<li><p>After the above deletions and replacements I regenerated a CSV and sent it to Peter <em>et al</em> to have a look</p>
<pre><code>dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=126 group by text_value order by count desc) to /tmp/contributors-june28.csv with csv;
</code></pre></li>
<li><p>Re-evaluate <code>dc.contributor.corporate</code> and it seems we will move it to <code>dc.contributor.author</code> as this is more in line with how editors are actually using it</p></li>
</ul> </ul>
<pre><code>dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=126 group by text_value order by count desc) to /tmp/contributors-june28.csv with csv;
<h2 id="2016-06-29">2016-06-29</h2> </code></pre><ul>
<li>Re-evaluate <code>dc.contributor.corporate</code> and it seems we will move it to <code>dc.contributor.author</code> as this is more in line with how editors are actually using it</li>
</ul>
<h2 id="20160629">2016-06-29</h2>
<ul> <ul>
<li><p>Test run of <code>migrate-fields.sh</code> with the following re-mappings:</p> <li>Test run of <code>migrate-fields.sh</code> with the following re-mappings:</li>
</ul>
<pre><code>72 55 #dc.source <pre><code>72 55 #dc.source
86 230 #cg.contributor.crp 86 230 #cg.contributor.crp
91 211 #cg.contributor.affiliation 91 211 #cg.contributor.affiliation
@ -418,40 +356,31 @@ $ ./delete-metadata-values.py -i investors-delete-82.csv -f dc.description.spons
74 220 #cg.identifier.doi 74 220 #cg.identifier.doi
79 222 #cg.identifier.googleurl 79 222 #cg.identifier.googleurl
89 223 #cg.identifier.dataurl 89 223 #cg.identifier.dataurl
</code></pre></li> </code></pre><ul>
<li>Run all cleanups and deletions of <code>dc.contributor.corporate</code> on CGSpace:</li>
<li><p>Run all cleanups and deletions of <code>dc.contributor.corporate</code> on CGSpace:</p> </ul>
<pre><code>$ ./fix-metadata-values.py -i Corporate-Authors-Fix-121.csv -f dc.contributor.corporate -t 'Correct style' -m 126 -d cgspace -u cgspace -p 'fuuu' <pre><code>$ ./fix-metadata-values.py -i Corporate-Authors-Fix-121.csv -f dc.contributor.corporate -t 'Correct style' -m 126 -d cgspace -u cgspace -p 'fuuu'
$ ./fix-metadata-values.py -i Corporate-Authors-Fix-PB.csv -f dc.contributor.corporate -t 'should be' -m 126 -d cgspace -u cgspace -p 'fuuu' $ ./fix-metadata-values.py -i Corporate-Authors-Fix-PB.csv -f dc.contributor.corporate -t 'should be' -m 126 -d cgspace -u cgspace -p 'fuuu'
$ ./delete-metadata-values.py -f dc.contributor.corporate -i Corporate-Authors-Delete-13.csv -m 126 -u cgspace -d cgspace -p 'fuuu' $ ./delete-metadata-values.py -f dc.contributor.corporate -i Corporate-Authors-Delete-13.csv -m 126 -u cgspace -d cgspace -p 'fuuu'
</code></pre></li> </code></pre><ul>
<li>Re-deploy CGSpace and DSpace Test with latest June changes</li>
<li><p>Re-deploy CGSpace and DSpace Test with latest June changes</p></li> <li>Now the sharing and Altmetric bits are more prominent:</li>
<li><p>Now the sharing and Altmetric bits are more prominent:</p></li>
</ul> </ul>
<p><img src="/cgspace-notes/2016/06/xmlui-altmetric-sharing.png" alt="DSpace 5.1 XMLUI With Altmetric Badge"></p>
<p><img src="/cgspace-notes/2016/06/xmlui-altmetric-sharing.png" alt="DSpace 5.1 XMLUI With Altmetric Badge" /></p>
<ul> <ul>
<li>Run all system updates on the servers and reboot</li> <li>Run all system updates on the servers and reboot</li>
<li>Start working on config changes for phase three of the metadata migrations</li> <li>Start working on config changes for phase three of the metadata migrations</li>
</ul> </ul>
<h2 id="20160630">2016-06-30</h2>
<h2 id="2016-06-30">2016-06-30</h2>
<ul> <ul>
<li><p>Wow, there are 95 authors in the database who have &lsquo;,&rsquo; at the end of their name:</p> <li>Wow, there are 95 authors in the database who have &lsquo;,&rsquo; at the end of their name:</li>
<pre><code># select text_value from metadatavalue where metadata_field_id=3 and text_value like '%,';
</code></pre></li>
<li><p>We need to use something like this to fix them, need to write a proper regex later:</p>
<pre><code># update metadatavalue set text_value = regexp_replace(text_value, '(Poole, J),', '\1') where metadata_field_id=3 and text_value = 'Poole, J,';
</code></pre></li>
</ul> </ul>
<pre><code># select text_value from metadatavalue where metadata_field_id=3 and text_value like '%,';
</code></pre><ul>
<li>We need to use something like this to fix them, need to write a proper regex later:</li>
</ul>
<pre><code># update metadatavalue set text_value = regexp_replace(text_value, '(Poole, J),', '\1') where metadata_field_id=3 and text_value = 'Poole, J,';
</code></pre>

View File

@ -8,19 +8,16 @@
<meta property="og:title" content="July, 2016" /> <meta property="og:title" content="July, 2016" />
<meta property="og:description" content="2016-07-01 <meta property="og:description" content="2016-07-01
Add dc.description.sponsorship to Discovery sidebar facets and make investors clickable in item view (#232) Add dc.description.sponsorship to Discovery sidebar facets and make investors clickable in item view (#232)
I think this query should find and replace all authors that have &ldquo;,&rdquo; at the end of their names: I think this query should find and replace all authors that have &ldquo;,&rdquo; at the end of their names:
dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, &#39;(^.&#43;?),$&#39;, &#39;\1&#39;) where metadata_field_id=3 and resource_type_id=2 and text_value ~ &#39;^.&#43;?,$&#39;; dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, &#39;(^.&#43;?),$&#39;, &#39;\1&#39;) where metadata_field_id=3 and resource_type_id=2 and text_value ~ &#39;^.&#43;?,$&#39;;
UPDATE 95 UPDATE 95
dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ &#39;^.&#43;?,$&#39;; dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ &#39;^.&#43;?,$&#39;;
text_value text_value
------------ ------------
(0 rows) (0 rows)
In this case the select query was showing 95 results before the update In this case the select query was showing 95 results before the update
" /> " />
<meta property="og:type" content="article" /> <meta property="og:type" content="article" />
@ -32,22 +29,19 @@ In this case the select query was showing 95 results before the update
<meta name="twitter:title" content="July, 2016"/> <meta name="twitter:title" content="July, 2016"/>
<meta name="twitter:description" content="2016-07-01 <meta name="twitter:description" content="2016-07-01
Add dc.description.sponsorship to Discovery sidebar facets and make investors clickable in item view (#232) Add dc.description.sponsorship to Discovery sidebar facets and make investors clickable in item view (#232)
I think this query should find and replace all authors that have &ldquo;,&rdquo; at the end of their names: I think this query should find and replace all authors that have &ldquo;,&rdquo; at the end of their names:
dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, &#39;(^.&#43;?),$&#39;, &#39;\1&#39;) where metadata_field_id=3 and resource_type_id=2 and text_value ~ &#39;^.&#43;?,$&#39;; dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, &#39;(^.&#43;?),$&#39;, &#39;\1&#39;) where metadata_field_id=3 and resource_type_id=2 and text_value ~ &#39;^.&#43;?,$&#39;;
UPDATE 95 UPDATE 95
dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ &#39;^.&#43;?,$&#39;; dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ &#39;^.&#43;?,$&#39;;
text_value text_value
------------ ------------
(0 rows) (0 rows)
In this case the select query was showing 95 results before the update In this case the select query was showing 95 results before the update
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -128,67 +122,49 @@ In this case the select query was showing 95 results before the update
</p> </p>
</header> </header>
<h2 id="2016-07-01">2016-07-01</h2> <h2 id="20160701">2016-07-01</h2>
<ul> <ul>
<li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li> <li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li>
<li>I think this query should find and replace all authors that have &ldquo;,&rdquo; at the end of their names:</li>
<li><p>I think this query should find and replace all authors that have &ldquo;,&rdquo; at the end of their names:</p> </ul>
<pre><code>dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$'; <pre><code>dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
UPDATE 95 UPDATE 95
dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$'; dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
text_value text_value
------------ ------------
(0 rows) (0 rows)
</code></pre></li> </code></pre><ul>
<li>In this case the select query was showing 95 results before the update</li>
<li><p>In this case the select query was showing 95 results before the update</p></li>
</ul> </ul>
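<p>To sanity-check the trailing-comma pattern outside the database, the same regular expression can be tried in Python first. This is only a sketch: the function name <code>strip_trailing_comma</code> and the sample names are illustrative, and Python's <code>re</code> engine is not identical to PostgreSQL's, although this particular pattern should behave the same way in both.</p>

```python
import re

# Same pattern as in the UPDATE above: lazily capture everything before one final comma
TRAILING_COMMA = re.compile(r"^(.+?),$")


def strip_trailing_comma(name):
    """Remove a single trailing comma; values without one are returned unchanged."""
    return TRAILING_COMMA.sub(r"\1", name)


if __name__ == "__main__":
    for sample in ["Poole, J,", "Grace, D."]:
        print(repr(sample), "->", repr(strip_trailing_comma(sample)))
```

<p>Note that only one comma is removed per pass, so a value ending in two commas would still end in one afterwards.</p>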
<h2 id="20160702">2016-07-02</h2>
<h2 id="2016-07-02">2016-07-02</h2>
<ul> <ul>
<li>Comment on DSpace Jira ticket about author lookup search text (<a href="https://jira.duraspace.org/browse/DS-2329">DS-2329</a>)</li> <li>Comment on DSpace Jira ticket about author lookup search text (<a href="https://jira.duraspace.org/browse/DS-2329">DS-2329</a>)</li>
</ul> </ul>
<h2 id="20160704">2016-07-04</h2>
<h2 id="2016-07-04">2016-07-04</h2>
<ul> <ul>
<li>Seems the database&rsquo;s author authority values mean nothing without the <code>authority</code> Solr core from the host where they were created!</li> <li>Seems the database's author authority values mean nothing without the <code>authority</code> Solr core from the host where they were created!</li>
</ul> </ul>
<h2 id="20160705">2016-07-05</h2>
<h2 id="2016-07-05">2016-07-05</h2>
<ul> <ul>
<li>Amend <code>backup-solr.sh</code> script so it backs up the entire Solr folder</li> <li>Amend <code>backup-solr.sh</code> script so it backs up the entire Solr folder</li>
<li>We <em>really</em> only need <code>statistics</code> and <code>authority</code> but meh</li> <li>We <em>really</em> only need <code>statistics</code> and <code>authority</code> but meh</li>
<li>Fix metadata for species on DSpace Test:</li>
<li><p>Fix metadata for species on DSpace Test:</p>
<pre><code>$ ./fix-metadata-values.py -i /tmp/Species-Peter-Fix.csv -f dc.Species -t CORRECT -m 94 -d dspacetest -u dspacetest -p 'fuuu'
</code></pre></li>
<li><p>Will run later on CGSpace</p></li>
<li><p>A user is still having problems with Sherpa/Romeo causing crashes during the submission process when the journal is &ldquo;ungraded&rdquo;</p></li>
<li><p>I tested the <a href="https://jira.duraspace.org/browse/DS-2740">patch for DS-2740</a> that I had found last month and it seems to work</p></li>
<li><p>I will merge it to <code>5_x-prod</code></p></li>
</ul> </ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/Species-Peter-Fix.csv -f dc.Species -t CORRECT -m 94 -d dspacetest -u dspacetest -p 'fuuu'
<h2 id="2016-07-06">2016-07-06</h2> </code></pre><ul>
<li>Will run later on CGSpace</li>
<li>A user is still having problems with Sherpa/Romeo causing crashes during the submission process when the journal is &ldquo;ungraded&rdquo;</li>
<li>I tested the <a href="https://jira.duraspace.org/browse/DS-2740">patch for DS-2740</a> that I had found last month and it seems to work</li>
<li>I will merge it to <code>5_x-prod</code></li>
</ul>
<h2 id="20160706">2016-07-06</h2>
<ul> <ul>
<li><p>Delete 23 blank metadata values from CGSpace:</p> <li>Delete 23 blank metadata values from CGSpace:</li>
</ul>
<pre><code>cgspace=# delete from metadatavalue where resource_type_id=2 and text_value=''; <pre><code>cgspace=# delete from metadatavalue where resource_type_id=2 and text_value='';
DELETE 23 DELETE 23
</code></pre></li> </code></pre><ul>
<li>Complete phase three of metadata migration, for the following fields:
<li><p>Complete phase three of metadata migration, for the following fields:</p>
<ul> <ul>
<li>dc.title.jtitle → dc.source</li> <li>dc.title.jtitle → dc.source</li>
<li>dc.crsubject.crpsubject → cg.contributor.crp</li> <li>dc.crsubject.crpsubject → cg.contributor.crp</li>
@ -200,165 +176,123 @@ DELETE 23
<li>dc.identifier.doi → cg.identifier.doi</li> <li>dc.identifier.doi → cg.identifier.doi</li>
<li>dc.identifier.googleurl → cg.identifier.googleurl</li> <li>dc.identifier.googleurl → cg.identifier.googleurl</li>
<li>dc.identifier.dataurl → cg.identifier.dataurl</li> <li>dc.identifier.dataurl → cg.identifier.dataurl</li>
</ul></li> </ul>
</li>
<li><p>Also, run fixes and deletes for species and author affiliations (over 1000 corrections!)</p> <li>Also, run fixes and deletes for species and author affiliations (over 1000 corrections!)</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i Species-Peter-Fix.csv -f dc.Species -t CORRECT -m 212 -d dspace -u dspace -p 'fuuu' <pre><code>$ ./fix-metadata-values.py -i Species-Peter-Fix.csv -f dc.Species -t CORRECT -m 212 -d dspace -u dspace -p 'fuuu'
$ ./fix-metadata-values.py -i Affiliations-Fix-1045-Peter-Abenet.csv -f dc.contributor.affiliation -t Correct -m 211 -d dspace -u dspace -p 'fuuu' $ ./fix-metadata-values.py -i Affiliations-Fix-1045-Peter-Abenet.csv -f dc.contributor.affiliation -t Correct -m 211 -d dspace -u dspace -p 'fuuu'
$ ./delete-metadata-values.py -f dc.contributor.affiliation -i Affiliations-Delete-Peter-Abenet.csv -m 211 -u dspace -d dspace -p 'fuuu' $ ./delete-metadata-values.py -f dc.contributor.affiliation -i Affiliations-Delete-Peter-Abenet.csv -m 211 -u dspace -d dspace -p 'fuuu'
</code></pre></li> </code></pre><ul>
<li>I then ran all server updates and rebooted the server</li>
<li><p>I then ran all server updates and rebooted the server</p></li>
</ul> </ul>
<h2 id="20160711">2016-07-11</h2>
<h2 id="2016-07-11">2016-07-11</h2>
<ul> <ul>
<li><p>Doing some author cleanups from Peter and Abenet:</p> <li>Doing some author cleanups from Peter and Abenet:</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/Authors-Fix-205-UTF8.csv -f dc.contributor.author -t correct -m 3 -d dspacetest -u dspacetest -p fuuu <pre><code>$ ./fix-metadata-values.py -i /tmp/Authors-Fix-205-UTF8.csv -f dc.contributor.author -t correct -m 3 -d dspacetest -u dspacetest -p fuuu
$ ./delete-metadata-values.py -f dc.contributor.author -i /tmp/Authors-Delete-UTF8.csv -m 3 -u dspacetest -d dspacetest -p fuuu $ ./delete-metadata-values.py -f dc.contributor.author -i /tmp/Authors-Delete-UTF8.csv -m 3 -u dspacetest -d dspacetest -p fuuu
</code></pre></li> </code></pre><h2 id="20160713">2016-07-13</h2>
</ul>
<h2 id="2016-07-13">2016-07-13</h2>
<ul> <ul>
<li>Run the author cleanups on CGSpace and start a full Discovery re-index</li> <li>Run the author cleanups on CGSpace and start a full Discovery re-index</li>
</ul> </ul>
<h2 id="20160714">2016-07-14</h2>
<h2 id="2016-07-14">2016-07-14</h2>
<ul> <ul>
<li>Test LDAP settings for new root LDAP</li> <li>Test LDAP settings for new root LDAP</li>
<li>Seems to work when binding as a top-level user</li> <li>Seems to work when binding as a top-level user</li>
</ul> </ul>
<h2 id="20160718">2016-07-18</h2>
<h2 id="2016-07-18">2016-07-18</h2>
<ul> <ul>
<li>Adjust identifiers in XMLUI item display to be more prominent</li> <li>Adjust identifiers in XMLUI item display to be more prominent</li>
<li>Add species and breed to the XMLUI item display</li> <li>Add species and breed to the XMLUI item display</li>
<li>CGSpace crashed late at night and the DSpace logs were showing:</li>
<li><p>CGSpace crashed late at night and the DSpace logs were showing:</p> </ul>
<pre><code>2016-07-18 20:26:30,941 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error - <pre><code>2016-07-18 20:26:30,941 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
... ...
</code></pre></li> </code></pre><ul>
<li>I suspect it's someone hitting REST too much:</li>
<li><p>I suspect it&rsquo;s someone hitting REST too much:</p>
<pre><code># awk '{print $1}' /var/log/nginx/rest.log | sort -n | uniq -c | sort -h | tail -n 3
710 66.249.78.38
1781 181.118.144.29
24904 70.32.99.142
</code></pre></li>
<li><p>I just blocked access to <code>/rest</code> for that last IP for now:</p>
<pre><code> # log rest requests
location /rest {
access_log /var/log/nginx/rest.log;
proxy_pass http://127.0.0.1:8443;
deny 70.32.99.142;
}
</code></pre></li>
</ul> </ul>
<pre><code># awk '{print $1}' /var/log/nginx/rest.log | sort -n | uniq -c | sort -h | tail -n 3
<h2 id="2016-07-21">2016-07-21</h2> 710 66.249.78.38
1781 181.118.144.29
24904 70.32.99.142
</code></pre><ul>
<li>I just blocked access to <code>/rest</code> for that last IP for now:</li>
</ul>
<pre><code> # log rest requests
location /rest {
access_log /var/log/nginx/rest.log;
proxy_pass http://127.0.0.1:8443;
deny 70.32.99.142;
}
</code></pre><h2 id="20160721">2016-07-21</h2>
<ul> <ul>
<li>Mitigate the <a href="https://httpoxy.org">HTTPoxy</a> vulnerability for Tomcat etc in nginx: <a href="https://github.com/ilri/rmg-ansible-public/pull/38">https://github.com/ilri/rmg-ansible-public/pull/38</a></li> <li>Mitigate the <a href="https://httpoxy.org">HTTPoxy</a> vulnerability for Tomcat etc in nginx: <a href="https://github.com/ilri/rmg-ansible-public/pull/38">https://github.com/ilri/rmg-ansible-public/pull/38</a></li>
<li>Unblock 70.32.99.142 from <code>/rest</code> as it has been blocked for a few days</li> <li>Unblock 70.32.99.142 from <code>/rest</code> as it has been blocked for a few days</li>
</ul> </ul>
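<p>The per-IP request count that identified 70.32.99.142 in the REST log on 2016-07-18 can also be sketched in Python. This assumes, as the awk command did, that the client address is the first whitespace-separated field of each log line; the function name <code>top_clients</code> and the sample lines (using documentation-only addresses) are hypothetical.</p>

```python
from collections import Counter


def top_clients(log_lines, n=3):
    """Count requests per client IP (first field) and return the n busiest clients."""
    counts = Counter(line.split()[0] for line in log_lines if line.strip())
    # most_common() sorts descending, whereas `sort -h | tail` lists the busiest last
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical log lines; a real run would iterate over /var/log/nginx/rest.log
    sample = [
        '192.0.2.10 - - [18/Jul/2016:20:00:00 +0000] "GET /rest/items HTTP/1.1" 200',
        '192.0.2.10 - - [18/Jul/2016:20:00:01 +0000] "GET /rest/items HTTP/1.1" 200',
        '192.0.2.20 - - [18/Jul/2016:20:00:02 +0000] "GET /rest/items HTTP/1.1" 200',
    ]
    for ip, hits in top_clients(sample):
        print(hits, ip)
```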
<h2 id="20160722">2016-07-22</h2>
<h2 id="2016-07-22">2016-07-22</h2>
<ul> <ul>
<li>Help Paola from CCAFS with thumbnails for batch uploads</li> <li>Help Paola from CCAFS with thumbnails for batch uploads</li>
<li>She has been struggling to get the dimensions right, and manually enlarging smaller thumbnails, renaming PNGs to JPG, etc</li> <li>She has been struggling to get the dimensions right, and manually enlarging smaller thumbnails, renaming PNGs to JPG, etc</li>
<li>Altmetric reports having an issue with some of our authors being doubled&hellip;</li> <li>Altmetric reports having an issue with some of our authors being doubled&hellip;</li>
<li>This is related to authority and confidence!</li> <li>This is related to authority and confidence!</li>
<li>We might need to use <code>index.authority.ignore-prefered=true</code> to tell the Discovery index to prefer the variation that exists in the metadatavalue rather than what it finds in the authority cache.</li> <li>We might need to use <code>index.authority.ignore-prefered=true</code> to tell the Discovery index to prefer the variation that exists in the metadatavalue rather than what it finds in the authority cache.</li>
<li>Trying these on DSpace Test after a discussion by Daniel Scharon on the dspace-tech mailing list:</li>
<li><p>Trying these on DSpace Test after a discussion by Daniel Scharon on the dspace-tech mailing list:</p> </ul>
<pre><code>index.authority.ignore-prefered.dc.contributor.author=true <pre><code>index.authority.ignore-prefered.dc.contributor.author=true
index.authority.ignore-variants.dc.contributor.author=false index.authority.ignore-variants.dc.contributor.author=false
</code></pre></li> </code></pre><ul>
<li>After reindexing I don't see any change in Discovery's display of authors, and still have entries like:</li>
<li><p>After reindexing I don&rsquo;t see any change in Discovery&rsquo;s display of authors, and still have entries like:</p> </ul>
<pre><code>Grace, D. (464) <pre><code>Grace, D. (464)
Grace, D. (62) Grace, D. (62)
</code></pre></li> </code></pre><ul>
<li>I asked for clarification of the following options on the DSpace mailing list:</li>
<li><p>I asked for clarification of the following options on the DSpace mailing list:</p> </ul>
<pre><code>index.authority.ignore <pre><code>index.authority.ignore
index.authority.ignore-prefered index.authority.ignore-prefered
index.authority.ignore-variants index.authority.ignore-variants
</code></pre></li> </code></pre><ul>
<li>In the meantime, I will try these on DSpace Test (plus a reindex):</li>
<li><p>In the meantime, I will try these on DSpace Test (plus a reindex):</p> </ul>
<pre><code>index.authority.ignore=true <pre><code>index.authority.ignore=true
index.authority.ignore-prefered=true index.authority.ignore-prefered=true
index.authority.ignore-variants=true index.authority.ignore-variants=true
</code></pre></li> </code></pre><ul>
<li>Enabled usage of <code>X-Forwarded-For</code> in DSpace admin control panel (<a href="https://github.com/ilri/DSpace/pull/255">#255</a>)</li>
<li><p>Enabled usage of <code>X-Forwarded-For</code> in DSpace admin control panel (<a href="https://github.com/ilri/DSpace/pull/255">#255</a>)</p></li> <li>It was misconfigured and disabled, but already working for some reason <em>sigh</em></li>
<li>&hellip; no luck. Trying with just:</li>
<li><p>It was misconfigured and disabled, but already working for some reason <em>sigh</em></p></li>
<li><p>&hellip; no luck. Trying with just:</p>
<pre><code>index.authority.ignore=true
</code></pre></li>
<li><p>After re-indexing and clearing the XMLUI cache nothing has changed</p></li>
</ul> </ul>
<pre><code>index.authority.ignore=true
<h2 id="2016-07-25">2016-07-25</h2> </code></pre><ul>
<li>After re-indexing and clearing the XMLUI cache nothing has changed</li>
</ul>
<h2 id="20160725">2016-07-25</h2>
<ul> <ul>
<li><p>Trying a few more settings (plus reindex) for Discovery on DSpace Test:</p> <li>Trying a few more settings (plus reindex) for Discovery on DSpace Test:</li>
</ul>
<pre><code>index.authority.ignore-prefered.dc.contributor.author=true <pre><code>index.authority.ignore-prefered.dc.contributor.author=true
index.authority.ignore-variants=true index.authority.ignore-variants=true
</code></pre></li> </code></pre><ul>
<li>Run all OS updates and reboot DSpace Test server</li>
<li><p>Run all OS updates and reboot DSpace Test server</p></li> <li>No changes to Discovery after reindexing&hellip; hmm.</li>
<li>Integrate and massively clean up About page (<a href="https://github.com/ilri/DSpace/pull/256">#256</a>)</li>
<li><p>No changes to Discovery after reindexing&hellip; hmm.</p></li>
<li><p>Integrate and massively clean up About page (<a href="https://github.com/ilri/DSpace/pull/256">#256</a>)</p></li>
</ul> </ul>
<p><img src="/cgspace-notes/2016/07/cgspace-about-page.png" alt="About page"></p>
<p><img src="/cgspace-notes/2016/07/cgspace-about-page.png" alt="About page" /></p>
<ul> <ul>
<li><p>The DSpace source code mentions the configuration key <code>discovery.index.authority.ignore-prefered.*</code> (with prefix of discovery, despite the docs saying otherwise), so I&rsquo;m trying the following on DSpace Test:</p> <li>The DSpace source code mentions the configuration key <code>discovery.index.authority.ignore-prefered.*</code> (with prefix of discovery, despite the docs saying otherwise), so I'm trying the following on DSpace Test:</li>
</ul>
<pre><code>discovery.index.authority.ignore-prefered.dc.contributor.author=true <pre><code>discovery.index.authority.ignore-prefered.dc.contributor.author=true
discovery.index.authority.ignore-variants=true discovery.index.authority.ignore-variants=true
</code></pre></li> </code></pre><ul>
<li>Still no change!</li>
<li><p>Still no change!</p></li> <li>Deploy species, breed, and identifier changes to CGSpace, as well as About page</li>
<li>Run Linode RAM upgrade (8→12GB)</li>
<li><p>Deploy species, breed, and identifier changes to CGSpace, as well as About page</p></li> <li>Re-sync DSpace Test with CGSpace</li>
<li>I noticed that our backup scripts don't send Solr cores to S3 so I amended the script</li>
<li><p>Run Linode RAM upgrade (8→12GB)</p></li>
<li><p>Re-sync DSpace Test with CGSpace</p></li>
<li><p>I noticed that our backup scripts don&rsquo;t send Solr cores to S3 so I amended the script</p></li>
</ul> </ul>
<h2 id="20160731">2016-07-31</h2>
<h2 id="2016-07-31">2016-07-31</h2>
<ul> <ul>
<li>Work on removing Dryland Systems and Humidtropics subjects from Discovery sidebar and Browse by</li> <li>Work on removing Dryland Systems and Humidtropics subjects from Discovery sidebar and Browse by</li>
<li>Also change &ldquo;Subjects&rdquo; to &ldquo;AGROVOC keywords&rdquo; in Discovery sidebar/search and Browse by (<a href="https://github.com/ilri/DSpace/issues/257">#257</a>)</li> <li>Also change &ldquo;Subjects&rdquo; to &ldquo;AGROVOC keywords&rdquo; in Discovery sidebar/search and Browse by (<a href="https://github.com/ilri/DSpace/issues/257">#257</a>)</li>

View File

@ -8,19 +8,16 @@
<meta property="og:title" content="August, 2016" /> <meta property="og:title" content="August, 2016" />
<meta property="og:description" content="2016-08-01 <meta property="og:description" content="2016-08-01
Add updated distribution license from Sisay (#259) Add updated distribution license from Sisay (#259)
Play with upgrading Mirage 2 dependencies in bower.json because most are several versions out of date Play with upgrading Mirage 2 dependencies in bower.json because most are several versions out of date
Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more
bower stuff is a dead end, waste of time, too many issues bower stuff is a dead end, waste of time, too many issues
Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of fonts) Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of fonts)
Start working on DSpace 5.1 → 5.5 port: Start working on DSpace 5.1 → 5.5 port:
$ git checkout -b 55new 5_x-prod $ git checkout -b 55new 5_x-prod
$ git reset --hard ilri/5_x-prod $ git reset --hard ilri/5_x-prod
$ git rebase -i dspace-5.5 $ git rebase -i dspace-5.5
" /> " />
<meta property="og:type" content="article" /> <meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2016-08/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2016-08/" />
@ -31,21 +28,18 @@ $ git rebase -i dspace-5.5
<meta name="twitter:title" content="August, 2016"/> <meta name="twitter:title" content="August, 2016"/>
<meta name="twitter:description" content="2016-08-01 <meta name="twitter:description" content="2016-08-01
Add updated distribution license from Sisay (#259) Add updated distribution license from Sisay (#259)
Play with upgrading Mirage 2 dependencies in bower.json because most are several versions out of date Play with upgrading Mirage 2 dependencies in bower.json because most are several versions out of date
Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more
bower stuff is a dead end, waste of time, too many issues bower stuff is a dead end, waste of time, too many issues
Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of fonts) Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of fonts)
Start working on DSpace 5.1 → 5.5 port: Start working on DSpace 5.1 → 5.5 port:
$ git checkout -b 55new 5_x-prod $ git checkout -b 55new 5_x-prod
$ git reset --hard ilri/5_x-prod $ git reset --hard ilri/5_x-prod
$ git rebase -i dspace-5.5 $ git rebase -i dspace-5.5
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -126,189 +120,140 @@ $ git rebase -i dspace-5.5
</p> </p>
</header> </header>
<h2 id="2016-08-01">2016-08-01</h2> <h2 id="20160801">2016-08-01</h2>
<ul> <ul>
<li>Add updated distribution license from Sisay (<a href="https://github.com/ilri/DSpace/issues/259">#259</a>)</li> <li>Add updated distribution license from Sisay (<a href="https://github.com/ilri/DSpace/issues/259">#259</a>)</li>
<li>Play with upgrading Mirage 2 dependencies in <code>bower.json</code> because most are several versions out of date</li> <li>Play with upgrading Mirage 2 dependencies in <code>bower.json</code> because most are several versions out of date</li>
<li>Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more</li> <li>Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more</li>
<li>bower stuff is a dead end, waste of time, too many issues</li> <li>bower stuff is a dead end, waste of time, too many issues</li>
<li>Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of <code>fonts</code>)</li> <li>Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of <code>fonts</code>)</li>
<li>Start working on DSpace 5.1 → 5.5 port:</li>
<li><p>Start working on DSpace 5.1 → 5.5 port:</p> </ul>
<pre><code>$ git checkout -b 55new 5_x-prod <pre><code>$ git checkout -b 55new 5_x-prod
$ git reset --hard ilri/5_x-prod $ git reset --hard ilri/5_x-prod
$ git rebase -i dspace-5.5 $ git rebase -i dspace-5.5
</code></pre></li> </code></pre><ul>
</ul> <li>Lots of conflicts that don't make sense (ie, shouldn't conflict!)</li>
<ul>
<li>Lots of conflicts that don&rsquo;t make sense (ie, shouldn&rsquo;t conflict!)</li>
<li>This file in particular conflicts almost 10 times: <code>dspace/modules/xmlui-mirage2/src/main/webapp/themes/CGIAR/styles/_style.scss</code></li> <li>This file in particular conflicts almost 10 times: <code>dspace/modules/xmlui-mirage2/src/main/webapp/themes/CGIAR/styles/_style.scss</code></li>
<li>Checking out a clean branch at 5.5 and cherry-picking our commits works where that file would normally have a conflict</li> <li>Checking out a clean branch at 5.5 and cherry-picking our commits works where that file would normally have a conflict</li>
<li>Seems to be related to merge commits</li> <li>Seems to be related to merge commits</li>
<li><code>git rebase --preserve-merges</code> doesn&rsquo;t seem to help</li> <li><code>git rebase --preserve-merges</code> doesn't seem to help</li>
<li>Eventually I just turned on git rerere and solved the conflicts and completed the 403 commit rebase</li> <li>Eventually I just turned on git rerere and solved the conflicts and completed the 403 commit rebase</li>
<li>The 5.5 code now builds but doesn&rsquo;t run (white page in Tomcat)</li> <li>The 5.5 code now builds but doesn't run (white page in Tomcat)</li>
</ul> </ul>
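The rerere step mentioned above can be illustrated with a minimal sketch. It uses a throwaway repository and a merge instead of the real 403-commit rebase, purely to keep the demo short; rerere records and replays resolutions for rebases in the same way. The file name, branch name, and file contents are placeholders, not the actual CGIAR theme.

```shell
#!/bin/sh
# Hypothetical demo repository; nothing here touches a real DSpace checkout.
set -eu
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.org"
git config user.name "Demo"
# Record conflict resolutions so identical conflicts can be replayed later.
git config rerere.enabled true

echo "base" > style.scss
git add style.scss
git commit -qm "base"
main_branch=$(git symbolic-ref --short HEAD)

# Two branches change the same line, which guarantees a conflict.
git checkout -qb theme
echo "theme" > style.scss
git commit -qam "theme change"
git checkout -q "$main_branch"
echo "upstream" > style.scss
git commit -qam "upstream change"

# First attempt: resolve the conflict by hand; rerere stores the resolution.
git merge theme >/dev/null 2>&1 || true
echo "merged" > style.scss
git add style.scss
git commit -qm "merge theme"

# Throw the merge away and retry: rerere replays the stored resolution
# into the working tree (the file still has to be added and committed).
git reset -q --hard HEAD~1
git merge theme >/dev/null 2>&1 || true
cat style.scss
```

With a file that conflicts almost ten times in one rebase, this means each distinct conflict only has to be resolved by hand once.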
<h2 id="20160802">2016-08-02</h2>
<h2 id="2016-08-02">2016-08-02</h2>
<ul> <ul>
<li>Ask Atmire for help with DSpace 5.5 issue</li> <li>Ask Atmire for help with DSpace 5.5 issue</li>
<li>Vanilla DSpace 5.5 deploys and runs fine</li> <li>Vanilla DSpace 5.5 deploys and runs fine</li>
<li>Playing with DSpace in Ubuntu 16.04 and Tomcat 7</li> <li>Playing with DSpace in Ubuntu 16.04 and Tomcat 7</li>
<li>Everything is still fucked up, even vanilla DSpace 5.5</li> <li>Everything is still fucked up, even vanilla DSpace 5.5</li>
</ul> </ul>
<h2 id="20160804">2016-08-04</h2>
<h2 id="2016-08-04">2016-08-04</h2>
<ul> <ul>
<li>Ask on DSpace mailing list about duplicate authors, Discovery and author text values</li> <li>Ask on DSpace mailing list about duplicate authors, Discovery and author text values</li>
<li>Atmire responded with some new DSpace 5.5 ready versions to try for their modules</li> <li>Atmire responded with some new DSpace 5.5 ready versions to try for their modules</li>
</ul> </ul>
<h2 id="20160805">2016-08-05</h2>
<h2 id="2016-08-05">2016-08-05</h2>
<ul> <ul>
<li>Fix item display incorrectly displaying Species when Breeds were present (<a href="https://github.com/ilri/DSpace/pull/260">#260</a>)</li> <li>Fix item display incorrectly displaying Species when Breeds were present (<a href="https://github.com/ilri/DSpace/pull/260">#260</a>)</li>
<li>Experiment with fixing more authors, like Delia Grace:</li>
<li><p>Experiment with fixing more authors, like Delia Grace:</p>
<pre><code>dspacetest=# update metadatavalue set authority='0b4fcbc1-d930-4319-9b4d-ea1553cca70b', confidence=600 where metadata_field_id=3 and text_value='Grace, D.';
</code></pre></li>
</ul> </ul>
<pre><code>dspacetest=# update metadatavalue set authority='0b4fcbc1-d930-4319-9b4d-ea1553cca70b', confidence=600 where metadata_field_id=3 and text_value='Grace, D.';
<h2 id="2016-08-06">2016-08-06</h2> </code></pre><h2 id="20160806">2016-08-06</h2>
<ul> <ul>
<li>Finally figured out how to remove &ldquo;View/Open&rdquo; and &ldquo;Bitstreams&rdquo; from the item view</li> <li>Finally figured out how to remove &ldquo;View/Open&rdquo; and &ldquo;Bitstreams&rdquo; from the item view</li>
</ul> </ul>
<h2 id="20160807">2016-08-07</h2>
<h2 id="2016-08-07">2016-08-07</h2>
<ul> <ul>
<li>Start working on Ubuntu 16.04 Ansible playbook for Tomcat 8, PostgreSQL 9.5, Oracle Java 8, etc</li> <li>Start working on Ubuntu 16.04 Ansible playbook for Tomcat 8, PostgreSQL 9.5, Oracle Java 8, etc</li>
</ul> </ul>
<h2 id="20160808">2016-08-08</h2>
<h2 id="2016-08-08">2016-08-08</h2>
<ul> <ul>
<li>Still troubleshooting Atmire modules on DSpace 5.5</li> <li>Still troubleshooting Atmire modules on DSpace 5.5</li>
<li>Vanilla DSpace 5.5 works on Tomcat 7&hellip;</li> <li>Vanilla DSpace 5.5 works on Tomcat 7&hellip;</li>
<li>Ooh, and vanilla DSpace 5.5 works on Tomcat 8 with Java 8!</li> <li>Ooh, and vanilla DSpace 5.5 works on Tomcat 8 with Java 8!</li>
<li>Some notes about setting up Tomcat 8, since it&rsquo;s new on this machine&hellip;</li> <li>Some notes about setting up Tomcat 8, since it's new on this machine&hellip;</li>
<li>Install latest Oracle Java 8 JDK</li> <li>Install latest Oracle Java 8 JDK</li>
<li>Create <code>setenv.sh</code> in Tomcat 8 <code>libexec/bin</code> directory:</li>
<li><p>Create <code>setenv.sh</code> in Tomcat 8 <code>libexec/bin</code> directory:</p> </ul>
<pre><code>CATALINA_OPTS=&quot;-Djava.awt.headless=true -Xms3072m -Xmx3072m -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -Dfile.encoding=UTF-8&quot; <pre><code>CATALINA_OPTS=&quot;-Djava.awt.headless=true -Xms3072m -Xmx3072m -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -Dfile.encoding=UTF-8&quot;
CATALINA_OPTS=&quot;$CATALINA_OPTS -Djava.library.path=/opt/brew/Cellar/tomcat-native/1.2.8/lib&quot; CATALINA_OPTS=&quot;$CATALINA_OPTS -Djava.library.path=/opt/brew/Cellar/tomcat-native/1.2.8/lib&quot;
JRE_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_102.jdk/Contents/Home JRE_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_102.jdk/Contents/Home
</code></pre></li> </code></pre><ul>
<li>Edit Tomcat 8 <code>server.xml</code> to add regular HTTP listener for solr</li>
<li><p>Edit Tomcat 8 <code>server.xml</code> to add regular HTTP listener for solr</p></li> <li>Symlink webapps:</li>
</ul>
<li><p>Symlink webapps:</p>
<pre><code>$ rm -rf /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/ROOT <pre><code>$ rm -rf /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/ROOT
$ ln -sv ~/dspace/webapps/xmlui /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/ROOT $ ln -sv ~/dspace/webapps/xmlui /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/ROOT
$ ln -sv ~/dspace/webapps/oai /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/oai $ ln -sv ~/dspace/webapps/oai /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/oai
$ ln -sv ~/dspace/webapps/jspui /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/jspui $ ln -sv ~/dspace/webapps/jspui /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/jspui
$ ln -sv ~/dspace/webapps/rest /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/rest $ ln -sv ~/dspace/webapps/rest /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/rest
$ ln -sv ~/dspace/webapps/solr /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/solr $ ln -sv ~/dspace/webapps/solr /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/solr
</code></pre></li> </code></pre><h2 id="20160809">2016-08-09</h2>
</ul>
<h2 id="2016-08-09">2016-08-09</h2>
<ul> <ul>
<li>More tests of Atmire&rsquo;s 5.5 modules on a clean, working instance of <code>5_x-prod</code></li> <li>More tests of Atmire's 5.5 modules on a clean, working instance of <code>5_x-prod</code></li>
<li>Still fails, though perhaps differently than before (Flyway): <a href="https://gist.github.com/alanorth/5d49c45a16efd7c6bc1e6642e66118b2">https://gist.github.com/alanorth/5d49c45a16efd7c6bc1e6642e66118b2</a></li> <li>Still fails, though perhaps differently than before (Flyway): <a href="https://gist.github.com/alanorth/5d49c45a16efd7c6bc1e6642e66118b2">https://gist.github.com/alanorth/5d49c45a16efd7c6bc1e6642e66118b2</a></li>
<li>More work on Tomcat 8 and Java 8 stuff for Ansible playbooks</li> <li>More work on Tomcat 8 and Java 8 stuff for Ansible playbooks</li>
</ul> </ul>
<h2 id="20160810">2016-08-10</h2>
<h2 id="2016-08-10">2016-08-10</h2>
<ul> <ul>
<li>Turns out DSpace 5.x isn&rsquo;t ready for Tomcat 8: <a href="https://jira.duraspace.org/browse/DS-3092">https://jira.duraspace.org/browse/DS-3092</a></li> <li>Turns out DSpace 5.x isn't ready for Tomcat 8: <a href="https://jira.duraspace.org/browse/DS-3092">https://jira.duraspace.org/browse/DS-3092</a></li>
<li>So we&rsquo;ll need to use Tomcat 7 + Java 8 on Ubuntu 16.04</li> <li>So we'll need to use Tomcat 7 + Java 8 on Ubuntu 16.04</li>
<li>More work on the Ansible stuff for this, allowing Tomcat 7 to use Java 8</li> <li>More work on the Ansible stuff for this, allowing Tomcat 7 to use Java 8</li>
<li>Merge pull request for fixing the type Discovery index to use <code>dc.type</code> (<a href="https://github.com/ilri/DSpace/pull/262">#262</a>)</li> <li>Merge pull request for fixing the type Discovery index to use <code>dc.type</code> (<a href="https://github.com/ilri/DSpace/pull/262">#262</a>)</li>
<li>Merge pull request for removing &ldquo;Bitstream&rdquo; text from item display, as it confuses users and isn&rsquo;t necessary (<a href="https://github.com/ilri/DSpace/pull/263">#263</a>)</li> <li>Merge pull request for removing &ldquo;Bitstream&rdquo; text from item display, as it confuses users and isn't necessary (<a href="https://github.com/ilri/DSpace/pull/263">#263</a>)</li>
</ul> </ul>
<h2 id="20160811">2016-08-11</h2>
<h2 id="2016-08-11">2016-08-11</h2>
<ul> <ul>
<li>Finally got DSpace (5.5) running on Ubuntu 16.04, Tomcat 7, Java 8, PostgreSQL 9.5 via the updated Ansible stuff</li> <li>Finally got DSpace (5.5) running on Ubuntu 16.04, Tomcat 7, Java 8, PostgreSQL 9.5 via the updated Ansible stuff</li>
</ul> </ul>
<p><img src="/cgspace-notes/2016/08/dspace55-ubuntu16.04.png" alt="DSpace 5.5 on Ubuntu 16.04, Tomcat 7, Java 8, PostgreSQL 9.5"></p>
<p><img src="/cgspace-notes/2016/08/dspace55-ubuntu16.04.png" alt="DSpace 5.5 on Ubuntu 16.04, Tomcat 7, Java 8, PostgreSQL 9.5" /></p> <h2 id="20160814">2016-08-14</h2>
<h2 id="2016-08-14">2016-08-14</h2>
<ul> <ul>
<li>Update Mirage 2 build notes for Ubuntu 16.04: <a href="https://gist.github.com/alanorth/2cf9c15834dc68a514262fcb04004cb0">https://gist.github.com/alanorth/2cf9c15834dc68a514262fcb04004cb0</a></li> <li>Update Mirage 2 build notes for Ubuntu 16.04: <a href="https://gist.github.com/alanorth/2cf9c15834dc68a514262fcb04004cb0">https://gist.github.com/alanorth/2cf9c15834dc68a514262fcb04004cb0</a></li>
</ul> </ul>
<h2 id="20160815">2016-08-15</h2>
<h2 id="2016-08-15">2016-08-15</h2>
<ul> <ul>
<li>Notes on NodeJS + nginx + systemd: <a href="https://gist.github.com/alanorth/51acd476891c67dfe27725848cf5ace1">https://gist.github.com/alanorth/51acd476891c67dfe27725848cf5ace1</a></li> <li>Notes on NodeJS + nginx + systemd: <a href="https://gist.github.com/alanorth/51acd476891c67dfe27725848cf5ace1">https://gist.github.com/alanorth/51acd476891c67dfe27725848cf5ace1</a></li>
</ul> </ul>
<p><img src="/cgspace-notes/2016/08/nodejs-nginx.png" alt="ExpressJS running behind nginx"></p>
<p><img src="/cgspace-notes/2016/08/nodejs-nginx.png" alt="ExpressJS running behind nginx" /></p> <h2 id="20160816">2016-08-16</h2>
<h2 id="2016-08-16">2016-08-16</h2>
<ul> <ul>
<li>Troubleshoot Paramiko connection issues with Ansible on ILRI servers: <a href="https://github.com/ilri/rmg-ansible-public/issues/37">#37</a></li> <li>Troubleshoot Paramiko connection issues with Ansible on ILRI servers: <a href="https://github.com/ilri/rmg-ansible-public/issues/37">#37</a></li>
<li>Turns out we need to add some MACs to our <code>sshd_config</code>: hmac-sha2-512,hmac-sha2-256</li> <li>Turns out we need to add some MACs to our <code>sshd_config</code>: hmac-sha2-512,hmac-sha2-256</li>
<li>Update DSpace Test&rsquo;s Java to version 8 to start testing this configuration (<a href="https://wiki.apache.org/solr/ShawnHeisey">seeing as Solr recommends it</a>)</li> <li>Update DSpace Test's Java to version 8 to start testing this configuration (<a href="https://wiki.apache.org/solr/ShawnHeisey">seeing as Solr recommends it</a>)</li>
</ul> </ul>
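A small check along these lines could confirm that both MACs are listed. This is only a sketch: it reads a temporary sample file standing in for the real `sshd_config` (an assumption, so the example runs anywhere), and the variable names are made up for illustration. On a live server, `sshd -T` prints the effective configuration and may be a better source than the file itself.

```shell
#!/bin/sh
set -eu
# Stand-in for /etc/ssh/sshd_config; the contents are illustrative only.
config=$(mktemp)
printf '%s\n' '# sample sshd_config fragment' \
  'MACs hmac-sha2-512,hmac-sha2-256' > "$config"

# Take the comma-separated value of the MACs directive (keyword is case-insensitive).
macs=$(awk 'tolower($1) == "macs" { print $2 }' "$config")

# Collect any required MAC that is not in the list.
missing=""
for mac in hmac-sha2-512 hmac-sha2-256; do
  case ",$macs," in
    *",$mac,"*) ;;
    *) missing="$missing $mac" ;;
  esac
done

if [ -z "$missing" ]; then
  echo "all required MACs present"
else
  echo "missing:$missing"
fi
rm -f "$config"
```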
<h2 id="20160817">2016-08-17</h2>
<h2 id="2016-08-17">2016-08-17</h2>
<ul> <ul>
<li>More work on Let&rsquo;s Encrypt stuff for Ansible roles</li> <li>More work on Let's Encrypt stuff for Ansible roles</li>
<li>Yesterday Atmire responded about DSpace 5.5 issues and asked me to try the <code>dspace database repair</code> command to fix Flyway issues</li> <li>Yesterday Atmire responded about DSpace 5.5 issues and asked me to try the <code>dspace database repair</code> command to fix Flyway issues</li>
<li>The <code>dspace database</code> command doesn&rsquo;t even run: <a href="https://gist.github.com/alanorth/c43c8d89e8df346d32c0ee938be90cd5">https://gist.github.com/alanorth/c43c8d89e8df346d32c0ee938be90cd5</a></li> <li>The <code>dspace database</code> command doesn't even run: <a href="https://gist.github.com/alanorth/c43c8d89e8df346d32c0ee938be90cd5">https://gist.github.com/alanorth/c43c8d89e8df346d32c0ee938be90cd5</a></li>
<li>Oops, it looks like the missing classes causing <code>dspace database</code> to fail were coming from the old <code>~/dspace/config/spring</code> folder</li> <li>Oops, it looks like the missing classes causing <code>dspace database</code> to fail were coming from the old <code>~/dspace/config/spring</code> folder</li>
<li>After removing the spring folder and running ant install again, <code>dspace database</code> works</li> <li>After removing the spring folder and running ant install again, <code>dspace database</code> works</li>
<li>I see there are missing and pending Flyway migrations, but running <code>dspace database repair</code> and <code>dspace database migrate</code> does nothing: <a href="https://gist.github.com/alanorth/41ed5abf2ff32d8ac9eedd1c3d015d70">https://gist.github.com/alanorth/41ed5abf2ff32d8ac9eedd1c3d015d70</a></li> <li>I see there are missing and pending Flyway migrations, but running <code>dspace database repair</code> and <code>dspace database migrate</code> does nothing: <a href="https://gist.github.com/alanorth/41ed5abf2ff32d8ac9eedd1c3d015d70">https://gist.github.com/alanorth/41ed5abf2ff32d8ac9eedd1c3d015d70</a></li>
</ul> </ul>
<h2 id="20160818">2016-08-18</h2>
<h2 id="2016-08-18">2016-08-18</h2>
<ul> <ul>
<li>Fix &ldquo;CONGO,DR&rdquo; country name in <code>input-forms.xml</code> (<a href="https://github.com/ilri/DSpace/pull/264">#264</a>)</li> <li>Fix &ldquo;CONGO,DR&rdquo; country name in <code>input-forms.xml</code> (<a href="https://github.com/ilri/DSpace/pull/264">#264</a>)</li>
<li>Also need to fix existing records using the incorrect form in the database:</li>
<li><p>Also need to fix existing records using the incorrect form in the database:</p>
<pre><code>dspace=# update metadatavalue set text_value='CONGO, DR' where resource_type_id=2 and metadata_field_id=228 and text_value='CONGO,DR';
</code></pre></li>
<li><p>I asked a question on the DSpace mailing list about updating &ldquo;preferred&rdquo; forms of author names from ORCID</p></li>
</ul> </ul>
<pre><code>dspace=# update metadatavalue set text_value='CONGO, DR' where resource_type_id=2 and metadata_field_id=228 and text_value='CONGO,DR';
<h2 id="2016-08-21">2016-08-21</h2> </code></pre><ul>
<li>I asked a question on the DSpace mailing list about updating &ldquo;preferred&rdquo; forms of author names from ORCID</li>
</ul>
<h2 id="20160821">2016-08-21</h2>
<ul> <ul>
<li>A few days ago someone on the DSpace mailing list suggested I try <code>dspace dsrun org.dspace.authority.UpdateAuthorities</code> to update preferred author names from ORCID</li> <li>A few days ago someone on the DSpace mailing list suggested I try <code>dspace dsrun org.dspace.authority.UpdateAuthorities</code> to update preferred author names from ORCID</li>
<li>If you set <code>auto-update-items=true</code> in <code>dspace/config/modules/solrauthority.cfg</code> it is supposed to update records it finds automatically</li> <li>If you set <code>auto-update-items=true</code> in <code>dspace/config/modules/solrauthority.cfg</code> it is supposed to update records it finds automatically</li>
<li>I updated my name format on ORCID and I&rsquo;ve been running that script a few times per day since then but nothing has changed</li> <li>I updated my name format on ORCID and I've been running that script a few times per day since then but nothing has changed</li>
<li>Still troubleshooting Atmire modules on DSpace 5.5</li> <li>Still troubleshooting Atmire modules on DSpace 5.5</li>
<li>I sent them some new verbose logs: <a href="https://gist.github.com/alanorth/700748995649688148ceba89d760253e">https://gist.github.com/alanorth/700748995649688148ceba89d760253e</a></li> <li>I sent them some new verbose logs: <a href="https://gist.github.com/alanorth/700748995649688148ceba89d760253e">https://gist.github.com/alanorth/700748995649688148ceba89d760253e</a></li>
</ul> </ul>
<h2 id="20160822">2016-08-22</h2>
<h2 id="2016-08-22">2016-08-22</h2>
<ul> <ul>
<li><p>Database migrations are fine on DSpace 5.1:</p> <li>Database migrations are fine on DSpace 5.1:</li>
</ul>
<pre><code>$ ~/dspace/bin/dspace database info <pre><code>$ ~/dspace/bin/dspace database info
Database URL: jdbc:postgresql://localhost:5432/dspacetest Database URL: jdbc:postgresql://localhost:5432/dspacetest
@ -338,106 +283,80 @@ Database Driver: PostgreSQL Native Driver version PostgreSQL 9.1 JDBC4 (build 90
| 5.1.2015.12.03 | Atmire CUA 4 migration | 2016-03-21 17:10:41 | Success | | 5.1.2015.12.03 | Atmire CUA 4 migration | 2016-03-21 17:10:41 | Success |
| 5.1.2015.12.03 | Atmire MQM migration | 2016-03-21 17:10:42 | Success | | 5.1.2015.12.03 | Atmire MQM migration | 2016-03-21 17:10:42 | Success |
+----------------+----------------------------+---------------------+---------+ +----------------+----------------------------+---------------------+---------+
</code></pre></li> </code></pre><ul>
<li>So I'm not sure why they have problems when we move to DSpace 5.5 (even the 5.1 migrations themselves show as &ldquo;Missing&rdquo;)</li>
<li><p>So I&rsquo;m not sure why they have problems when we move to DSpace 5.5 (even the 5.1 migrations themselves show as &ldquo;Missing&rdquo;)</p></li>
</ul> </ul>
<h2 id="20160823">2016-08-23</h2>
<h2 id="2016-08-23">2016-08-23</h2>
<ul> <ul>
<li>Help Paola from CCAFS with her thumbnails again</li> <li>Help Paola from CCAFS with her thumbnails again</li>
<li>Talk to Atmire about the DSpace 5.5 issue, and it seems to be caused by a bug in FlywayDB</li> <li>Talk to Atmire about the DSpace 5.5 issue, and it seems to be caused by a bug in FlywayDB</li>
<li>They said I should delete the Atmire migrations</li>
<li><p>They said I should delete the Atmire migrations</p> </ul>
<pre><code>dspacetest=# delete from schema_version where description = 'Atmire CUA 4 migration' and version='5.1.2015.12.03.2'; <pre><code>dspacetest=# delete from schema_version where description = 'Atmire CUA 4 migration' and version='5.1.2015.12.03.2';
dspacetest=# delete from schema_version where description = 'Atmire MQM migration' and version='5.1.2015.12.03.3'; dspacetest=# delete from schema_version where description = 'Atmire MQM migration' and version='5.1.2015.12.03.3';
</code></pre></li> </code></pre><ul>
<li>After that DSpace starts up but XMLUI now has unrelated issues that I need to solve!</li>
<li><p>After that DSpace starts up but XMLUI now has unrelated issues that I need to solve!</p> </ul>
<pre><code>org.apache.avalon.framework.configuration.ConfigurationException: Type 'ThemeResourceReader' does not exist for 'map:read' at jndi:/localhost/themes/0_CGIAR/sitemap.xmap:136:77 <pre><code>org.apache.avalon.framework.configuration.ConfigurationException: Type 'ThemeResourceReader' does not exist for 'map:read' at jndi:/localhost/themes/0_CGIAR/sitemap.xmap:136:77
context:/jndi:/localhost/themes/0_CGIAR/sitemap.xmap - 136:77 context:/jndi:/localhost/themes/0_CGIAR/sitemap.xmap - 136:77
</code></pre></li> </code></pre><ul>
<li>Looks like we're missing some stuff in the XMLUI module's <code>sitemap.xmap</code>, as well as in each of our XMLUI themes</li>
<li><p>Looks like we&rsquo;re missing some stuff in the XMLUI module&rsquo;s <code>sitemap.xmap</code>, as well as in each of our XMLUI themes</p></li> <li>Diff them with these to get the <code>ThemeResourceReader</code> changes:
<li><p>Diff them with these to get the <code>ThemeResourceReader</code> changes:</p>
<ul> <ul>
<li><code>dspace-xmlui/src/main/webapp/sitemap.xmap</code></li> <li><code>dspace-xmlui/src/main/webapp/sitemap.xmap</code></li>
<li><code>dspace-xmlui-mirage2/src/main/webapp/sitemap.xmap</code></li> <li><code>dspace-xmlui-mirage2/src/main/webapp/sitemap.xmap</code></li>
</ul></li>
<li><p>Then we had some NullPointerException from the SolrLogger class, which is apparently part of Atmire&rsquo;s CUA module</p></li>
<li><p>I tried with a small version bump to CUA but it didn&rsquo;t work (version <code>5.5-4.1.1-0</code>)</p></li>
<li><p>Also, I started looking into huge pages to prepare for PostgreSQL 9.5, but it seems Linode&rsquo;s kernels don&rsquo;t enable them</p></li>
</ul> </ul>
</li>
<h2 id="2016-08-24">2016-08-24</h2> <li>Then we had some NullPointerException from the SolrLogger class, which is apparently part of Atmire's CUA module</li>
<li>I tried with a small version bump to CUA but it didn't work (version <code>5.5-4.1.1-0</code>)</li>
<li>Also, I started looking into huge pages to prepare for PostgreSQL 9.5, but it seems Linode's kernels don't enable them</li>
</ul>
<h2 id="20160824">2016-08-24</h2>
<ul> <ul>
<li>Clean up and import 48 CCAFS records into DSpace Test</li> <li>Clean up and import 48 CCAFS records into DSpace Test</li>
<li>SQL to get all journal titles from dc.source (55), since it's apparently used for internal DSpace filename shit, but we moved all our journal titles there a few months ago:</li>
<li><p>SQL to get all journal titles from dc.source (55), since it&rsquo;s apparently used for internal DSpace filename shit, but we moved all our journal titles there a few months ago:</p>
<pre><code>dspacetest=# select distinct text_value from metadatavalue where metadata_field_id=55 and text_value !~ '.*(\.pdf|\.png|\.PDF|\.Pdf|\.JPEG|\.jpg|\.JPG|\.jpeg|\.xls|\.rtf|\.docx?|\.potx|\.dotx|\.eqa|\.tiff|\.mp4|\.mp3|\.gif|\.zip|\.txt|\.pptx|\.indd|\.PNG|\.bmp|\.exe|org\.dspace\.app\.mediafilter).*';
</code></pre></li>
</ul> </ul>
<pre><code>dspacetest=# select distinct text_value from metadatavalue where metadata_field_id=55 and text_value !~ '.*(\.pdf|\.png|\.PDF|\.Pdf|\.JPEG|\.jpg|\.JPG|\.jpeg|\.xls|\.rtf|\.docx?|\.potx|\.dotx|\.eqa|\.tiff|\.mp4|\.mp3|\.gif|\.zip|\.txt|\.pptx|\.indd|\.PNG|\.bmp|\.exe|org\.dspace\.app\.mediafilter).*';
<h2 id="2016-08-25">2016-08-25</h2> </code></pre><h2 id="20160825">2016-08-25</h2>
<ul> <ul>
<li><p>Atmire suggested adding a missing bean to <code>dspace/config/spring/api/atmire-cua.xml</code> but it doesn&rsquo;t help:</p> <li>Atmire suggested adding a missing bean to <code>dspace/config/spring/api/atmire-cua.xml</code> but it doesn't help:</li>
</ul>
<pre><code>... <pre><code>...
Error creating bean with name 'MetadataStorageInfoService' Error creating bean with name 'MetadataStorageInfoService'
... ...
</code></pre></li> </code></pre><ul>
<li>Atmire sent an updated version of <code>dspace/config/spring/api/atmire-cua.xml</code> and now XMLUI starts but gives a null pointer exception:</li>
<li><p>Atmire sent an updated version of <code>dspace/config/spring/api/atmire-cua.xml</code> and now XMLUI starts but gives a null pointer exception:</p> </ul>
<pre><code>Java stacktrace: java.lang.NullPointerException <pre><code>Java stacktrace: java.lang.NullPointerException
at org.dspace.app.xmlui.aspect.statistics.Navigation.addOptions(Navigation.java:129) at org.dspace.app.xmlui.aspect.statistics.Navigation.addOptions(Navigation.java:129)
at org.dspace.app.xmlui.wing.AbstractWingTransformer.startElement(AbstractWingTransformer.java:228) at org.dspace.app.xmlui.wing.AbstractWingTransformer.startElement(AbstractWingTransformer.java:228)
at sun.reflect.GeneratedMethodAccessor126.invoke(Unknown Source) at sun.reflect.GeneratedMethodAccessor126.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606) at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71) at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy103.startElement(Unknown Source) at com.sun.proxy.$Proxy103.startElement(Unknown Source)
at org.apache.cocoon.environment.internal.EnvironmentChanger.startElement(EnvironmentStack.java:140) at org.apache.cocoon.environment.internal.EnvironmentChanger.startElement(EnvironmentStack.java:140)
at org.apache.cocoon.environment.internal.EnvironmentChanger.startElement(EnvironmentStack.java:140) at org.apache.cocoon.environment.internal.EnvironmentChanger.startElement(EnvironmentStack.java:140)
at org.apache.cocoon.xml.AbstractXMLPipe.startElement(AbstractXMLPipe.java:94) at org.apache.cocoon.xml.AbstractXMLPipe.startElement(AbstractXMLPipe.java:94)
... ...
</code></pre></li> </code></pre><ul>
<li>Import the 47 CCAFS records to CGSpace, creating the SimpleArchiveFormat bundles and importing like:</li>
<li><p>Import the 47 CCAFS records to CGSpace, creating the SimpleArchiveFormat bundles and importing like:</p> </ul>
<pre><code>$ ./safbuilder.sh -c /tmp/Thumbnails\ to\ Upload\ to\ CGSpace/3546.csv <pre><code>$ ./safbuilder.sh -c /tmp/Thumbnails\ to\ Upload\ to\ CGSpace/3546.csv
$ JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx512m&quot; /home/cgspace.cgiar.org/bin/dspace import -a -e aorth@mjanja.ch -c 10568/3546 -s /tmp/Thumbnails\ to\ Upload\ to\ CGSpace/SimpleArchiveFormat -m 3546.map $ JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx512m&quot; /home/cgspace.cgiar.org/bin/dspace import -a -e aorth@mjanja.ch -c 10568/3546 -s /tmp/Thumbnails\ to\ Upload\ to\ CGSpace/SimpleArchiveFormat -m 3546.map
</code></pre></li> </code></pre><ul>
<li>Finally got DSpace 5.5 working with the Atmire modules after a few rounds of back and forth with Atmire devs</li>
<li><p>Finally got DSpace 5.5 working with the Atmire modules after a few rounds of back and forth with Atmire devs</p></li>
</ul> </ul>
<h2 id="20160826">2016-08-26</h2>
<h2 id="2016-08-26">2016-08-26</h2>
<ul> <ul>
<li>CGSpace had issues tonight, not entirely crashing, but becoming unresponsive</li> <li>CGSpace had issues tonight, not entirely crashing, but becoming unresponsive</li>
<li>The dspace log had this:</li>
<li><p>The dspace log had this:</p>
<pre><code>2016-08-26 20:48:05,040 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error - org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
</code></pre></li>
<li><p>Related to /rest no doubt</p></li>
</ul> </ul>
<pre><code>2016-08-26 20:48:05,040 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error - org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
<h2 id="2016-08-27">2016-08-27</h2> </code></pre><ul>
<li>Related to /rest no doubt</li>
</ul>
<h2 id="20160827">2016-08-27</h2>
<ul> <ul>
<li>Run corrections for Delia Grace and <code>CONGO, DR</code>, and deploy August changes to CGSpace</li> <li>Run corrections for Delia Grace and <code>CONGO, DR</code>, and deploy August changes to CGSpace</li>
<li>Run all system updates and reboot the server</li> <li>Run all system updates and reboot the server</li>

View File

@ -8,15 +8,12 @@
<meta property="og:title" content="September, 2016" /> <meta property="og:title" content="September, 2016" />
<meta property="og:description" content="2016-09-01 <meta property="og:description" content="2016-09-01
Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors
Discuss how the migration of CGIAR&rsquo;s Active Directory to a flat structure will break our LDAP groups in DSpace Discuss how the migration of CGIAR&#39;s Active Directory to a flat structure will break our LDAP groups in DSpace
We had been using DC=ILRI to determine whether a user was ILRI or not We had been using DC=ILRI to determine whether a user was ILRI or not
It looks like we might be able to use OUs now, instead of DCs: It looks like we might be able to use OUs now, instead of DCs:
$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b &quot;dc=cgiarad,dc=org&quot; -D &quot;admigration1@cgiarad.org&quot; -W &quot;(sAMAccountName=admigration1)&quot; $ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b &quot;dc=cgiarad,dc=org&quot; -D &quot;admigration1@cgiarad.org&quot; -W &quot;(sAMAccountName=admigration1)&quot;
" /> " />
<meta property="og:type" content="article" /> <meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2016-09/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2016-09/" />
@ -27,17 +24,14 @@ $ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b &quot;dc=cgiarad,dc=or
<meta name="twitter:title" content="September, 2016"/> <meta name="twitter:title" content="September, 2016"/>
<meta name="twitter:description" content="2016-09-01 <meta name="twitter:description" content="2016-09-01
Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors
Discuss how the migration of CGIAR&rsquo;s Active Directory to a flat structure will break our LDAP groups in DSpace Discuss how the migration of CGIAR&#39;s Active Directory to a flat structure will break our LDAP groups in DSpace
We had been using DC=ILRI to determine whether a user was ILRI or not We had been using DC=ILRI to determine whether a user was ILRI or not
It looks like we might be able to use OUs now, instead of DCs: It looks like we might be able to use OUs now, instead of DCs:
$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b &quot;dc=cgiarad,dc=org&quot; -D &quot;admigration1@cgiarad.org&quot; -W &quot;(sAMAccountName=admigration1)&quot; $ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b &quot;dc=cgiarad,dc=org&quot; -D &quot;admigration1@cgiarad.org&quot; -W &quot;(sAMAccountName=admigration1)&quot;
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -118,34 +112,26 @@ $ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b &quot;dc=cgiarad,dc=or
</p> </p>
</header> </header>
<h2 id="2016-09-01">2016-09-01</h2> <h2 id="20160901">2016-09-01</h2>
<ul> <ul>
<li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li> <li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li>
<li>Discuss how the migration of CGIAR&rsquo;s Active Directory to a flat structure will break our LDAP groups in DSpace</li> <li>Discuss how the migration of CGIAR's Active Directory to a flat structure will break our LDAP groups in DSpace</li>
<li>We had been using <code>DC=ILRI</code> to determine whether a user was ILRI or not</li> <li>We had been using <code>DC=ILRI</code> to determine whether a user was ILRI or not</li>
<li>It looks like we might be able to use OUs now, instead of DCs:</li>
<li><p>It looks like we might be able to use OUs now, instead of DCs:</p> </ul>
<pre><code>$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b &quot;dc=cgiarad,dc=org&quot; -D &quot;admigration1@cgiarad.org&quot; -W &quot;(sAMAccountName=admigration1)&quot;
<pre><code>$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b &quot;dc=cgiarad,dc=org&quot; -D &quot;admigration1@cgiarad.org&quot; -W &quot;(sAMAccountName=admigration1)&quot; </code></pre><ul>
</code></pre></li> <li>User who has been migrated to the root vs user still in the hierarchical structure:</li>
</ul> </ul>
<ul>
<li><p>User who has been migrated to the root vs user still in the hierarchical structure:</p>
<pre><code>distinguishedName: CN=Last\, First (ILRI),OU=ILRI Kenya Employees,OU=ILRI Kenya,OU=ILRIHUB,DC=CGIARAD,DC=ORG <pre><code>distinguishedName: CN=Last\, First (ILRI),OU=ILRI Kenya Employees,OU=ILRI Kenya,OU=ILRIHUB,DC=CGIARAD,DC=ORG
distinguishedName: CN=Last\, First (ILRI),OU=ILRI Ethiopia Employees,OU=ILRI Ethiopia,DC=ILRI,DC=CGIARAD,DC=ORG distinguishedName: CN=Last\, First (ILRI),OU=ILRI Ethiopia Employees,OU=ILRI Ethiopia,DC=ILRI,DC=CGIARAD,DC=ORG
</code></pre></li> </code></pre><ul>
<li>Changing the DSpace LDAP config to use <code>OU=ILRIHUB</code> seems to work:</li>
<li><p>Changing the DSpace LDAP config to use <code>OU=ILRIHUB</code> seems to work:</p></li>
</ul> </ul>
<p><img src="/cgspace-notes/2016/09/ilri-ldap-users.png" alt="DSpace groups based on LDAP DN"></p>
<p><img src="/cgspace-notes/2016/09/ilri-ldap-users.png" alt="DSpace groups based on LDAP DN" /></p>
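The DN comparison above can be sketched in Python. This is only an illustration, assuming the ILRI decision is made by looking for an `OU=ILRIHUB` (migrated) or `DC=ILRI` (legacy) component in `distinguishedName`. The names `split_dn` and `is_ilri_user` are hypothetical and the parser is deliberately naive; this is not how DSpace's LDAP authentication is implemented.

```python
import re


def split_dn(dn):
    """Split a DN into components on commas that are not backslash-escaped."""
    return [part.strip() for part in re.split(r"(?<!\\),", dn)]


def is_ilri_user(dn):
    """Return True if the DN has an OU=ILRIHUB or DC=ILRI component (assumed rule)."""
    components = {component.upper() for component in split_dn(dn)}
    return "OU=ILRIHUB" in components or "DC=ILRI" in components


# The two example DNs from the notes: migrated to the root vs. still hierarchical
migrated = r"CN=Last\, First (ILRI),OU=ILRI Kenya Employees,OU=ILRI Kenya,OU=ILRIHUB,DC=CGIARAD,DC=ORG"
legacy = r"CN=Last\, First (ILRI),OU=ILRI Ethiopia Employees,OU=ILRI Ethiopia,DC=ILRI,DC=CGIARAD,DC=ORG"

print(is_ilri_user(migrated), is_ilri_user(legacy))
```

Matching whole components rather than substrings avoids treating `OU=ILRI Kenya` as if it were `OU=ILRIHUB`.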
<ul> <ul>
<li><p>Notes for local PostgreSQL database recreation from production snapshot:</p> <li>Notes for local PostgreSQL database recreation from production snapshot:</li>
</ul>
<pre><code>$ dropdb dspacetest <pre><code>$ dropdb dspacetest
$ createdb -O dspacetest --encoding=UNICODE dspacetest $ createdb -O dspacetest --encoding=UNICODE dspacetest
$ psql dspacetest -c 'alter user dspacetest createuser;' $ psql dspacetest -c 'alter user dspacetest createuser;'
@ -153,84 +139,74 @@ $ pg_restore -O -U dspacetest -d dspacetest ~/Downloads/cgspace_2016-09-01.backu
$ psql dspacetest -c 'alter user dspacetest nocreateuser;' $ psql dspacetest -c 'alter user dspacetest nocreateuser;'
$ psql -U dspacetest -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest -h localhost $ psql -U dspacetest -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest -h localhost
$ vacuumdb dspacetest $ vacuumdb dspacetest
</code></pre></li> </code></pre><ul>
<li>Some names that I thought I fixed in July seem not to be:</li>
<li><p>Some names that I thought I fixed in July seem not to be:</p> </ul>
<pre><code>dspacetest=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value like 'Poole, %'; <pre><code>dspacetest=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value like 'Poole, %';
text_value | authority | confidence text_value | authority | confidence
-----------------------+--------------------------------------+------------ -----------------------+--------------------------------------+------------
Poole, Elizabeth Jane | b6efa27f-8829-4b92-80fe-bc63e03e3ccb | 600 Poole, Elizabeth Jane | b6efa27f-8829-4b92-80fe-bc63e03e3ccb | 600
Poole, Elizabeth Jane | 41628f42-fc38-4b38-b473-93aec9196326 | 600 Poole, Elizabeth Jane | 41628f42-fc38-4b38-b473-93aec9196326 | 600
Poole, Elizabeth Jane | 83b82da0-f652-4ebc-babc-591af1697919 | 600 Poole, Elizabeth Jane | 83b82da0-f652-4ebc-babc-591af1697919 | 600
Poole, Elizabeth Jane | c3a22456-8d6a-41f9-bba0-de51ef564d45 | 600 Poole, Elizabeth Jane | c3a22456-8d6a-41f9-bba0-de51ef564d45 | 600
Poole, E.J. | c3a22456-8d6a-41f9-bba0-de51ef564d45 | 600 Poole, E.J. | c3a22456-8d6a-41f9-bba0-de51ef564d45 | 600
Poole, E.J. | 0fbd91b9-1b71-4504-8828-e26885bf8b84 | 600 Poole, E.J. | 0fbd91b9-1b71-4504-8828-e26885bf8b84 | 600
(6 rows) (6 rows)
</code></pre></li> </code></pre><ul>
<li>At least a few of these actually have the correct ORCID, but I will unify the authority to be c3a22456-8d6a-41f9-bba0-de51ef564d45</li>
<li><p>At least a few of these actually have the correct ORCID, but I will unify the authority to be c3a22456-8d6a-41f9-bba0-de51ef564d45</p> </ul>
<pre><code>dspacetest=# update metadatavalue set authority='c3a22456-8d6a-41f9-bba0-de51ef564d45', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Poole, %'; <pre><code>dspacetest=# update metadatavalue set authority='c3a22456-8d6a-41f9-bba0-de51ef564d45', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Poole, %';
UPDATE 69 UPDATE 69
</code></pre></li> </code></pre><ul>
<li>And for Peter Ballantyne:</li>
<li><p>And for Peter Ballantyne:</p> </ul>
<pre><code>dspacetest=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value like 'Ballantyne, %'; <pre><code>dspacetest=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value like 'Ballantyne, %';
text_value | authority | confidence text_value | authority | confidence
-------------------+--------------------------------------+------------ -------------------+--------------------------------------+------------
Ballantyne, Peter | 2dcbcc7b-47b0-4fd7-bef9-39d554494081 | 600 Ballantyne, Peter | 2dcbcc7b-47b0-4fd7-bef9-39d554494081 | 600
Ballantyne, Peter | 4f04ca06-9a76-4206-bd9c-917ca75d278e | 600 Ballantyne, Peter | 4f04ca06-9a76-4206-bd9c-917ca75d278e | 600
Ballantyne, P.G. | 4f04ca06-9a76-4206-bd9c-917ca75d278e | 600 Ballantyne, P.G. | 4f04ca06-9a76-4206-bd9c-917ca75d278e | 600
Ballantyne, Peter | ba5f205b-b78b-43e5-8e80-0c9a1e1ad2ca | 600 Ballantyne, Peter | ba5f205b-b78b-43e5-8e80-0c9a1e1ad2ca | 600
Ballantyne, Peter | 20f21160-414c-4ecf-89ca-5f2cb64e75c1 | 600 Ballantyne, Peter | 20f21160-414c-4ecf-89ca-5f2cb64e75c1 | 600
(5 rows) (5 rows)
</code></pre></li> </code></pre><ul>
<li>Again, a few have the correct ORCID, but there should only be one authority&hellip;</li>
<li><p>Again, a few have the correct ORCID, but there should only be one authority&hellip;</p> </ul>
<pre><code>dspacetest=# update metadatavalue set authority='4f04ca06-9a76-4206-bd9c-917ca75d278e', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Ballantyne, %'; <pre><code>dspacetest=# update metadatavalue set authority='4f04ca06-9a76-4206-bd9c-917ca75d278e', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Ballantyne, %';
UPDATE 58 UPDATE 58
</code></pre></li> </code></pre><ul>
<li>And for me:</li>
<li><p>And for me:</p> </ul>
<pre><code>dspacetest=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value like 'Orth, A%'; <pre><code>dspacetest=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value like 'Orth, A%';
text_value | authority | confidence text_value | authority | confidence
------------+--------------------------------------+------------ ------------+--------------------------------------+------------
Orth, Alan | 4884def0-4d7e-4256-9dd4-018cd60a5871 | 600 Orth, Alan | 4884def0-4d7e-4256-9dd4-018cd60a5871 | 600
Orth, A. | 4884def0-4d7e-4256-9dd4-018cd60a5871 | 600 Orth, A. | 4884def0-4d7e-4256-9dd4-018cd60a5871 | 600
Orth, A. | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600 Orth, A. | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
(3 rows) (3 rows)
dspacetest=# update metadatavalue set authority='1a1943a0-3f87-402f-9afe-e52fb46a513e', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Orth, %'; dspacetest=# update metadatavalue set authority='1a1943a0-3f87-402f-9afe-e52fb46a513e', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Orth, %';
UPDATE 11 UPDATE 11
</code></pre></li> </code></pre><ul>
<li>And for CCAFS author Bruce Campbell that I had discussed with CCAFS earlier this week:</li>
<li><p>And for CCAFS author Bruce Campbell that I had discussed with CCAFS earlier this week:</p> </ul>
<pre><code>dspacetest=# update metadatavalue set authority='0e414b4c-4671-4a23-b570-6077aca647d8', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Campbell, B%'; <pre><code>dspacetest=# update metadatavalue set authority='0e414b4c-4671-4a23-b570-6077aca647d8', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Campbell, B%';
UPDATE 166 UPDATE 166
dspacetest=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value like 'Campbell, B%'; dspacetest=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value like 'Campbell, B%';
text_value | authority | confidence text_value | authority | confidence
------------------------+--------------------------------------+------------ ------------------------+--------------------------------------+------------
Campbell, Bruce | 0e414b4c-4671-4a23-b570-6077aca647d8 | 600 Campbell, Bruce | 0e414b4c-4671-4a23-b570-6077aca647d8 | 600
Campbell, Bruce Morgan | 0e414b4c-4671-4a23-b570-6077aca647d8 | 600 Campbell, Bruce Morgan | 0e414b4c-4671-4a23-b570-6077aca647d8 | 600
Campbell, B. | 0e414b4c-4671-4a23-b570-6077aca647d8 | 600 Campbell, B. | 0e414b4c-4671-4a23-b570-6077aca647d8 | 600
Campbell, B.M. | 0e414b4c-4671-4a23-b570-6077aca647d8 | 600 Campbell, B.M. | 0e414b4c-4671-4a23-b570-6077aca647d8 | 600
(4 rows) (4 rows)
</code></pre></li> </code></pre><ul>
<li>After updating the Authority indexes (<code>bin/dspace index-authority</code>) everything looks good</li>
<li><p>After updating the Authority indexes (<code>bin/dspace index-authority</code>) everything looks good</p></li> <li>Run authority updates on CGSpace</li>
<li><p>Run authority updates on CGSpace</p></li>
</ul> </ul>
<h2 id="20160905">2016-09-05</h2>
<h2 id="2016-09-05">2016-09-05</h2>
<ul> <ul>
<li><p>After one week of logging TLS connections on CGSpace:</p> <li>After one week of logging TLS connections on CGSpace:</li>
</ul>
<pre><code># zgrep &quot;DES-CBC3&quot; /var/log/nginx/cgspace.cgiar.org-access-ssl.log* | wc -l <pre><code># zgrep &quot;DES-CBC3&quot; /var/log/nginx/cgspace.cgiar.org-access-ssl.log* | wc -l
217 217
# zcat -f -- /var/log/nginx/cgspace.cgiar.org-access-ssl.log* | wc -l # zcat -f -- /var/log/nginx/cgspace.cgiar.org-access-ssl.log* | wc -l
@ -238,195 +214,152 @@ Campbell, B.M. | 0e414b4c-4671-4a23-b570-6077aca647d8 | 600
# zgrep &quot;DES-CBC3&quot; /var/log/nginx/cgspace.cgiar.org-access-ssl.log* | awk '{print $6}' | sort | uniq # zgrep &quot;DES-CBC3&quot; /var/log/nginx/cgspace.cgiar.org-access-ssl.log* | awk '{print $6}' | sort | uniq
TLSv1/DES-CBC3-SHA TLSv1/DES-CBC3-SHA
TLSv1/EDH-RSA-DES-CBC3-SHA TLSv1/EDH-RSA-DES-CBC3-SHA
</code></pre></li> </code></pre><ul>
<li>So this represents <code>0.02%</code> of 1.16M connections over a one-week period</li>
<li><p>So this represents <code>0.02%</code> of 1.16M connections over a one-week period</p></li> <li>Transforming some filenames in OpenRefine so they can have a useful description for SAFBuilder:</li>
<li><p>Transforming some filenames in OpenRefine so they can have a useful description for SAFBuilder:</p>
<pre><code>value + &quot;__description:&quot; + cells[&quot;dc.type&quot;].value
</code></pre></li>
<li><p>This gives you, for example: <code>Mainstreaming gender in agricultural R&amp;D.pdf__description:Brief</code></p></li>
</ul> </ul>
<pre><code>value + &quot;__description:&quot; + cells[&quot;dc.type&quot;].value
<h2 id="2016-09-06">2016-09-06</h2> </code></pre><ul>
<li>This gives you, for example: <code>Mainstreaming gender in agricultural R&amp;D.pdf__description:Brief</code></li>
</ul>
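For reference, the OpenRefine expression above does the equivalent of the following Python sketch. It assumes the metadata CSV has a `filename` column next to `dc.type`; the column name `filename` and the helper `add_descriptions` are illustrative assumptions, not part of SAFBuilder.

```python
import csv
import io


def add_descriptions(rows, filename_col="filename", type_col="dc.type"):
    """Yield copies of each row with '__description:<type>' appended to the file name."""
    for row in rows:
        updated = dict(row)
        updated[filename_col] = f"{row[filename_col]}__description:{row[type_col]}"
        yield updated


# A one-row sample standing in for the real metadata CSV
sample = io.StringIO(
    'filename,dc.type\n"Mainstreaming gender in agricultural R&D.pdf",Brief\n'
)
result = list(add_descriptions(csv.DictReader(sample)))
print(result[0]["filename"])
```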
<h2 id="20160906">2016-09-06</h2>
<ul> <ul>
<li>Trying to import the records for CIAT from yesterday, but having filename encoding issues from their zip file</li> <li>Trying to import the records for CIAT from yesterday, but having filename encoding issues from their zip file</li>
<li>Create a zip on Mac OS X from a SAF bundle containing only one record with one PDF: <li>Create a zip on Mac OS X from a SAF bundle containing only one record with one PDF:
<ul> <ul>
<li>Filename: Complementing Farmers Genetic Knowledge Farmer Breeding Workshop in Turipaná, Colombia.pdf</li> <li>Filename: Complementing Farmers Genetic Knowledge Farmer Breeding Workshop in Turipaná, Colombia.pdf</li>
<li>Imports fine on DSpace running on Mac OS X</li> <li>Imports fine on DSpace running on Mac OS X</li>
<li>Fails to import on DSpace running on Linux with error <code>No such file or directory</code></li> <li>Fails to import on DSpace running on Linux with error <code>No such file or directory</code></li>
</ul></li> </ul>
</li>
<li>Change diacritic in file name from á to a and re-create SAF bundle and zip <li>Change diacritic in file name from á to a and re-create SAF bundle and zip
<ul> <ul>
<li>Success on both Mac OS X and Linux&hellip;</li> <li>Success on both Mac OS X and Linux&hellip;</li>
</ul></li> </ul>
</li>
<li>Looks like on the Mac OS X file system the file names represent á as: a (U+0061) + ́ (U+0301)</li> <li>Looks like on the Mac OS X file system the file names represent á as: a (U+0061) + ́ (U+0301)</li>
<li>See: <a href="http://www.fileformat.info/info/unicode/char/e1/index.htm">http://www.fileformat.info/info/unicode/char/e1/index.htm</a></li> <li>See: <a href="http://www.fileformat.info/info/unicode/char/e1/index.htm">http://www.fileformat.info/info/unicode/char/e1/index.htm</a></li>
<li>See: <a href="http://demo.icu-project.org/icu-bin/nbrowser?t=%C3%A1&amp;s=&amp;uv=0">http://demo.icu-project.org/icu-bin/nbrowser?t=%C3%A1&amp;s=&amp;uv=0</a></li> <li>See: <a href="http://demo.icu-project.org/icu-bin/nbrowser?t=%C3%A1&amp;s=&amp;uv=0">http://demo.icu-project.org/icu-bin/nbrowser?t=%C3%A1&amp;s=&amp;uv=0</a></li>
<li>If I unzip the original zip from CIAT on Windows, re-zip it with 7zip on Windows, and then unzip it on Linux directly, the file names seem to be proper UTF-8</li> <li>If I unzip the original zip from CIAT on Windows, re-zip it with 7zip on Windows, and then unzip it on Linux directly, the file names seem to be proper UTF-8</li>
<li>We should definitely clean filenames so they don't use characters that are tricky to process in CSV and shell scripts, like: <code>,</code>, <code>'</code>, and <code>&quot;</code></li>
<li><p>We should definitely clean filenames so they don&rsquo;t use characters that are tricky to process in CSV and shell scripts, like: <code>,</code>, <code>'</code>, and <code>&quot;</code></p> </ul>
<pre><code>value.replace(&quot;'&quot;,&quot;&quot;).replace(&quot;,&quot;,&quot;&quot;).replace('&quot;','') <pre><code>value.replace(&quot;'&quot;,&quot;&quot;).replace(&quot;,&quot;,&quot;&quot;).replace('&quot;','')
</code></pre></li> </code></pre><ul>
<li>I need to write a Python script to match that for renaming files in the file system</li>
<li><p>I need to write a Python script to match that for renaming files in the file system</p></li> <li>When importing SAF bundles it seems you can specify the target collection on the command line using <code>-c 10568/4003</code> or in the <code>collections</code> file inside each item in the bundle</li>
<li>Seems that the latter method causes a null pointer exception, so I will just have to use the former method</li>
<li><p>When importing SAF bundles it seems you can specify the target collection on the command line using <code>-c 10568/4003</code> or in the <code>collections</code> file inside each item in the bundle</p></li> <li>In the end I was able to import the files after unzipping them ONLY on Linux
<li><p>Seems that the latter method causes a null pointer exception, so I will just have to use the former method</p></li>
<li><p>In the end I was able to import the files after unzipping them ONLY on Linux</p>
<ul> <ul>
<li>The CSV file was giving file names in UTF-8, and unzipping the zip on Mac OS X and transferring it was converting the file names to Unicode equivalence like I saw above</li> <li>The CSV file was giving file names in UTF-8, and unzipping the zip on Mac OS X and transferring it was converting the file names to Unicode equivalence like I saw above</li>
</ul></li> </ul>
</li>
<li><p>Import CIAT Gender Network records to CGSpace, first creating the SAF bundles as my user, then importing as the <code>tomcat7</code> user, and deleting the bundle, for each collection&rsquo;s items:</p> <li>Import CIAT Gender Network records to CGSpace, first creating the SAF bundles as my user, then importing as the <code>tomcat7</code> user, and deleting the bundle, for each collection's items:</li>
</ul>
<pre><code>$ ./safbuilder.sh -c /home/aorth/ciat-gender-2016-09-06/66601.csv <pre><code>$ ./safbuilder.sh -c /home/aorth/ciat-gender-2016-09-06/66601.csv
$ JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx512m&quot; /home/cgspace.cgiar.org/bin/dspace import -a -e aorth@mjanja.ch -c 10568/66601 -s /home/aorth/ciat-gender-2016-09-06/SimpleArchiveFormat -m 66601.map $ JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx512m&quot; /home/cgspace.cgiar.org/bin/dspace import -a -e aorth@mjanja.ch -c 10568/66601 -s /home/aorth/ciat-gender-2016-09-06/SimpleArchiveFormat -m 66601.map
$ rm -rf ~/ciat-gender-2016-09-06/SimpleArchiveFormat/ $ rm -rf ~/ciat-gender-2016-09-06/SimpleArchiveFormat/
</code></pre></li> </code></pre><h2 id="20160907">2016-09-07</h2>
</ul>
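A rough sketch of the renaming script mentioned above, assuming the CSV uses the precomposed (NFC) form of characters like á and that the same three characters stripped in OpenRefine should be stripped on disk. The function names `clean_filename` and `rename_cleaned` are hypothetical, and this has not been run against the CIAT files.

```python
import unicodedata
from pathlib import Path


def clean_filename(name):
    """Normalize to NFC and drop characters that are awkward in CSV and shell."""
    # NFC composes "a" (U+0061) + combining acute (U+0301) into "á" (U+00E1)
    name = unicodedata.normalize("NFC", name)
    for char in ("'", ",", '"'):
        name = name.replace(char, "")
    return name


def rename_cleaned(directory):
    """Rename files directly inside `directory` to their cleaned names."""
    for path in sorted(Path(directory).iterdir()):
        target = path.with_name(clean_filename(path.name))
        # Skip directories, unchanged names, and names that would collide
        if path.is_file() and target != path and not target.exists():
            path.rename(target)


# File name as Mac OS X stores it: "a" followed by a combining acute accent
decomposed = "Turipana\u0301, Colombia.pdf"
print(clean_filename(decomposed))
```

Running `rename_cleaned` on Linux, after unzipping there, keeps the file names on disk in the same form as the names in the CSV.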
<h2 id="2016-09-07">2016-09-07</h2>
<ul> <ul>
<li>Erase and rebuild DSpace Test based on latest Ubuntu 16.04, PostgreSQL 9.5, and Java 8 stuff</li> <li>Erase and rebuild DSpace Test based on latest Ubuntu 16.04, PostgreSQL 9.5, and Java 8 stuff</li>
<li>Reading about PostgreSQL maintenance and it seems manual vacuuming is only for certain workloads, such as heavy update/write loads</li> <li>Reading about PostgreSQL maintenance and it seems manual vacuuming is only for certain workloads, such as heavy update/write loads</li>
<li>I suggest we disable our nightly manual vacuum task, as we&rsquo;re a mostly read workload, and I&rsquo;d rather stick as close to the documentation as possible since we haven&rsquo;t done any testing/observation of PostgreSQL</li> <li>I suggest we disable our nightly manual vacuum task, as we're a mostly read workload, and I'd rather stick as close to the documentation as possible since we haven't done any testing/observation of PostgreSQL</li>
<li>See: <a href="https://www.postgresql.org/docs/9.3/static/routine-vacuuming.html">https://www.postgresql.org/docs/9.3/static/routine-vacuuming.html</a></li> <li>See: <a href="https://www.postgresql.org/docs/9.3/static/routine-vacuuming.html">https://www.postgresql.org/docs/9.3/static/routine-vacuuming.html</a></li>
<li>CGSpace went down and the error seems to be the same as always (lately):</li>
<li><p>CGSpace went down and the error seems to be the same as always (lately):</p> </ul>
<pre><code>2016-09-07 11:39:23,162 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error - <pre><code>2016-09-07 11:39:23,162 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
... ...
</code></pre></li> </code></pre><ul>
<li>Since CGSpace had crashed I quickly deployed the new LDAP settings before restarting Tomcat</li>
<li><p>Since CGSpace had crashed I quickly deployed the new LDAP settings before restarting Tomcat</p></li>
</ul> </ul>
<h2 id="20160913">2016-09-13</h2>
<h2 id="2016-09-13">2016-09-13</h2>
<ul> <ul>
<li><p>CGSpace crashed twice today, errors from <code>catalina.out</code>:</p> <li>CGSpace crashed twice today, errors from <code>catalina.out</code>:</li>
<pre><code>org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:114)
</code></pre></li>
<li><p>I enabled logging of requests to <code>/rest</code> again</p></li>
</ul> </ul>
<h2 id="2016-09-14">2016-09-14</h2>
<ul>
<li><p>CGSpace crashed again, errors from <code>catalina.out</code>:</p>
<pre><code>org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object <pre><code>org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:114) at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:114)
</code></pre></li> </code></pre><ul>
<li>I enabled logging of requests to <code>/rest</code> again</li>
<li><p>I restarted Tomcat and it was ok again</p></li> </ul>
<h2 id="20160914">2016-09-14</h2>
<li><p>CGSpace crashed a few hours later, errors from <code>catalina.out</code>:</p> <ul>
<li>CGSpace crashed again, errors from <code>catalina.out</code>:</li>
</ul>
<pre><code>org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:114)
</code></pre><ul>
<li>I restarted Tomcat and it was ok again</li>
<li>CGSpace crashed a few hours later, errors from <code>catalina.out</code>:</li>
</ul>
<pre><code>Exception in thread &quot;http-bio-127.0.0.1-8081-exec-25&quot; java.lang.OutOfMemoryError: Java heap space <pre><code>Exception in thread &quot;http-bio-127.0.0.1-8081-exec-25&quot; java.lang.OutOfMemoryError: Java heap space
at java.lang.StringCoding.decode(StringCoding.java:215) at java.lang.StringCoding.decode(StringCoding.java:215)
</code></pre></li> </code></pre><ul>
<li>We haven't seen that in quite a while&hellip;</li>
<li><p>We haven&rsquo;t seen that in quite a while&hellip;</p></li> <li>Indeed, in a month of logs it only occurs 15 times:</li>
</ul>
<li><p>Indeed, in a month of logs it only occurs 15 times:</p>
<pre><code># grep -rsI &quot;OutOfMemoryError&quot; /var/log/tomcat7/catalina.* | wc -l <pre><code># grep -rsI &quot;OutOfMemoryError&quot; /var/log/tomcat7/catalina.* | wc -l
15 15
</code></pre></li> </code></pre><ul>
<li>I also see a bunch of errors from dspace.log:</li>
<li><p>I also see a bunch of errors from dspace.log:</p> </ul>
<pre><code>2016-09-14 12:23:07,981 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error - <pre><code>2016-09-14 12:23:07,981 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
</code></pre></li> </code></pre><ul>
<li>Looking at REST requests, it seems there is one IP hitting us nonstop:</li>
<li><p>Looking at REST requests, it seems there is one IP hitting us nonstop:</p> </ul>
<pre><code># awk '{print $1}' /var/log/nginx/rest.log | sort -n | uniq -c | sort -h | tail -n 3 <pre><code># awk '{print $1}' /var/log/nginx/rest.log | sort -n | uniq -c | sort -h | tail -n 3
820 50.87.54.15 820 50.87.54.15
12872 70.32.99.142 12872 70.32.99.142
25744 70.32.83.92 25744 70.32.83.92
# awk '{print $1}' /var/log/nginx/rest.log.1 | sort -n | uniq -c | sort -h | tail -n 3 # awk '{print $1}' /var/log/nginx/rest.log.1 | sort -n | uniq -c | sort -h | tail -n 3
7966 181.118.144.29 7966 181.118.144.29
54706 70.32.99.142 54706 70.32.99.142
109412 70.32.83.92 109412 70.32.83.92
</code></pre></li> </code></pre><ul>
<li>Those are the same IPs that were hitting us heavily in July, 2016 as well&hellip;</li>
<li><p>Those are the same IPs that were hitting us heavily in July, 2016 as well&hellip;</p></li> <li>I think the stability issues are definitely from REST</li>
<li>Crashed AGAIN, errors from dspace.log:</li>
<li><p>I think the stability issues are definitely from REST</p></li> </ul>
<li><p>Crashed AGAIN, errors from dspace.log:</p>
<pre><code>2016-09-14 14:31:43,069 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error - <pre><code>2016-09-14 14:31:43,069 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
</code></pre></li> </code></pre><ul>
<li>And more heap space errors:</li>
<li><p>And more heap space errors:</p> </ul>
<pre><code># grep -rsI &quot;OutOfMemoryError&quot; /var/log/tomcat7/catalina.* | wc -l <pre><code># grep -rsI &quot;OutOfMemoryError&quot; /var/log/tomcat7/catalina.* | wc -l
19 19
</code></pre></li> </code></pre><ul>
<li>There are no more rest requests since the last crash, so maybe there are other things causing this.</li>
<li><p>There are no more rest requests since the last crash, so maybe there are other things causing this.</p></li> <li>Hmm, I noticed a shitload of IPs from 180.76.0.0/16 are connecting to both CGSpace and DSpace Test (58 unique IPs concurrently!)</li>
<li>They seem to be coming from Baidu, and so far during today alone account for 1/6 of every connection:</li>
<li><p>Hmm, I noticed a shitload of IPs from 180.76.0.0/16 are connecting to both CGSpace and DSpace Test (58 unique IPs concurrently!)</p></li> </ul>
<li><p>They seem to be coming from Baidu, and so far during today alone account for <sup>1</sup>&frasl;<sub>6</sub> of every connection:</p>
<pre><code># grep -c ip_addr= /home/cgspace.cgiar.org/log/dspace.log.2016-09-14 <pre><code># grep -c ip_addr= /home/cgspace.cgiar.org/log/dspace.log.2016-09-14
29084 29084
# grep -c ip_addr=180.76.15 /home/cgspace.cgiar.org/log/dspace.log.2016-09-14 # grep -c ip_addr=180.76.15 /home/cgspace.cgiar.org/log/dspace.log.2016-09-14
5192 5192
</code></pre></li> </code></pre><ul>
<li>Other recent days are the same&hellip; hmmm.</li>
<li><p>Other recent days are the same&hellip; hmmm.</p></li> <li>From the activity control panel I can see 58 unique IPs hitting the site <em>concurrently</em>, which has GOT to hurt our stability</li>
<li>A list of all 2000 unique IPs from CGSpace logs today:</li>
<li><p>From the activity control panel I can see 58 unique IPs hitting the site <em>concurrently</em>, which has GOT to hurt our stability</p></li> </ul>
<li><p>A list of all 2000 unique IPs from CGSpace logs today:</p>
<pre><code># grep ip_addr= /home/cgspace.cgiar.org/log/dspace.log.2016-09-11 | awk -F: '{print $5}' | sort -n | uniq -c | sort -h | tail -n 100 <pre><code># grep ip_addr= /home/cgspace.cgiar.org/log/dspace.log.2016-09-11 | awk -F: '{print $5}' | sort -n | uniq -c | sort -h | tail -n 100
</code></pre></li> </code></pre><ul>
<li>Looking at the top 20 IPs or so, most are Yahoo, MSN, Google, Baidu, TurnitIn (iParadigm), etc&hellip; do we have any real users?</li>
<li><p>Looking at the top 20 IPs or so, most are Yahoo, MSN, Google, Baidu, TurnitIn (iParadigm), etc&hellip; do we have any real users?</p></li> <li>Generate a list of all author affiliations for Peter Ballantyne to go through, make corrections, and create a lookup list from:</li>
</ul>
<li><p>Generate a list of all author affiliations for Peter Ballantyne to go through, make corrections, and create a lookup list from:</p>
<pre><code>dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=211 group by text_value order by count desc) to /tmp/affiliations.csv with csv; <pre><code>dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=211 group by text_value order by count desc) to /tmp/affiliations.csv with csv;
</code></pre></li> </code></pre><ul>
<li>Looking into the Catalina logs again around the time of the first crash, I see:</li>
<li><p>Looking into the Catalina logs again around the time of the first crash, I see:</p> </ul>
<pre><code>Wed Sep 14 09:47:27 UTC 2016 | Query:id: 78581 AND type:2 <pre><code>Wed Sep 14 09:47:27 UTC 2016 | Query:id: 78581 AND type:2
Wed Sep 14 09:47:28 UTC 2016 | Updating : 6/6 docs. Wed Sep 14 09:47:28 UTC 2016 | Updating : 6/6 docs.
Commit Commit
Commit done Commit done
dn:CN=Haman\, Magdalena (CIAT-CCAFS),OU=Standard,OU=Users,OU=HQ,OU=CIATHUB,dc=cgiarad,dc=org dn:CN=Haman\, Magdalena (CIAT-CCAFS),OU=Standard,OU=Users,OU=HQ,OU=CIATHUB,dc=cgiarad,dc=org
Exception in thread &quot;http-bio-127.0.0.1-8081-exec-193&quot; java.lang.OutOfMemoryError: Java heap space Exception in thread &quot;http-bio-127.0.0.1-8081-exec-193&quot; java.lang.OutOfMemoryError: Java heap space
</code></pre></li> </code></pre><ul>
<li>And after that I see a bunch of &ldquo;pool error Timeout waiting for idle object&rdquo;</li>
<li><p>And after that I see a bunch of &ldquo;pool error Timeout waiting for idle object&rdquo;</p></li> <li>Later, near the time of the next crash I see:</li>
</ul>
<li><p>Later, near the time of the next crash I see:</p>
<pre><code>dn:CN=Haman\, Magdalena (CIAT-CCAFS),OU=Standard,OU=Users,OU=HQ,OU=CIATHUB,dc=cgiarad,dc=org <pre><code>dn:CN=Haman\, Magdalena (CIAT-CCAFS),OU=Standard,OU=Users,OU=HQ,OU=CIATHUB,dc=cgiarad,dc=org
Wed Sep 14 11:29:55 UTC 2016 | Query:id: 79078 AND type:2 Wed Sep 14 11:29:55 UTC 2016 | Query:id: 79078 AND type:2
Wed Sep 14 11:30:20 UTC 2016 | Updating : 6/6 docs. Wed Sep 14 11:30:20 UTC 2016 | Updating : 6/6 docs.
@ -436,57 +369,45 @@ Sep 14, 2016 11:32:22 AM com.sun.jersey.server.wadl.generators.WadlGeneratorJAXB
SEVERE: Failed to generate the schema for the JAX-B elements SEVERE: Failed to generate the schema for the JAX-B elements
com.sun.xml.bind.v2.runtime.IllegalAnnotationsException: 2 counts of IllegalAnnotationExceptions com.sun.xml.bind.v2.runtime.IllegalAnnotationsException: 2 counts of IllegalAnnotationExceptions
java.util.Map is an interface, and JAXB can't handle interfaces. java.util.Map is an interface, and JAXB can't handle interfaces.
this problem is related to the following location: this problem is related to the following location:
at java.util.Map at java.util.Map
at public java.util.Map com.atmire.dspace.rest.common.Statlet.getRender() at public java.util.Map com.atmire.dspace.rest.common.Statlet.getRender()
at com.atmire.dspace.rest.common.Statlet at com.atmire.dspace.rest.common.Statlet
java.util.Map does not have a no-arg default constructor. java.util.Map does not have a no-arg default constructor.
this problem is related to the following location: this problem is related to the following location:
at java.util.Map at java.util.Map
at public java.util.Map com.atmire.dspace.rest.common.Statlet.getRender() at public java.util.Map com.atmire.dspace.rest.common.Statlet.getRender()
at com.atmire.dspace.rest.common.Statlet at com.atmire.dspace.rest.common.Statlet
</code></pre></li> </code></pre><ul>
<li>Then 20 minutes later another OutOfMemoryError:</li>
<li><p>Then 20 minutes later another OutOfMemoryError:</p>
<pre><code>Exception in thread &quot;http-bio-127.0.0.1-8081-exec-25&quot; java.lang.OutOfMemoryError: Java heap space
at java.lang.StringCoding.decode(StringCoding.java:215)
</code></pre></li>
<li><p>Perhaps these particular issues <em>are</em> memory issues, the munin graphs definitely show some weird purging/allocating behavior starting this week</p></li>
</ul> </ul>
<pre><code>Exception in thread &quot;http-bio-127.0.0.1-8081-exec-25&quot; java.lang.OutOfMemoryError: Java heap space
<p><img src="/cgspace-notes/2016/09/tomcat_jvm-day.png" alt="Tomcat JVM usage day" /> at java.lang.StringCoding.decode(StringCoding.java:215)
<img src="/cgspace-notes/2016/09/tomcat_jvm-week.png" alt="Tomcat JVM usage week" /> </code></pre><ul>
<img src="/cgspace-notes/2016/09/tomcat_jvm-month.png" alt="Tomcat JVM usage month" /></p> <li>Perhaps these particular issues <em>are</em> memory issues, the munin graphs definitely show some weird purging/allocating behavior starting this week</li>
</ul>
<p><img src="/cgspace-notes/2016/09/tomcat_jvm-day.png" alt="Tomcat JVM usage day">
<img src="/cgspace-notes/2016/09/tomcat_jvm-week.png" alt="Tomcat JVM usage week">
<img src="/cgspace-notes/2016/09/tomcat_jvm-month.png" alt="Tomcat JVM usage month"></p>
<ul> <ul>
<li>And really, we did reduce the memory of CGSpace in late 2015, so maybe we should just increase it again, now that our usage is higher and we are having memory errors in the logs</li> <li>And really, we did reduce the memory of CGSpace in late 2015, so maybe we should just increase it again, now that our usage is higher and we are having memory errors in the logs</li>
<li>Oh great, the configuration on the actual server is different than in configuration management!</li> <li>Oh great, the configuration on the actual server is different than in configuration management!</li>
<li>Seems we added a bunch of settings to the <code>/etc/default/tomcat7</code> in December, 2015 and never updated our ansible repository:</li>
<li><p>Seems we added a bunch of settings to the <code>/etc/default/tomcat7</code> in December, 2015 and never updated our ansible repository:</p> </ul>
<pre><code>JAVA_OPTS=&quot;-Djava.awt.headless=true -Xms3584m -Xmx3584m -XX:MaxPermSize=256m -XX:+UseConcMarkSweepGC -Dfile.encoding=UTF-8 -XX:-UseGCOverheadLimit -XX:MaxGCPauseMillis=250 -XX:GCTimeRatio=9 -XX:+PerfDisableSharedMem -XX:+ParallelRefProcEnabled -XX:G1HeapRegionSize=8m -XX:InitiatingHeapOccupancyPercent=75 -XX:+UseLargePages -XX:+AggressiveOpts&quot; <pre><code>JAVA_OPTS=&quot;-Djava.awt.headless=true -Xms3584m -Xmx3584m -XX:MaxPermSize=256m -XX:+UseConcMarkSweepGC -Dfile.encoding=UTF-8 -XX:-UseGCOverheadLimit -XX:MaxGCPauseMillis=250 -XX:GCTimeRatio=9 -XX:+PerfDisableSharedMem -XX:+ParallelRefProcEnabled -XX:G1HeapRegionSize=8m -XX:InitiatingHeapOccupancyPercent=75 -XX:+UseLargePages -XX:+AggressiveOpts&quot;
</code></pre></li> </code></pre><ul>
<li>So I'm going to bump the heap +512m and remove all the other experimental shit (and update ansible!)</li>
<li><p>So I&rsquo;m going to bump the heap +512m and remove all the other experimental shit (and update ansible!)</p></li> <li>Increased JVM heap to 4096m on CGSpace (linode01)</li>
<li><p>Increased JVM heap to 4096m on CGSpace (linode01)</p></li>
</ul> </ul>
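<p>The per-IP counts above came from an <code>awk | sort | uniq -c</code> pipeline. As a rough sketch of the same tally in Python: the function name <code>top_clients</code> and the sample log lines are made up for illustration, and it assumes the client IP is the first whitespace-separated field, as in the <code>awk '{print $1}'</code> commands.</p>

```python
from collections import Counter


def top_clients(lines, n=3):
    """Return the n most frequent first fields (client IPs) with their counts."""
    counts = Counter(line.split()[0] for line in lines if line.strip())
    return counts.most_common(n)


# Hypothetical sample lines; a real run would iterate over an open log file
# such as /var/log/nginx/rest.log instead of this list.
sample = [
    '70.32.83.92 - - [14/Sep/2016:09:00:00 +0000] "GET /rest/items HTTP/1.1" 200 512',
    '70.32.99.142 - - [14/Sep/2016:09:00:01 +0000] "GET /rest/items HTTP/1.1" 200 512',
    '70.32.83.92 - - [14/Sep/2016:09:00:02 +0000] "GET /rest/items HTTP/1.1" 200 512',
    '50.87.54.15 - - [14/Sep/2016:09:00:03 +0000] "GET /rest/items HTTP/1.1" 200 512',
    '70.32.99.142 - - [14/Sep/2016:09:00:04 +0000] "GET /rest/items HTTP/1.1" 200 512',
    '70.32.83.92 - - [14/Sep/2016:09:00:05 +0000] "GET /rest/items HTTP/1.1" 200 512',
]

for ip, hits in top_clients(sample):
    print(f"{hits:7d} {ip}")
```

<p>Unlike the shell pipeline, this lists the busiest client first rather than last.</p>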
<h2 id="20160915">2016-09-15</h2>
<h2 id="2016-09-15">2016-09-15</h2>
<ul> <ul>
<li>Looking at Google Webmaster Tools again, it seems the work I did on URL query parameters and blocking via the <code>X-Robots-Tag</code> HTTP header in March, 2016 has had a positive effect on Google&rsquo;s index for CGSpace</li> <li>Looking at Google Webmaster Tools again, it seems the work I did on URL query parameters and blocking via the <code>X-Robots-Tag</code> HTTP header in March, 2016 has had a positive effect on Google's index for CGSpace</li>
</ul> </ul>
<p><img src="/cgspace-notes/2016/09/google-webmaster-tools-index.png" alt="Google Webmaster Tools for CGSpace"></p>
<p><img src="/cgspace-notes/2016/09/google-webmaster-tools-index.png" alt="Google Webmaster Tools for CGSpace" /></p> <h2 id="20160916">2016-09-16</h2>
<h2 id="2016-09-16">2016-09-16</h2>
<ul> <ul>
<li><p>CGSpace crashed again, and there are TONS of heap space errors but the datestamps aren&rsquo;t on those lines so I&rsquo;m not sure if they were yesterday:</p> <li>CGSpace crashed again, and there are TONS of heap space errors but the datestamps aren't on those lines so I'm not sure if they were yesterday:</li>
</ul>
<pre><code>dn:CN=Orentlicher\, Natalie (CIAT),OU=Standard,OU=Users,OU=HQ,OU=CIATHUB,dc=cgiarad,dc=org <pre><code>dn:CN=Orentlicher\, Natalie (CIAT),OU=Standard,OU=Users,OU=HQ,OU=CIATHUB,dc=cgiarad,dc=org
Thu Sep 15 18:45:25 UTC 2016 | Query:id: 55785 AND type:2 Thu Sep 15 18:45:25 UTC 2016 | Query:id: 55785 AND type:2
Thu Sep 15 18:45:26 UTC 2016 | Updating : 100/218 docs. Thu Sep 15 18:45:26 UTC 2016 | Updating : 100/218 docs.
@ -503,42 +424,33 @@ Exception in thread &quot;http-bio-127.0.0.1-8081-exec-263&quot; java.lang.OutOf
Exception in thread &quot;http-bio-127.0.0.1-8081-exec-280&quot; java.lang.OutOfMemoryError: Java heap space Exception in thread &quot;http-bio-127.0.0.1-8081-exec-280&quot; java.lang.OutOfMemoryError: Java heap space
Exception in thread &quot;Thread-54216&quot; org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Exception writing document id 7feaa95d-8e1f-4f45-80bb Exception in thread &quot;Thread-54216&quot; org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Exception writing document id 7feaa95d-8e1f-4f45-80bb
-e14ef82ee224 to the index; possible analysis error. -e14ef82ee224 to the index; possible analysis error.
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:552) at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:552)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124) at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)
at com.atmire.statistics.SolrLogThread.run(SourceFile:25) at com.atmire.statistics.SolrLogThread.run(SourceFile:25)
</code></pre></li> </code></pre><ul>
<li>I bumped the heap space from 4096m to 5120m to see if this is <em>really</em> about heap space or not.</li>
<li><p>I bumped the heap space from 4096m to 5120m to see if this is <em>really</em> about heap space or not.</p></li> <li>Looking into some of these errors that I've seen this week but haven't noticed before:</li>
</ul>
<li><p>Looking into some of these errors that I&rsquo;ve seen this week but haven&rsquo;t noticed before:</p>
<pre><code># zcat -f -- /var/log/tomcat7/catalina.* | grep -c 'Failed to generate the schema for the JAX-B elements' <pre><code># zcat -f -- /var/log/tomcat7/catalina.* | grep -c 'Failed to generate the schema for the JAX-B elements'
113 113
</code></pre></li> </code></pre><ul>
<li>I've sent a message to Atmire about the Solr error to see if it's related to their batch update module</li>
<li><p>I&rsquo;ve sent a message to Atmire about the Solr error to see if it&rsquo;s related to their batch update module</p></li>
</ul> </ul>
<h2 id="20160919">2016-09-19</h2>
<h2 id="2016-09-19">2016-09-19</h2>
<ul> <ul>
<li><p>Work on cleanups for author affiliations after Peter sent me his list of corrections/deletions:</p> <li>Work on cleanups for author affiliations after Peter sent me his list of corrections/deletions:</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i affiliations_pb-322-corrections.csv -f cg.contributor.affiliation -t correct -m 211 -d dspace -u dspace -p fuuu <pre><code>$ ./fix-metadata-values.py -i affiliations_pb-322-corrections.csv -f cg.contributor.affiliation -t correct -m 211 -d dspace -u dspace -p fuuu
$ ./delete-metadata-values.py -f cg.contributor.affiliation -i affiliations_pb-2-deletions.csv -m 211 -u dspace -d dspace -p fuuu $ ./delete-metadata-values.py -f cg.contributor.affiliation -i affiliations_pb-2-deletions.csv -m 211 -u dspace -d dspace -p fuuu
</code></pre></li> </code></pre><ul>
<li>After that we need to take the top ~300 and make a controlled vocabulary for it</li>
<li><p>After that we need to take the top ~300 and make a controlled vocabulary for it</p></li> <li>I dumped a list of the top 300 affiliations from the database, sorted it alphabetically in OpenRefine, and created a controlled vocabulary for it (<a href="https://github.com/ilri/DSpace/pull/267">#267</a>)</li>
<li><p>I dumped a list of the top 300 affiliations from the database, sorted it alphabetically in OpenRefine, and created a controlled vocabulary for it (<a href="https://github.com/ilri/DSpace/pull/267">#267</a>)</p></li>
</ul> </ul>
<h2 id="20160920">2016-09-20</h2>
<h2 id="2016-09-20">2016-09-20</h2>
<ul> <ul>
<li>Run all system updates on DSpace Test and reboot the server</li> <li>Run all system updates on DSpace Test and reboot the server</li>
<li>Merge changes for sponsorship and affiliation controlled vocabularies (<a href="https://github.com/ilri/DSpace/pull/267">#267</a>, <a href="https://github.com/ilri/DSpace/pull/268">#268</a>)</li> <li>Merge changes for sponsorship and affiliation controlled vocabularies (<a href="https://github.com/ilri/DSpace/pull/267">#267</a>, <a href="https://github.com/ilri/DSpace/pull/268">#268</a>)</li>
@ -549,172 +461,122 @@ $ ./delete-metadata-values.py -f cg.contributor.affiliation -i affiliations_pb-2
<li>I need to read the docs and ask on the mailing list to see if we can tweak that</li> <li>I need to read the docs and ask on the mailing list to see if we can tweak that</li>
<li>Generate a new list of sponsors from the database for Peter Ballantyne so we can clean them up and update the controlled vocabulary</li> <li>Generate a new list of sponsors from the database for Peter Ballantyne so we can clean them up and update the controlled vocabulary</li>
</ul> </ul>
<h2 id="20160921">2016-09-21</h2>
<h2 id="2016-09-21">2016-09-21</h2>
<ul> <ul>
<li>Turns out the Solr search logic switched from OR to AND in DSpace 6.0 and the change is easy to backport: <a href="https://jira.duraspace.org/browse/DS-2809">https://jira.duraspace.org/browse/DS-2809</a></li> <li>Turns out the Solr search logic switched from OR to AND in DSpace 6.0 and the change is easy to backport: <a href="https://jira.duraspace.org/browse/DS-2809">https://jira.duraspace.org/browse/DS-2809</a></li>
<li>We just need to set this in <code>dspace/solr/search/conf/schema.xml</code>:</li>
<li><p>We just need to set this in <code>dspace/solr/search/conf/schema.xml</code>:</p>
<pre><code>&lt;solrQueryParser defaultOperator=&quot;AND&quot;/&gt;
</code></pre></li>
<li><p>It actually works really well, and search results return far fewer hits now (before, after):</p></li>
</ul> </ul>
<pre><code>&lt;solrQueryParser defaultOperator=&quot;AND&quot;/&gt;
<p><img src="/cgspace-notes/2016/09/cgspace-search.png" alt="CGSpace search with &quot;OR&quot; boolean logic" /> </code></pre><ul>
<img src="/cgspace-notes/2016/09/dspacetest-search.png" alt="DSpace Test search with &quot;AND&quot; boolean logic" /></p> <li>It actually works really well, and search results return far fewer hits now (before, after):</li>
</ul>
<p><img src="/cgspace-notes/2016/09/cgspace-search.png" alt="CGSpace search with &ldquo;OR&rdquo; boolean logic">
<img src="/cgspace-notes/2016/09/dspacetest-search.png" alt="DSpace Test search with &ldquo;AND&rdquo; boolean logic"></p>
<ul> <ul>
<li><p>Found a way to improve the configuration of Atmire&rsquo;s Content and Usage Analysis (CUA) module for date fields</p> <li>Found a way to improve the configuration of Atmire's Content and Usage Analysis (CUA) module for date fields</li>
</ul>
<pre><code>-content.analysis.dataset.option.8=metadata:dateAccessioned:discovery <pre><code>-content.analysis.dataset.option.8=metadata:dateAccessioned:discovery
+content.analysis.dataset.option.8=metadata:dc.date.accessioned:date(month) +content.analysis.dataset.option.8=metadata:dc.date.accessioned:date(month)
</code></pre></li> </code></pre><ul>
<li>This allows the module to treat the field as a date rather than a text string, so we can interrogate it more intelligently</li>
<li><p>This allows the module to treat the field as a date rather than a text string, so we can interrogate it more intelligently</p></li> <li>Add <code>dc.date.accessioned</code> to XMLUI Discovery search filters</li>
<li>Major CGSpace crash because ILRI forgot to pay the Linode bill</li>
<li><p>Add <code>dc.date.accessioned</code> to XMLUI Discovery search filters</p></li> <li>45 minutes of downtime!</li>
<li>Start processing the fixes to <code>dc.description.sponsorship</code> from Peter Ballantyne:</li>
<li><p>Major CGSpace crash because ILRI forgot to pay the Linode bill</p></li> </ul>
<li><p>45 minutes of downtime!</p></li>
<li><p>Start processing the fixes to <code>dc.description.sponsorship</code> from Peter Ballantyne:</p>
<pre><code>$ ./fix-metadata-values.py -i sponsors-fix-23.csv -f dc.description.sponsorship -t correct -m 29 -d dspace -u dspace -p fuuu <pre><code>$ ./fix-metadata-values.py -i sponsors-fix-23.csv -f dc.description.sponsorship -t correct -m 29 -d dspace -u dspace -p fuuu
$ ./delete-metadata-values.py -i sponsors-delete-8.csv -f dc.description.sponsorship -m 29 -d dspace -u dspace -p fuuu $ ./delete-metadata-values.py -i sponsors-delete-8.csv -f dc.description.sponsorship -m 29 -d dspace -u dspace -p fuuu
</code></pre></li> </code></pre><ul>
<li>I need to run these and the others from a few days ago on CGSpace the next time we run updates</li>
<li><p>I need to run these and the others from a few days ago on CGSpace the next time we run updates</p></li> <li>Also, I need to update the controlled vocab for sponsors based on these</li>
<li><p>Also, I need to update the controlled vocab for sponsors based on these</p></li>
</ul> </ul>
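<p>To see why switching the default operator from OR to AND cuts the number of hits, here is a toy sketch in Python. It is not how Solr's query parser works internally; the <code>search</code> function and the three sample documents are invented purely to show the effect of requiring every term instead of any term.</p>

```python
def search(docs, query, default_operator="OR"):
    """Return IDs of documents matching the query terms.

    With "OR" a document matches if it contains any term;
    with "AND" it must contain every term.
    """
    terms = query.lower().split()
    combine = all if default_operator == "AND" else any
    return [
        doc_id
        for doc_id, text in docs.items()
        if combine(term in text.lower().split() for term in terms)
    ]


# Hypothetical documents, for illustration only
docs = {
    1: "livestock feed in Kenya",
    2: "livestock markets in Ethiopia",
    3: "maize feed trials",
}

print(search(docs, "livestock feed", "OR"))   # every document with either word
print(search(docs, "livestock feed", "AND"))  # only documents with both words
```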
<h2 id="20160922">2016-09-22</h2>
<h2 id="2016-09-22">2016-09-22</h2>
<ul> <ul>
<li>Update controlled vocabulary for sponsorship based on the latest corrected values from the database</li> <li>Update controlled vocabulary for sponsorship based on the latest corrected values from the database</li>
</ul> </ul>
<h2 id="20160925">2016-09-25</h2>
<h2 id="2016-09-25">2016-09-25</h2>
<ul> <ul>
<li>Merge accession date improvements for CUA module (<a href="https://github.com/ilri/DSpace/pull/275">#275</a>)</li> <li>Merge accession date improvements for CUA module (<a href="https://github.com/ilri/DSpace/pull/275">#275</a>)</li>
<li>Merge addition of accession date to Discovery search filters (<a href="https://github.com/ilri/DSpace/pull/276">#276</a>)</li> <li>Merge addition of accession date to Discovery search filters (<a href="https://github.com/ilri/DSpace/pull/276">#276</a>)</li>
<li>Merge updates to sponsorship controlled vocabulary (<a href="https://github.com/ilri/DSpace/pull/277">#277</a>)</li> <li>Merge updates to sponsorship controlled vocabulary (<a href="https://github.com/ilri/DSpace/pull/277">#277</a>)</li>
<li>I&rsquo;ve been trying to add a search filter for <code>dc.description</code> so the IITA people can search for some tags they use there, but for some reason the filter never shows up in Atmire&rsquo;s CUA</li> <li>I've been trying to add a search filter for <code>dc.description</code> so the IITA people can search for some tags they use there, but for some reason the filter never shows up in Atmire's CUA</li>
<li>Not sure if it&rsquo;s something like we already have too many filters there (30), or the filter name is reserved, etc&hellip;</li> <li>Not sure if it's something like we already have too many filters there (30), or the filter name is reserved, etc&hellip;</li>
<li>Generate a list of ILRI subjects for Peter and Abenet to look through/fix:</li>
<li><p>Generate a list of ILRI subjects for Peter and Abenet to look through/fix:</p> </ul>
<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where resource_type_id=2 and metadata_field_id=203 group by text_value order by count desc) to /tmp/ilrisubjects.csv with csv; <pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where resource_type_id=2 and metadata_field_id=203 group by text_value order by count desc) to /tmp/ilrisubjects.csv with csv;
</code></pre></li> </code></pre><ul>
<li>Regenerate Discovery indexes a few times after playing with <code>discovery.xml</code> index definitions (syntax, parameters, etc).</li>
<li><p>Regenerate Discovery indexes a few times after playing with <code>discovery.xml</code> index definitions (syntax, parameters, etc).</p></li> <li>Merge changes to boolean logic in Solr search (<a href="https://github.com/ilri/DSpace/pull/274">#274</a>)</li>
<li>Run all sponsorship and affiliation fixes on CGSpace, deploy latest <code>5_x-prod</code> branch, and re-index Discovery on CGSpace</li>
<li><p>Merge changes to boolean logic in Solr search (<a href="https://github.com/ilri/DSpace/pull/274">#274</a>)</p></li> <li>Tested OCSP stapling on DSpace Test's nginx and it works:</li>
</ul>
<li><p>Run all sponsorship and affiliation fixes on CGSpace, deploy latest <code>5_x-prod</code> branch, and re-index Discovery on CGSpace</p></li>
<li><p>Tested OCSP stapling on DSpace Test&rsquo;s nginx and it works:</p>
<pre><code>$ openssl s_client -connect dspacetest.cgiar.org:443 -servername dspacetest.cgiar.org -tls1_2 -tlsextdebug -status <pre><code>$ openssl s_client -connect dspacetest.cgiar.org:443 -servername dspacetest.cgiar.org -tls1_2 -tlsextdebug -status
... ...
OCSP response: OCSP response:
====================================== ======================================
OCSP Response Data: OCSP Response Data:
... ...
Cert Status: good Cert Status: good
</code></pre></li> </code></pre><ul>
<li>I've been monitoring this for almost two years in this GitHub issue: <a href="https://github.com/ilri/DSpace/issues/38">https://github.com/ilri/DSpace/issues/38</a></li>
<li><p>I&rsquo;ve been monitoring this for almost two years in this GitHub issue: <a href="https://github.com/ilri/DSpace/issues/38">https://github.com/ilri/DSpace/issues/38</a></p></li>
</ul> </ul>
<h2 id="20160927">2016-09-27</h2>
<h2 id="2016-09-27">2016-09-27</h2>
<ul> <ul>
<li>Discuss fixing some ORCIDs for CCAFS author Sonja Vermeulen with Magdalena Haman</li> <li>Discuss fixing some ORCIDs for CCAFS author Sonja Vermeulen with Magdalena Haman</li>
<li>This author has a few variations:</li>
<li><p>This author has a few variations:</p> </ul>
<pre><code>dspacetest=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value like 'Vermeu <pre><code>dspacetest=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value like 'Vermeu
len, S%'; len, S%';
</code></pre></li> </code></pre><ul>
<li>And it looks like <code>fe4b719f-6cc4-4d65-8504-7a83130b9f83</code> is the authority with the correct ORCID linked</li>
<li><p>And it looks like <code>fe4b719f-6cc4-4d65-8504-7a83130b9f83</code> is the authority with the correct ORCID linked</p> </ul>
<pre><code>dspacetest=# update metadatavalue set authority='fe4b719f-6cc4-4d65-8504-7a83130b9f83w', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Vermeulen, S%'; <pre><code>dspacetest=# update metadatavalue set authority='fe4b719f-6cc4-4d65-8504-7a83130b9f83w', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Vermeulen, S%';
UPDATE 101 UPDATE 101
</code></pre></li> </code></pre><ul>
<li>Hmm, now her name is missing from the authors facet and only shows the authority ID</li>
<li><p>Hmm, now her name is missing from the authors facet and only shows the authority ID</p></li> <li>On the production server there is an item with her ORCID but it is using a different authority: f01f7b7b-be3f-4df7-a61d-b73c067de88d</li>
<li>Maybe I used the wrong one&hellip; I need to look again at the production database</li>
<li><p>On the production server there is an item with her ORCID but it is using a different authority: f01f7b7b-be3f-4df7-a61d-b73c067de88d</p></li> <li>On a clean snapshot of the database I see the correct authority should be <code>f01f7b7b-be3f-4df7-a61d-b73c067de88d</code>, not <code>fe4b719f-6cc4-4d65-8504-7a83130b9f83</code></li>
<li>Updating her authorities again and reindexing:</li>
<li><p>Maybe I used the wrong one&hellip; I need to look again at the production database</p></li> </ul>
<li><p>On a clean snapshot of the database I see the correct authority should be <code>f01f7b7b-be3f-4df7-a61d-b73c067de88d</code>, not <code>fe4b719f-6cc4-4d65-8504-7a83130b9f83</code></p></li>
<li><p>Updating her authorities again and reindexing:</p>
<pre><code>dspacetest=# update metadatavalue set authority='f01f7b7b-be3f-4df7-a61d-b73c067de88d', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Vermeulen, S%'; <pre><code>dspacetest=# update metadatavalue set authority='f01f7b7b-be3f-4df7-a61d-b73c067de88d', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Vermeulen, S%';
UPDATE 101 UPDATE 101
</code></pre></li> </code></pre><ul>
<li>Use GitHub icon from Font Awesome instead of a PNG to save one extra network request</li>
<li><p>Use GitHub icon from Font Awesome instead of a PNG to save one extra network request</p></li> <li>We can also replace the RSS and mail icons in community text!</li>
<li>Fix reference to <code>dc.type.*</code> in Atmire CUA module, as we now only index <code>dc.type</code> for &ldquo;Output type&rdquo;</li>
<li><p>We can also replace the RSS and mail icons in community text!</p></li>
<li><p>Fix reference to <code>dc.type.*</code> in Atmire CUA module, as we now only index <code>dc.type</code> for &ldquo;Output type&rdquo;</p></li>
</ul> </ul>
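<p>The authority updates above are easy to get wrong by hand; the first one, for instance, appears to have a stray trailing <code>w</code> on the authority ID. A minimal sketch of a safer helper follows, assuming an in-memory SQLite table as a stand-in for the real PostgreSQL <code>metadatavalue</code> table. The helper name <code>set_authority</code>, the sample rows and the "old-N" authorities are hypothetical.</p>

```python
import sqlite3
import uuid

# In-memory stand-in for the DSpace database (assumption: only the columns
# used in the queries above are modelled here).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE metadatavalue (
           text_value TEXT, authority TEXT, confidence INTEGER,
           metadata_field_id INTEGER, resource_type_id INTEGER)"""
)
conn.executemany(
    "INSERT INTO metadatavalue VALUES (?, ?, ?, ?, ?)",
    [
        ("Vermeulen, S.", "old-1", 500, 3, 2),
        ("Vermeulen, S.J.", "old-2", 500, 3, 2),
        ("Someone, E.", "old-3", 500, 3, 2),
    ],
)


def set_authority(conn, pattern, authority, confidence=600):
    """Set authority and confidence for author values matching a LIKE pattern.

    Raises ValueError if the authority is not a well-formed UUID, and
    returns the number of rows updated.
    """
    uuid.UUID(authority)  # rejects malformed IDs such as one with a trailing "w"
    cur = conn.execute(
        "UPDATE metadatavalue SET authority = ?, confidence = ? "
        "WHERE metadata_field_id = 3 AND resource_type_id = 2 "
        "AND text_value LIKE ?",
        (authority, confidence, pattern),
    )
    return cur.rowcount


updated = set_authority(conn, "Vermeulen, S%", "f01f7b7b-be3f-4df7-a61d-b73c067de88d")
print(f"UPDATE {updated}")
```

<p>As in the notes, the Discovery and Authority indexes would still need to be updated afterwards.</p>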
<h2 id="20160928">2016-09-28</h2>
<h2 id="2016-09-28">2016-09-28</h2>
<ul> <ul>
<li>Make a placeholder pull request for <code>discovery.xml</code> changes (<a href="https://github.com/ilri/DSpace/pull/278">#278</a>), as I still need to test their effect on Atmire content analysis module</li> <li>Make a placeholder pull request for <code>discovery.xml</code> changes (<a href="https://github.com/ilri/DSpace/pull/278">#278</a>), as I still need to test their effect on Atmire content analysis module</li>
<li>Make a placeholder pull request for Font Awesome changes (<a href="https://github.com/ilri/DSpace/pull/279">#279</a>), which replaces the GitHub image in the footer with an icon, and add style for RSS and @ icons that I will start replacing in community/collection HTML intros</li> <li>Make a placeholder pull request for Font Awesome changes (<a href="https://github.com/ilri/DSpace/pull/279">#279</a>), which replaces the GitHub image in the footer with an icon, and add style for RSS and @ icons that I will start replacing in community/collection HTML intros</li>
<li>Had some issues with local test server after messing with Solr too much, had to blow everything away and re-install from CGSpace</li> <li>Had some issues with local test server after messing with Solr too much, had to blow everything away and re-install from CGSpace</li>
<li>Going to try to update Sonja Vermeulen&rsquo;s authority to 2b4166b7-6e4d-4f66-9d8b-ddfbec9a6ae0, as that seems to be one of her authorities that has an ORCID</li> <li>Going to try to update Sonja Vermeulen's authority to 2b4166b7-6e4d-4f66-9d8b-ddfbec9a6ae0, as that seems to be one of her authorities that has an ORCID</li>
<li>Merge Font Awesome changes (<a href="https://github.com/ilri/DSpace/pull/279">#279</a>)</li> <li>Merge Font Awesome changes (<a href="https://github.com/ilri/DSpace/pull/279">#279</a>)</li>
<li>Minor fix to a string in Atmire&rsquo;s CUA module (<a href="https://github.com/ilri/DSpace/pull/280">#280</a>)</li> <li>Minor fix to a string in Atmire's CUA module (<a href="https://github.com/ilri/DSpace/pull/280">#280</a>)</li>
<li>This seems to be what I'll need to do for Sonja Vermeulen (but with <code>2b4166b7-6e4d-4f66-9d8b-ddfbec9a6ae0</code> instead on the live site):</li>
<li><p>This seems to be what I&rsquo;ll need to do for Sonja Vermeulen (but with <code>2b4166b7-6e4d-4f66-9d8b-ddfbec9a6ae0</code> instead on the live site):</p> </ul>
<pre><code>dspacetest=# update metadatavalue set authority='09e4da69-33a3-45ca-b110-7d3f82d2d6d2', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Vermeulen, S%'; <pre><code>dspacetest=# update metadatavalue set authority='09e4da69-33a3-45ca-b110-7d3f82d2d6d2', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Vermeulen, S%';
dspacetest=# update metadatavalue set authority='09e4da69-33a3-45ca-b110-7d3f82d2d6d2', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Vermeulen SJ%'; dspacetest=# update metadatavalue set authority='09e4da69-33a3-45ca-b110-7d3f82d2d6d2', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Vermeulen SJ%';
</code></pre></li> </code></pre><ul>
<li>And then update Discovery and Authority indexes</li>
<li><p>And then update Discovery and Authority indexes</p></li> <li>Minor fix for &ldquo;Subject&rdquo; string in Discovery search and Atmire modules (<a href="https://github.com/ilri/DSpace/pull/281">#281</a>)</li>
<li>Start testing batch fixes for ILRI subject from Peter:</li>
<li><p>Minor fix for &ldquo;Subject&rdquo; string in Discovery search and Atmire modules (<a href="https://github.com/ilri/DSpace/pull/281">#281</a>)</p></li> </ul>
<li><p>Start testing batch fixes for ILRI subject from Peter:</p>
<pre><code>$ ./fix-metadata-values.py -i ilrisubjects-fix-32.csv -f cg.subject.ilri -t correct -m 203 -d dspace -u dspace -p fuuuu <pre><code>$ ./fix-metadata-values.py -i ilrisubjects-fix-32.csv -f cg.subject.ilri -t correct -m 203 -d dspace -u dspace -p fuuuu
$ ./delete-metadata-values.py -i ilrisubjects-delete-13.csv -f cg.subject.ilri -m 203 -d dspace -u dspace -p fuuu $ ./delete-metadata-values.py -i ilrisubjects-delete-13.csv -f cg.subject.ilri -m 203 -d dspace -u dspace -p fuuu
</code></pre></li> </code></pre><h2 id="20160929">2016-09-29</h2>
</ul>
<h2 id="2016-09-29">2016-09-29</h2>
<ul> <ul>
<li>Add <code>cg.identifier.ciatproject</code> to metadata registry in preparation for CIAT project tag</li> <li>Add <code>cg.identifier.ciatproject</code> to metadata registry in preparation for CIAT project tag</li>
<li>Merge changes for CIAT project tag (<a href="https://github.com/ilri/DSpace/pull/282">#282</a>)</li> <li>Merge changes for CIAT project tag (<a href="https://github.com/ilri/DSpace/pull/282">#282</a>)</li>
<li>DSpace Test (linode02) became unresponsive for some reason, I had to hard reboot it from the Linode console</li> <li>DSpace Test (linode02) became unresponsive for some reason, I had to hard reboot it from the Linode console</li>
<li>People on DSpace mailing list gave me a query to get authors from certain collections:</li>
<li><p>People on DSpace mailing list gave me a query to get authors from certain collections:</p>
<pre><code>dspacetest=# select distinct text_value from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/5472', '10568/5473')));
</code></pre></li>
</ul> </ul>
<pre><code>dspacetest=# select distinct text_value from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/5472', '10568/5473')));
<h2 id="2016-09-30">2016-09-30</h2> </code></pre><h2 id="20160930">2016-09-30</h2>
<ul> <ul>
<li>Deny access to REST API&rsquo;s <code>find-by-metadata-field</code> endpoint to protect against an upstream security issue (DS-3250)</li> <li>Deny access to REST API's <code>find-by-metadata-field</code> endpoint to protect against an upstream security issue (DS-3250)</li>
<li>There is a patch but it is only for 5.5 and doesn&rsquo;t apply cleanly to 5.1</li> <li>There is a patch but it is only for 5.5 and doesn't apply cleanly to 5.1</li>
</ul> </ul>


@ -8,19 +8,16 @@
<meta property="og:title" content="October, 2016" /> <meta property="og:title" content="October, 2016" />
<meta property="og:description" content="2016-10-03 <meta property="og:description" content="2016-10-03
Testing adding ORCIDs to a CSV file for a single item to see if the author orders get messed up Testing adding ORCIDs to a CSV file for a single item to see if the author orders get messed up
Need to test the following scenarios to see how author order is affected: Need to test the following scenarios to see how author order is affected:
ORCIDs only ORCIDs only
ORCIDs plus normal authors ORCIDs plus normal authors
I exported a random item&rsquo;s metadata as CSV, deleted all columns except id and collection, and made a new column called ORCID:dc.contributor.author with the following random ORCIDs from the ORCID registry: I exported a random item&#39;s metadata as CSV, deleted all columns except id and collection, and made a new column called ORCID:dc.contributor.author with the following random ORCIDs from the ORCID registry:
0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X 0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
" /> " />
<meta property="og:type" content="article" /> <meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2016-10/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2016-10/" />
@ -31,21 +28,18 @@ I exported a random item&rsquo;s metadata as CSV, deleted all columns except id
<meta name="twitter:title" content="October, 2016"/> <meta name="twitter:title" content="October, 2016"/>
<meta name="twitter:description" content="2016-10-03 <meta name="twitter:description" content="2016-10-03
Testing adding ORCIDs to a CSV file for a single item to see if the author orders get messed up Testing adding ORCIDs to a CSV file for a single item to see if the author orders get messed up
Need to test the following scenarios to see how author order is affected: Need to test the following scenarios to see how author order is affected:
ORCIDs only ORCIDs only
ORCIDs plus normal authors ORCIDs plus normal authors
I exported a random item&rsquo;s metadata as CSV, deleted all columns except id and collection, and made a new column called ORCID:dc.contributor.author with the following random ORCIDs from the ORCID registry: I exported a random item&#39;s metadata as CSV, deleted all columns except id and collection, and made a new column called ORCID:dc.contributor.author with the following random ORCIDs from the ORCID registry:
0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X 0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -126,196 +120,144 @@ I exported a random item&rsquo;s metadata as CSV, deleted all columns except id
</p> </p>
</header> </header>
<h2 id="2016-10-03">2016-10-03</h2> <h2 id="20161003">2016-10-03</h2>
<ul> <ul>
<li>Testing adding <a href="https://wiki.duraspace.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing">ORCIDs to a CSV</a> file for a single item to see if the author orders get messed up</li> <li>Testing adding <a href="https://wiki.duraspace.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing">ORCIDs to a CSV</a> file for a single item to see if the author orders get messed up</li>
<li>Need to test the following scenarios to see how author order is affected: <li>Need to test the following scenarios to see how author order is affected:
<ul> <ul>
<li>ORCIDs only</li> <li>ORCIDs only</li>
<li>ORCIDs plus normal authors</li> <li>ORCIDs plus normal authors</li>
</ul></li> </ul>
</li>
<li><p>I exported a random item&rsquo;s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new column called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</p> <li>I exported a random item's metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new column called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</li>
</ul>
<pre><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X <pre><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
</code></pre></li> </code></pre><ul>
</ul> <li>Hmm, with the <code>dc.contributor.author</code> column removed, DSpace doesn't detect any changes</li>
<ul>
<li>Hmm, with the <code>dc.contributor.author</code> column removed, DSpace doesn&rsquo;t detect any changes</li>
<li>With a blank <code>dc.contributor.author</code> column, DSpace wants to remove all non-ORCID authors and add the new ORCID authors</li> <li>With a blank <code>dc.contributor.author</code> column, DSpace wants to remove all non-ORCID authors and add the new ORCID authors</li>
<li>I added the <a href="https://github.com/ilri/DSpace/issues/234">disclaimer text</a> to the About page, then added a footer link to the disclaimer&rsquo;s ID, but there is a Bootstrap issue that causes the page content to disappear when using in-page anchors: <a href="https://github.com/twbs/bootstrap/issues/1768">https://github.com/twbs/bootstrap/issues/1768</a></li> <li>I added the <a href="https://github.com/ilri/DSpace/issues/234">disclaimer text</a> to the About page, then added a footer link to the disclaimer's ID, but there is a Bootstrap issue that causes the page content to disappear when using in-page anchors: <a href="https://github.com/twbs/bootstrap/issues/1768">https://github.com/twbs/bootstrap/issues/1768</a></li>
</ul> </ul>
<p><img src="/cgspace-notes/2016/10/bootstrap-issue.png" alt="Bootstrap issue with in-page anchors"></p>
<p><img src="/cgspace-notes/2016/10/bootstrap-issue.png" alt="Bootstrap issue with in-page anchors" /></p>
<ul> <ul>
<li>Looks like we&rsquo;ll just have to add the text to the About page (without a link) or add a separate page</li> <li>Looks like we'll just have to add the text to the About page (without a link) or add a separate page</li>
</ul> </ul>
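The two CSV scenarios tested above (no `dc.contributor.author` column at all versus a blank one) can be reproduced with a small script. This is only a sketch: the function name `build_orcid_csv`, the item id, and the collection handle in the usage note are hypothetical, and it assumes the batch CSV only needs the `id` and `collection` columns plus the ORCID column described in the notes.

```python
import csv
import io


def build_orcid_csv(item_id, collection, orcids, include_blank_author_column=False):
    """Return CSV text for one item with an ORCID:dc.contributor.author column.

    Multiple ORCIDs are joined with "||", as in the notes above.
    """
    fieldnames = ["id", "collection"]
    row = {"id": item_id, "collection": collection}

    # Scenario 2 from the notes: keep dc.contributor.author, but leave it blank.
    if include_blank_author_column:
        fieldnames.append("dc.contributor.author")
        row["dc.contributor.author"] = ""

    fieldnames.append("ORCID:dc.contributor.author")
    row["ORCID:dc.contributor.author"] = "||".join(orcids)

    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerow(row)
    return buf.getvalue()
```

For example, `build_orcid_csv("1", "10568/1", ["0000-0002-6115-0956", "0000-0002-3812-8793"])` gives the variant without an author column, and passing `include_blank_author_column=True` gives the variant that made DSpace want to remove the non-ORCID authors.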
<h2 id="20161004">2016-10-04</h2>
<h2 id="2016-10-04">2016-10-04</h2>
<ul> <ul>
<li>Start testing cleanups of authors that Peter sent last week</li> <li>Start testing cleanups of authors that Peter sent last week</li>
<li>Out of 40,000+ rows, Peter had indicated corrections for ~3,200 of them—too many to look through carefully, so I did some basic quality checking: <li>Out of 40,000+ rows, Peter had indicated corrections for ~3,200 of them—too many to look through carefully, so I did some basic quality checking:
<ul> <ul>
<li>Trim leading/trailing whitespace</li> <li>Trim leading/trailing whitespace</li>
<li>Find invalid characters</li> <li>Find invalid characters</li>
<li>Cluster values to merge obvious authors</li> <li>Cluster values to merge obvious authors</li>
</ul></li> </ul>
</li>
<li><p>That left us with 3,180 valid corrections and 3 deletions:</p> <li>That left us with 3,180 valid corrections and 3 deletions:</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i authors-fix-3180.csv -f dc.contributor.author -t correct -m 3 -d dspacetest -u dspacetest -p fuuu <pre><code>$ ./fix-metadata-values.py -i authors-fix-3180.csv -f dc.contributor.author -t correct -m 3 -d dspacetest -u dspacetest -p fuuu
$ ./delete-metadata-values.py -i authors-delete-3.csv -f dc.contributor.author -m 3 -d dspacetest -u dspacetest -p fuuu $ ./delete-metadata-values.py -i authors-delete-3.csv -f dc.contributor.author -m 3 -d dspacetest -u dspacetest -p fuuu
</code></pre></li> </code></pre><ul>
<li>Remove old about page (<a href="https://github.com/ilri/DSpace/pull/284">#284</a>)</li>
<li><p>Remove old about page (<a href="https://github.com/ilri/DSpace/pull/284">#284</a>)</p></li> <li>CGSpace crashed a few times today</li>
<li>Generate list of unique authors in CCAFS collections:</li>
<li><p>CGSpace crashed a few times today</p></li>
<li><p>Generate list of unique authors in CCAFS collections:</p>
<pre><code>dspacetest=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/32729', '10568/5472', '10568/5473', '10568/10288', '10568/70974', '10568/3547', '10568/3549', '10568/3531','10568/16890','10568/5470','10568/3546', '10568/36024', '10568/66581', '10568/21789', '10568/5469', '10568/5468', '10568/3548', '10568/71053', '10568/25167'))) group by text_value order by count desc) to /tmp/ccafs-authors.csv with csv;
</code></pre></li>
</ul> </ul>
<pre><code>dspacetest=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/32729', '10568/5472', '10568/5473', '10568/10288', '10568/70974', '10568/3547', '10568/3549', '10568/3531','10568/16890','10568/5470','10568/3546', '10568/36024', '10568/66581', '10568/21789', '10568/5469', '10568/5468', '10568/3548', '10568/71053', '10568/25167'))) group by text_value order by count desc) to /tmp/ccafs-authors.csv with csv;
<h2 id="2016-10-05">2016-10-05</h2> </code></pre><h2 id="20161005">2016-10-05</h2>
<ul> <ul>
<li>Work on more infrastructure cleanups for Ansible DSpace role</li> <li>Work on more infrastructure cleanups for Ansible DSpace role</li>
<li>Clean up Let&rsquo;s Encrypt plumbing and submit pull request for rmg-ansible-public (<a href="https://github.com/ilri/rmg-ansible-public/pull/60">#60</a>)</li> <li>Clean up Let's Encrypt plumbing and submit pull request for rmg-ansible-public (<a href="https://github.com/ilri/rmg-ansible-public/pull/60">#60</a>)</li>
</ul> </ul>
<h2 id="20161006">2016-10-06</h2>
<h2 id="2016-10-06">2016-10-06</h2>
<ul> <ul>
<li>Nice! DSpace Test (linode02) is now having <code>java.lang.OutOfMemoryError: Java heap space</code> errors&hellip;</li> <li>Nice! DSpace Test (linode02) is now having <code>java.lang.OutOfMemoryError: Java heap space</code> errors&hellip;</li>
<li>Heap space is 2048m, and we have 5GB of RAM being used for OS cache (Solr!) so let&rsquo;s just bump the memory to 3072m</li> <li>Heap space is 2048m, and we have 5GB of RAM being used for OS cache (Solr!) so let's just bump the memory to 3072m</li>
<li>Magdalena from CCAFS asked why the colors in the thumbnails for these <a href="https://cgspace.cgiar.org/handle/10568/71249">two</a> <a href="https://cgspace.cgiar.org/handle/10568/71259">items</a> look different, even though they are the same in the PDF itself</li> <li>Magdalena from CCAFS asked why the colors in the thumbnails for these <a href="https://cgspace.cgiar.org/handle/10568/71249">two</a> <a href="https://cgspace.cgiar.org/handle/10568/71259">items</a> look different, even though they are the same in the PDF itself</li>
</ul> </ul>
<p><img src="/cgspace-notes/2016/10/cmyk-vs-srgb.jpg" alt="CMYK vs sRGB colors"></p>
<p><img src="/cgspace-notes/2016/10/cmyk-vs-srgb.jpg" alt="CMYK vs sRGB colors" /></p>
<ul> <ul>
<li>Turns out the first PDF was exported from InDesign using CMYK and the second one was using sRGB</li> <li>Turns out the first PDF was exported from InDesign using CMYK and the second one was using sRGB</li>
<li>Run all system updates on DSpace Test and reboot it</li> <li>Run all system updates on DSpace Test and reboot it</li>
</ul> </ul>
<h2 id="20161008">2016-10-08</h2>
<h2 id="2016-10-08">2016-10-08</h2>
<ul> <ul>
<li>Re-deploy CGSpace with latest changes from late September and early October</li> <li>Re-deploy CGSpace with latest changes from late September and early October</li>
<li>Run fixes for ILRI subjects and delete blank metadata values:</li>
<li><p>Run fixes for ILRI subjects and delete blank metadata values:</p> </ul>
<pre><code>dspace=# delete from metadatavalue where resource_type_id=2 and text_value=''; <pre><code>dspace=# delete from metadatavalue where resource_type_id=2 and text_value='';
DELETE 11 DELETE 11
</code></pre></li> </code></pre><ul>
<li>Run all system updates and reboot CGSpace</li>
<li><p>Run all system updates and reboot CGSpace</p></li> <li>Delete ten gigs of old 2015 Tomcat logs that never got rotated (WTF?):</li>
</ul>
<li><p>Delete ten gigs of old 2015 Tomcat logs that never got rotated (WTF?):</p>
<pre><code>root@linode01:~# ls -lh /var/log/tomcat7/localhost_access_log.2015* | wc -l <pre><code>root@linode01:~# ls -lh /var/log/tomcat7/localhost_access_log.2015* | wc -l
47 47
</code></pre></li> </code></pre><ul>
<li>Delete 2GB <code>cron-filter-media.log</code> file, as it is just a log from a cron job and it doesn't get rotated like normal log files (almost a year now maybe)</li>
<li><p>Delete 2GB <code>cron-filter-media.log</code> file, as it is just a log from a cron job and it doesn&rsquo;t get rotated like normal log files (almost a year now maybe)</p></li>
</ul> </ul>
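Before deleting unrotated logs like the ones above, it can help to see how many files match and how much space they take. A minimal sketch, assuming a helper named `summarize_logs` (hypothetical, not part of any DSpace or Tomcat tooling) and the same glob pattern as the `ls` command in the notes:

```python
from pathlib import Path


def summarize_logs(directory, pattern):
    """Return (file_count, total_bytes) for regular files matching a glob pattern."""
    files = sorted(p for p in Path(directory).glob(pattern) if p.is_file())
    total_bytes = sum(p.stat().st_size for p in files)
    return len(files), total_bytes
```

Calling `summarize_logs("/var/log/tomcat7", "localhost_access_log.2015*")` would report the count and combined size without removing anything.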
<h2 id="20161014">2016-10-14</h2>
<h2 id="2016-10-14">2016-10-14</h2>
<ul> <ul>
<li>Run all system updates on DSpace Test and reboot server</li> <li>Run all system updates on DSpace Test and reboot server</li>
<li>Looking into some issues with Discovery filters in Atmire&rsquo;s content and usage analysis module after adjusting the filter class</li> <li>Looking into some issues with Discovery filters in Atmire's content and usage analysis module after adjusting the filter class</li>
<li>Looks like changing the filters from <code>configuration.DiscoverySearchFilterFacet</code> to <code>configuration.DiscoverySearchFilter</code> breaks them in Atmire CUA module</li> <li>Looks like changing the filters from <code>configuration.DiscoverySearchFilterFacet</code> to <code>configuration.DiscoverySearchFilter</code> breaks them in Atmire CUA module</li>
</ul> </ul>
<h2 id="20161017">2016-10-17</h2>
<h2 id="2016-10-17">2016-10-17</h2>
<ul> <ul>
<li><p>A bit more cleanup on the CCAFS authors, and run the corrections on DSpace Test:</p> <li>A bit more cleanup on the CCAFS authors, and run the corrections on DSpace Test:</li>
<pre><code>$ ./fix-metadata-values.py -i ccafs-authors-oct-16.csv -f dc.contributor.author -t 'correct name' -m 3 -d dspace -u dspace -p fuuu
</code></pre></li>
<li><p>One observation is that there are still some old versions of names in the author lookup because authors appear in other communities (as we only corrected authors from CCAFS for this round)</p></li>
</ul> </ul>
<pre><code>$ ./fix-metadata-values.py -i ccafs-authors-oct-16.csv -f dc.contributor.author -t 'correct name' -m 3 -d dspace -u dspace -p fuuu
<h2 id="2016-10-18">2016-10-18</h2> </code></pre><ul>
<li>One observation is that there are still some old versions of names in the author lookup because authors appear in other communities (as we only corrected authors from CCAFS for this round)</li>
</ul>
<h2 id="20161018">2016-10-18</h2>
<ul> <ul>
<li><p>Start working on DSpace 5.5 porting work again:</p> <li>Start working on DSpace 5.5 porting work again:</li>
</ul>
<pre><code>$ git checkout -b 5_x-55 5_x-prod <pre><code>$ git checkout -b 5_x-55 5_x-prod
$ git rebase -i dspace-5.5 $ git rebase -i dspace-5.5
</code></pre></li> </code></pre><ul>
<li>Have to fix about ten merge conflicts, mostly in the SCSS for the CGIAR theme</li>
<li><p>Have to fix about ten merge conflicts, mostly in the SCSS for the CGIAR theme</p></li> <li>Skip 1e34751b8cf17021f45d4cf2b9a5800c93fb4cb2 in favor of upstream's 55e623d1c2b8b7b1fa45db6728e172e06bfa8598 (fixes X-Forwarded-For header) because I had made the same fix myself and it's better to use the upstream one</li>
<li>I notice this rebase gets rid of GitHub merge commits&hellip; which actually might be fine because merges are fucking annoying to deal with when remote people merge without pulling and rebasing their branch first</li>
<li><p>Skip 1e34751b8cf17021f45d4cf2b9a5800c93fb4cb2 in favor of upstream&rsquo;s 55e623d1c2b8b7b1fa45db6728e172e06bfa8598 (fixes X-Forwarded-For header) because I had made the same fix myself and it&rsquo;s better to use the upstream one</p></li> <li>Finished up applying the 5.5 sitemap changes to all themes</li>
<li>Merge the <code>discovery.xml</code> cleanups (<a href="https://github.com/ilri/DSpace/pull/278">#278</a>)</li>
<li><p>I notice this rebase gets rid of GitHub merge commits&hellip; which actually might be fine because merges are fucking annoying to deal with when remote people merge without pulling and rebasing their branch first</p></li> <li>Merge some minor edits to the distribution license (<a href="https://github.com/ilri/DSpace/pull/285">#285</a>)</li>
<li><p>Finished up applying the 5.5 sitemap changes to all themes</p></li>
<li><p>Merge the <code>discovery.xml</code> cleanups (<a href="https://github.com/ilri/DSpace/pull/278">#278</a>)</p></li>
<li><p>Merge some minor edits to the distribution license (<a href="https://github.com/ilri/DSpace/pull/285">#285</a>)</p></li>
</ul> </ul>
<h2 id="20161019">2016-10-19</h2>
<h2 id="2016-10-19">2016-10-19</h2>
<ul> <ul>
<li>When we move to DSpace 5.5 we should also cherry pick some patches from 5.6 branch: <li>When we move to DSpace 5.5 we should also cherry pick some patches from 5.6 branch:
<ul> <ul>
<li><a href="https://jira.duraspace.org/browse/DS-3246">memory cleanup</a>: 9f0f5940e7921765c6a22e85337331656b18a403</li> <li><a href="https://jira.duraspace.org/browse/DS-3246">memory cleanup</a>: 9f0f5940e7921765c6a22e85337331656b18a403</li>
<li>SQL injection: c6fda557f731dbc200d7d58b8b61563f86fe6d06</li> <li>SQL injection: c6fda557f731dbc200d7d58b8b61563f86fe6d06</li>
<li>PDFBox security issue: b5330b78153b2052ed3dc2fd65917ccdbfcc0439</li> <li>PDFBox security issue: b5330b78153b2052ed3dc2fd65917ccdbfcc0439</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2016-10-20">2016-10-20</h2> </ul>
<h2 id="20161020">2016-10-20</h2>
<ul> <ul>
<li>Run CCAFS author corrections on CGSpace</li> <li>Run CCAFS author corrections on CGSpace</li>
<li>Discovery reindexing took forever and kinda caused CGSpace to crash, so I ran all system updates and rebooted the server</li> <li>Discovery reindexing took forever and kinda caused CGSpace to crash, so I ran all system updates and rebooted the server</li>
</ul> </ul>
<h2 id="20161025">2016-10-25</h2>
<h2 id="2016-10-25">2016-10-25</h2>
<ul> <ul>
<li><p>Move the LIVES community from the top level to the ILRI projects community</p> <li>Move the LIVES community from the top level to the ILRI projects community</li>
</ul>
<pre><code>$ /home/cgspace.cgiar.org/bin/dspace community-filiator --set --parent=10568/27629 --child=10568/25101 <pre><code>$ /home/cgspace.cgiar.org/bin/dspace community-filiator --set --parent=10568/27629 --child=10568/25101
</code></pre></li> </code></pre><ul>
<li>Start testing some things for DSpace 5.5, like command line metadata import, PDF media filter, and Atmire CUA</li>
<li><p>Start testing some things for DSpace 5.5, like command line metadata import, PDF media filter, and Atmire CUA</p></li> <li>Start looking at batch fixing of &ldquo;old&rdquo; ILRI website links without www or https, for example:</li>
</ul>
<li><p>Start looking at batch fixing of &ldquo;old&rdquo; ILRI website links without www or https, for example:</p>
<pre><code>dspace=# select * from metadatavalue where resource_type_id=2 and text_value like 'http://ilri.org%'; <pre><code>dspace=# select * from metadatavalue where resource_type_id=2 and text_value like 'http://ilri.org%';
</code></pre></li> </code></pre><ul>
<li>Also CCAFS has HTTPS and their links should use it where possible:</li>
<li><p>Also CCAFS has HTTPS and their links should use it where possible:</p> </ul>
<pre><code>dspace=# select * from metadatavalue where resource_type_id=2 and text_value like 'http://ccafs.cgiar.org%'; <pre><code>dspace=# select * from metadatavalue where resource_type_id=2 and text_value like 'http://ccafs.cgiar.org%';
</code></pre></li> </code></pre><ul>
<li>And this will find community and collection HTML text that is using the old style PNG/JPG icons for RSS and email (we should be using Font Awesome icons instead):</li>
<li><p>And this will find community and collection HTML text that is using the old style PNG/JPG icons for RSS and email (we should be using Font Awesome icons instead):</p> </ul>
<pre><code>dspace=# select text_value from metadatavalue where resource_type_id in (3,4) and text_value like '%Iconrss2.png%'; <pre><code>dspace=# select text_value from metadatavalue where resource_type_id in (3,4) and text_value like '%Iconrss2.png%';
</code></pre></li> </code></pre><ul>
<li>Turns out there are shit tons of varieties of this, like with http, https, www, separate <code>&lt;/img&gt;</code> tags, alignments, etc</li>
<li><p>Turns out there are shit tons of varieties of this, like with http, https, www, separate <code>&lt;/img&gt;</code> tags, alignments, etc</p></li> <li>Had to find all variations and replace them individually:</li>
</ul>
<li><p>Had to find all variations and replace them individually:</p>
<pre><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, '&lt;img align=&quot;left&quot; src=&quot;https://www.ilri.org/images/Iconrss2.png&quot;/&gt;','&lt;span class=&quot;fa fa-rss fa-2x&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt;') where resource_type_id in (3,4) and text_value like '%&lt;img align=&quot;left&quot; src=&quot;https://www.ilri.org/images/Iconrss2.png&quot;/&gt;%'; <pre><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, '&lt;img align=&quot;left&quot; src=&quot;https://www.ilri.org/images/Iconrss2.png&quot;/&gt;','&lt;span class=&quot;fa fa-rss fa-2x&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt;') where resource_type_id in (3,4) and text_value like '%&lt;img align=&quot;left&quot; src=&quot;https://www.ilri.org/images/Iconrss2.png&quot;/&gt;%';
dspace=# update metadatavalue set text_value = regexp_replace(text_value, '&lt;img align=&quot;left&quot; src=&quot;https://www.ilri.org/images/email.jpg&quot;/&gt;', '&lt;span class=&quot;fa fa-at fa-2x&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt;') where resource_type_id in (3,4) and text_value like '%&lt;img align=&quot;left&quot; src=&quot;https://www.ilri.org/images/email.jpg&quot;/&gt;%'; dspace=# update metadatavalue set text_value = regexp_replace(text_value, '&lt;img align=&quot;left&quot; src=&quot;https://www.ilri.org/images/email.jpg&quot;/&gt;', '&lt;span class=&quot;fa fa-at fa-2x&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt;') where resource_type_id in (3,4) and text_value like '%&lt;img align=&quot;left&quot; src=&quot;https://www.ilri.org/images/email.jpg&quot;/&gt;%';
dspace=# update metadatavalue set text_value = regexp_replace(text_value, '&lt;img align=&quot;left&quot; src=&quot;http://www.ilri.org/images/Iconrss2.png&quot;/&gt;', '&lt;span class=&quot;fa fa-rss fa-2x&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt;') where resource_type_id in (3,4) and text_value like '%&lt;img align=&quot;left&quot; src=&quot;http://www.ilri.org/images/Iconrss2.png&quot;/&gt;%'; dspace=# update metadatavalue set text_value = regexp_replace(text_value, '&lt;img align=&quot;left&quot; src=&quot;http://www.ilri.org/images/Iconrss2.png&quot;/&gt;', '&lt;span class=&quot;fa fa-rss fa-2x&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt;') where resource_type_id in (3,4) and text_value like '%&lt;img align=&quot;left&quot; src=&quot;http://www.ilri.org/images/Iconrss2.png&quot;/&gt;%';
@ -332,20 +274,15 @@ dspace=# update metadatavalue set text_value = regexp_replace(text_value, '&lt;i
dspace=# update metadatavalue set text_value = regexp_replace(text_value, '&lt;img valign=&quot;center&quot; align=&quot;left&quot; src=&quot;https://www.ilri.org/images/email.jpg&quot;/&gt;', '&lt;span class=&quot;fa fa-at fa-2x&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt;') where resource_type_id in (3,4) and text_value like '%&lt;img valign=&quot;center&quot; align=&quot;left&quot; src=&quot;https://www.ilri.org/images/email.jpg&quot;/&gt;%'; dspace=# update metadatavalue set text_value = regexp_replace(text_value, '&lt;img valign=&quot;center&quot; align=&quot;left&quot; src=&quot;https://www.ilri.org/images/email.jpg&quot;/&gt;', '&lt;span class=&quot;fa fa-at fa-2x&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt;') where resource_type_id in (3,4) and text_value like '%&lt;img valign=&quot;center&quot; align=&quot;left&quot; src=&quot;https://www.ilri.org/images/email.jpg&quot;/&gt;%';
dspace=# update metadatavalue set text_value = regexp_replace(text_value, '&lt;img valign=&quot;center&quot; align=&quot;left&quot; src=&quot;http://www.ilri.org/images/Iconrss2.png&quot;/&gt;', '&lt;span class=&quot;fa fa-rss fa-2x&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt;') where resource_type_id in (3,4) and text_value like '%&lt;img valign=&quot;center&quot; align=&quot;left&quot; src=&quot;http://www.ilri.org/images/Iconrss2.png&quot;/&gt;%'; dspace=# update metadatavalue set text_value = regexp_replace(text_value, '&lt;img valign=&quot;center&quot; align=&quot;left&quot; src=&quot;http://www.ilri.org/images/Iconrss2.png&quot;/&gt;', '&lt;span class=&quot;fa fa-rss fa-2x&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt;') where resource_type_id in (3,4) and text_value like '%&lt;img valign=&quot;center&quot; align=&quot;left&quot; src=&quot;http://www.ilri.org/images/Iconrss2.png&quot;/&gt;%';
dspace=# update metadatavalue set text_value = regexp_replace(text_value, '&lt;img valign=&quot;center&quot; align=&quot;left&quot; src=&quot;http://www.ilri.org/images/email.jpg&quot;/&gt;', '&lt;span class=&quot;fa fa-at fa-2x&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt;') where resource_type_id in (3,4) and text_value like '%&lt;img valign=&quot;center&quot; align=&quot;left&quot; src=&quot;http://www.ilri.org/images/email.jpg&quot;/&gt;%'; dspace=# update metadatavalue set text_value = regexp_replace(text_value, '&lt;img valign=&quot;center&quot; align=&quot;left&quot; src=&quot;http://www.ilri.org/images/email.jpg&quot;/&gt;', '&lt;span class=&quot;fa fa-at fa-2x&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt;') where resource_type_id in (3,4) and text_value like '%&lt;img valign=&quot;center&quot; align=&quot;left&quot; src=&quot;http://www.ilri.org/images/email.jpg&quot;/&gt;%';
</code></pre></li> </code></pre><ul>
<li>Getting rid of these reduces the number of network requests each client makes on community/collection pages, and makes use of Font Awesome icons (which they are already loading anyways!)</li>
<li><p>Getting rid of these reduces the number of network requests each client makes on community/collection pages, and makes use of Font Awesome icons (which they are already loading anyways!)</p></li> <li>And now that I start looking, I want to fix a bunch of links to popular sites that should be using HTTPS, like Twitter, Facebook, Google, Feed Burner, DOI, etc</li>
<li>I should look to see if any of those domains is sending an HTTP 301 or setting HSTS headers to their HTTPS domains, then just replace them</li>
<li><p>And now that I start looking, I want to fix a bunch of links to popular sites that should be using HTTPS, like Twitter, Facebook, Google, Feed Burner, DOI, etc</p></li>
<li><p>I should look to see if any of those domains is sending an HTTP 301 or setting HSTS headers to their HTTPS domains, then just replace them</p></li>
</ul> </ul>
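The 301/HSTS check mentioned in the last item above could be sketched roughly as follows. This is a minimal illustration in Python and not something taken from these notes: the function name `https_upgrade_looks_safe` and its decision rule are assumptions, and it works on an already-fetched status code and header mapping rather than making network requests itself.

```python
from urllib.parse import urlsplit


def https_upgrade_looks_safe(url, status, headers):
    """Guess whether an http:// link can be rewritten to https://.

    `status` and `headers` describe the response to a plain-HTTP request
    for `url` (hypothetical inputs; fetch them with any HTTP client).
    """
    # Header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}

    # An HSTS header signals that the host wants HTTPS only. Browsers only
    # honor it over HTTPS, but its presence is still a useful hint here.
    if "strict-transport-security" in lowered:
        return True

    # A permanent redirect to the same host over HTTPS is the other signal.
    if status in (301, 308):
        source = urlsplit(url)
        target = urlsplit(lowered.get("location", ""))
        return target.scheme == "https" and target.hostname == source.hostname

    return False
```

The same-host comparison is deliberately conservative: a redirect to a different domain would need a different replacement string, so it is left for manual review.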
<h2 id="20161027">2016-10-27</h2>
<h2 id="2016-10-27">2016-10-27</h2>
<ul> <ul>
<li><p>Run Font Awesome fixes on DSpace Test:</p> <li>Run Font Awesome fixes on DSpace Test:</li>
</ul>
<pre><code>dspace=# \i /tmp/font-awesome-text-replace.sql <pre><code>dspace=# \i /tmp/font-awesome-text-replace.sql
UPDATE 17 UPDATE 17
UPDATE 17 UPDATE 17
@ -364,62 +301,48 @@ UPDATE 1
UPDATE 1 UPDATE 1
UPDATE 1 UPDATE 1
UPDATE 0 UPDATE 0
</code></pre></li> </code></pre><ul>
<li>Looks much better now:</li>
<li><p>Looks much better now:</p></li>
</ul> </ul>
<p><img src="/cgspace-notes/2016/10/cgspace-icons.png" alt="CGSpace with old icons">
<p><img src="/cgspace-notes/2016/10/cgspace-icons.png" alt="CGSpace with old icons" /> <img src="/cgspace-notes/2016/10/dspacetest-fontawesome-icons.png" alt="DSpace Test with Font Awesome icons"></p>
<img src="/cgspace-notes/2016/10/dspacetest-fontawesome-icons.png" alt="DSpace Test with Font Awesome icons" /></p>
<ul> <ul>
<li>Run the same replacements on CGSpace</li> <li>Run the same replacements on CGSpace</li>
</ul> </ul>
<h2 id="20161030">2016-10-30</h2>
<h2 id="2016-10-30">2016-10-30</h2>
<ul> <ul>
<li><p>Fix some messed up authors on CGSpace:</p> <li>Fix some messed up authors on CGSpace:</li>
</ul>
<pre><code>dspace=# update metadatavalue set authority='799da1d8-22f3-43f5-8233-3d2ef5ebf8a8', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Charleston, B.%'; <pre><code>dspace=# update metadatavalue set authority='799da1d8-22f3-43f5-8233-3d2ef5ebf8a8', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Charleston, B.%';
UPDATE 10 UPDATE 10
dspace=# update metadatavalue set authority='e936f5c5-343d-4c46-aa91-7a1fff6277ed', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Knight-Jones%'; dspace=# update metadatavalue set authority='e936f5c5-343d-4c46-aa91-7a1fff6277ed', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Knight-Jones%';
UPDATE 36 UPDATE 36
</code></pre></li> </code></pre><ul>
<li>I updated the authority index but nothing seemed to change, so I'll wait and do it again after I update Discovery below</li>
<li><p>I updated the authority index but nothing seemed to change, so I&rsquo;ll wait and do it again after I update Discovery below</p></li> <li>Skype chat with Tsega about the <a href="https://github.com/ilri/ckm-cgspace-contentdm-bridge">IFPRI contentdm bridge</a></li>
<li>We tested harvesting OAI in an example collection to see how it works</li>
<li><p>Skype chat with Tsega about the <a href="https://github.com/ilri/ckm-cgspace-contentdm-bridge">IFPRI contentdm bridge</a></p></li> <li>Talk to Carlos Quiros about CG Core metadata in CGSpace</li>
<li>Get a list of countries from CGSpace so I can do some batch corrections:</li>
<li><p>We tested harvesting OAI in an example collection to see how it works</p></li> </ul>
<li><p>Talk to Carlos Quiros about CG Core metadata in CGSpace</p></li>
<li><p>Get a list of countries from CGSpace so I can do some batch corrections:</p>
<pre><code>dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id=228 group by text_value order by count desc) to /tmp/countries.csv with csv; <pre><code>dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id=228 group by text_value order by count desc) to /tmp/countries.csv with csv;
</code></pre></li> </code></pre><ul>
<li>Fix a bunch of countries in Open Refine and run the corrections on CGSpace:</li>
<li><p>Fix a bunch of countries in Open Refine and run the corrections on CGSpace:</p> </ul>
<pre><code>$ ./fix-metadata-values.py -i countries-fix-18.csv -f dc.coverage.country -t 'correct' -m 228 -d dspace -u dspace -p fuuu <pre><code>$ ./fix-metadata-values.py -i countries-fix-18.csv -f dc.coverage.country -t 'correct' -m 228 -d dspace -u dspace -p fuuu
$ ./delete-metadata-values.py -i countries-delete-2.csv -f dc.coverage.country -m 228 -d dspace -u dspace -p fuuu $ ./delete-metadata-values.py -i countries-delete-2.csv -f dc.coverage.country -m 228 -d dspace -u dspace -p fuuu
</code></pre></li> </code></pre><ul>
<li>Run a shit ton of author fixes from Peter Ballantyne that we've been cleaning up for two months:</li>
<li><p>Run a shit ton of author fixes from Peter Ballantyne that we&rsquo;ve been cleaning up for two months:</p> </ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/authors-fix-pb2.csv -f dc.contributor.author -t correct -m 3 -u dspace -d dspace -p fuuu <pre><code>$ ./fix-metadata-values.py -i /tmp/authors-fix-pb2.csv -f dc.contributor.author -t correct -m 3 -u dspace -d dspace -p fuuu
</code></pre></li> </code></pre><ul>
<li>Run a few URL corrections for ilri.org and doi.org, etc:</li>
<li><p>Run a few URL corrections for ilri.org and doi.org, etc:</p> </ul>
<pre><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://www.ilri.org','https://www.ilri.org') where resource_type_id=2 and text_value like '%http://www.ilri.org%'; <pre><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://www.ilri.org','https://www.ilri.org') where resource_type_id=2 and text_value like '%http://www.ilri.org%';
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://mahider.ilri.org', 'https://cgspace.cgiar.org') where resource_type_id=2 and text_value like '%http://mahider.%.org%' and metadata_field_id not in (28); dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://mahider.ilri.org', 'https://cgspace.cgiar.org') where resource_type_id=2 and text_value like '%http://mahider.%.org%' and metadata_field_id not in (28);
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://dx.doi.org', 'https://dx.doi.org') where resource_type_id=2 and text_value like '%http://dx.doi.org%' and metadata_field_id not in (18,26,28,111); dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://dx.doi.org', 'https://dx.doi.org') where resource_type_id=2 and text_value like '%http://dx.doi.org%' and metadata_field_id not in (18,26,28,111);
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://doi.org', 'https://dx.doi.org') where resource_type_id=2 and text_value like '%http://doi.org%' and metadata_field_id not in (18,26,28,111); dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://doi.org', 'https://dx.doi.org') where resource_type_id=2 and text_value like '%http://doi.org%' and metadata_field_id not in (18,26,28,111);
</code></pre></li> </code></pre><ul>
<li>I skipped metadata fields like citation and description</li>
<li><p>I skipped metadata fields like citation and description</p></li>
</ul> </ul>
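For dry-running the URL corrections above outside the database, something like the following Python sketch may help. It is an illustration under stated assumptions, not the method used in these notes: the `RULES` table and `rewrite_value` are hypothetical names, and the skipped field IDs are simply copied from the `not in (...)` clauses of the SQL. Two differences from the SQL are worth keeping in mind: the dots are escaped here (an unescaped `.` in a regular expression matches any character), and `re.sub` replaces every occurrence, whereas PostgreSQL's `regexp_replace` only replaces the first match unless the `'g'` flag is passed.

```python
import re

# (pattern, replacement, field IDs to skip), mirroring the SQL statements.
RULES = [
    (r"http://www\.ilri\.org", "https://www.ilri.org", set()),
    (r"http://mahider\.ilri\.org", "https://cgspace.cgiar.org", {28}),
    (r"http://dx\.doi\.org", "https://dx.doi.org", {18, 26, 28, 111}),
    (r"http://doi\.org", "https://dx.doi.org", {18, 26, 28, 111}),
]


def rewrite_value(text_value, metadata_field_id):
    """Apply each rule in order unless the field is excluded for that rule."""
    for pattern, replacement, skipped_fields in RULES:
        if metadata_field_id in skipped_fields:
            continue
        text_value = re.sub(pattern, replacement, text_value)
    return text_value
```

Running candidate values through `rewrite_value` first makes it easier to spot a value containing the same URL twice, which a single non-global `regexp_replace` would only partially fix.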

File diff suppressed because one or more lines are too long

@ -8,9 +8,7 @@
<meta property="og:title" content="December, 2016" /> <meta property="og:title" content="December, 2016" />
<meta property="og:description" content="2016-12-02 <meta property="og:description" content="2016-12-02
CGSpace was down for five hours in the morning while I was sleeping CGSpace was down for five hours in the morning while I was sleeping
While looking in the logs for errors, I see tons of warnings about Atmire MQM: While looking in the logs for errors, I see tons of warnings about Atmire MQM:
2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;) 2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
@ -19,11 +17,8 @@ While looking in the logs for errors, I see tons of warnings about Atmire MQM:
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail=&quot;-1&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;) 2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail=&quot;-1&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;) 2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
I see thousands of them in the logs for the last few months, so it&#39;s not related to the DSpace 5.5 upgrade
I see thousands of them in the logs for the last few months, so it&rsquo;s not related to the DSpace 5.5 upgrade I&#39;ve raised a ticket with Atmire to ask
I&rsquo;ve raised a ticket with Atmire to ask
Another worrying error from dspace.log is: Another worrying error from dspace.log is:
" /> " />
<meta property="og:type" content="article" /> <meta property="og:type" content="article" />
@ -35,9 +30,7 @@ Another worrying error from dspace.log is:
<meta name="twitter:title" content="December, 2016"/> <meta name="twitter:title" content="December, 2016"/>
<meta name="twitter:description" content="2016-12-02 <meta name="twitter:description" content="2016-12-02
CGSpace was down for five hours in the morning while I was sleeping CGSpace was down for five hours in the morning while I was sleeping
While looking in the logs for errors, I see tons of warnings about Atmire MQM: While looking in the logs for errors, I see tons of warnings about Atmire MQM:
2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;) 2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
@ -46,14 +39,11 @@ While looking in the logs for errors, I see tons of warnings about Atmire MQM:
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail=&quot;-1&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;) 2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail=&quot;-1&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;) 2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
I see thousands of them in the logs for the last few months, so it&#39;s not related to the DSpace 5.5 upgrade
I see thousands of them in the logs for the last few months, so it&rsquo;s not related to the DSpace 5.5 upgrade I&#39;ve raised a ticket with Atmire to ask
I&rsquo;ve raised a ticket with Atmire to ask
Another worrying error from dspace.log is: Another worrying error from dspace.log is:
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -134,27 +124,21 @@ Another worrying error from dspace.log is:
</p> </p>
</header> </header>
<h2 id="2016-12-02">2016-12-02</h2> <h2 id="20161202">2016-12-02</h2>
<ul> <ul>
<li>CGSpace was down for five hours in the morning while I was sleeping</li> <li>CGSpace was down for five hours in the morning while I was sleeping</li>
<li>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</li>
<li><p>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</p> </ul>
<pre><code>2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;) <pre><code>2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail=&quot;dc.title&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;) 2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail=&quot;dc.title&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, Object Type=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail=&quot;THUMBNAIL&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;) 2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, Object Type=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail=&quot;THUMBNAIL&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail=&quot;-1&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;) 2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail=&quot;-1&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;) 2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
</code></pre></li> </code></pre><ul>
<li>I see thousands of them in the logs for the last few months, so it's not related to the DSpace 5.5 upgrade</li>
<li><p>I see thousands of them in the logs for the last few months, so it&rsquo;s not related to the DSpace 5.5 upgrade</p></li> <li>I've raised a ticket with Atmire to ask</li>
<li>Another worrying error from dspace.log is:</li>
<li><p>I&rsquo;ve raised a ticket with Atmire to ask</p></li>
<li><p>Another worrying error from dspace.log is:</p></li>
</ul> </ul>
<pre><code>org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceObjectDatasetGenerator.toDatasetQuery(Lorg/dspace/core/Context;)Lcom/atmire/statistics/content/DatasetQuery; <pre><code>org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceObjectDatasetGenerator.toDatasetQuery(Lorg/dspace/core/Context;)Lcom/atmire/statistics/content/DatasetQuery;
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:972) at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:972)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852) at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
@ -241,39 +225,28 @@ Caused by: java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceOb
at org.springframework.web.servlet.DispatcherServlet.render(DispatcherServlet.java:1180) at org.springframework.web.servlet.DispatcherServlet.render(DispatcherServlet.java:1180)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:950) at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:950)
... 35 more ... 35 more
</code></pre> </code></pre><ul>
<li>The first error I see in dspace.log this morning is:</li>
<ul> </ul>
<li><p>The first error I see in dspace.log this morning is:</p>
<pre><code>2016-12-02 03:00:46,656 ERROR org.dspace.authority.AuthorityValueFinder @ anonymous::Error while retrieving AuthorityValue from solr:query\colon; id\colon;&quot;b0b541c1-ec15-48bf-9209-6dbe8e338cdc&quot; <pre><code>2016-12-02 03:00:46,656 ERROR org.dspace.authority.AuthorityValueFinder @ anonymous::Error while retrieving AuthorityValue from solr:query\colon; id\colon;&quot;b0b541c1-ec15-48bf-9209-6dbe8e338cdc&quot;
org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://localhost:8081/solr/authority org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://localhost:8081/solr/authority
</code></pre></li> </code></pre><ul>
<li>Looking through DSpace's solr log I see that about 20 seconds before this, there were a few 30+ KiB solr queries</li>
<li><p>Looking through DSpace&rsquo;s solr log I see that about 20 seconds before this, there were a few 30+ KiB solr queries</p></li> <li>The last logs here right before Solr became unresponsive (and right after I restarted it five hours later) were:</li>
</ul>
<li><p>The last logs here right before Solr became unresponsive (and right after I restarted it five hours later) were:</p>
<pre><code>2016-12-02 03:00:42,606 INFO org.apache.solr.core.SolrCore @ [statistics] webapp=/solr path=/select params={q=containerItem:72828+AND+type:0&amp;shards=localhost:8081/solr/statistics-2010,localhost:8081/solr/statistics&amp;fq=-isInternal:true&amp;fq=-(author_mtdt:&quot;CGIAR\+Institutional\+Learning\+and\+Change\+Initiative&quot;++AND+subject_mtdt:&quot;PARTNERSHIPS&quot;+AND+subject_mtdt:&quot;RESEARCH&quot;+AND+subject_mtdt:&quot;AGRICULTURE&quot;+AND+subject_mtdt:&quot;DEVELOPMENT&quot;++AND+iso_mtdt:&quot;en&quot;+)&amp;rows=0&amp;wt=javabin&amp;version=2} hits=0 status=0 QTime=19 <pre><code>2016-12-02 03:00:42,606 INFO org.apache.solr.core.SolrCore @ [statistics] webapp=/solr path=/select params={q=containerItem:72828+AND+type:0&amp;shards=localhost:8081/solr/statistics-2010,localhost:8081/solr/statistics&amp;fq=-isInternal:true&amp;fq=-(author_mtdt:&quot;CGIAR\+Institutional\+Learning\+and\+Change\+Initiative&quot;++AND+subject_mtdt:&quot;PARTNERSHIPS&quot;+AND+subject_mtdt:&quot;RESEARCH&quot;+AND+subject_mtdt:&quot;AGRICULTURE&quot;+AND+subject_mtdt:&quot;DEVELOPMENT&quot;++AND+iso_mtdt:&quot;en&quot;+)&amp;rows=0&amp;wt=javabin&amp;version=2} hits=0 status=0 QTime=19
2016-12-02 08:28:23,908 INFO org.apache.solr.servlet.SolrDispatchFilter @ SolrDispatchFilter.init() 2016-12-02 08:28:23,908 INFO org.apache.solr.servlet.SolrDispatchFilter @ SolrDispatchFilter.init()
</code></pre></li> </code></pre><ul>
<li>DSpace's own Solr logs don't give IP addresses, so I will have to enable Nginx's logging of <code>/solr</code> so I can see where this request came from</li>
<li><p>DSpace&rsquo;s own Solr logs don&rsquo;t give IP addresses, so I will have to enable Nginx&rsquo;s logging of <code>/solr</code> so I can see where this request came from</p></li> <li>I enabled logging of <code>/rest/</code> and I think I'll leave it on for good</li>
<li>Also, the disk is nearly full because of log file issues, so I'm running some compression on DSpace logs</li>
<li><p>I enabled logging of <code>/rest/</code> and I think I&rsquo;ll leave it on for good</p></li> <li>Normally these stay uncompressed for a month just in case we need to look at them, so now I've just compressed anything older than 2 weeks so we can get some disk space back</li>
<li><p>Also, the disk is nearly full because of log file issues, so I&rsquo;m running some compression on DSpace logs</p></li>
<li><p>Normally these stay uncompressed for a month just in case we need to look at them, so now I&rsquo;ve just compressed anything older than 2 weeks so we can get some disk space back</p></li>
</ul> </ul>
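The "compress anything older than 2 weeks" step above could be sketched like this. It is a hedged example only: the notes do not say which tool was actually used, and `compress_old_logs`, its parameters, and the choice of gzip are all assumptions made for illustration.

```python
import gzip
import os
import shutil
import time


def compress_old_logs(log_dir, max_age_days=14, now=None):
    """Gzip plain files in log_dir older than max_age_days; return new paths."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    compressed = []
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        # Skip directories and anything that is already compressed.
        if not os.path.isfile(path) or name.endswith(".gz"):
            continue
        stat = os.stat(path)
        # Keep recent logs uncompressed so they stay easy to grep.
        if stat.st_mtime >= cutoff:
            continue
        target = path + ".gz"
        with open(path, "rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        # Carry the original timestamps over to the archive, then drop the source.
        os.utime(target, (stat.st_atime, stat.st_mtime))
        os.remove(path)
        compressed.append(target)
    return compressed
```

Passing `now` explicitly keeps the age cutoff reproducible, which is mainly useful when trying the function against a scratch directory before pointing it at real logs.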
<h2 id="20161204">2016-12-04</h2>
<h2 id="2016-12-04">2016-12-04</h2>
<ul> <ul>
<li>I got a weird report from the CGSpace checksum checker this morning</li> <li>I got a weird report from the CGSpace checksum checker this morning</li>
<li>It says 732 bitstreams have potential issues, for example:</li>
<li><p>It says 732 bitstreams have potential issues, for example:</p> </ul>
<pre><code>------------------------------------------------ <pre><code>------------------------------------------------
Bitstream Id = 6 Bitstream Id = 6
Process Start Date = Dec 4, 2016 Process Start Date = Dec 4, 2016
@ -291,16 +264,12 @@ Checksum Expected = 9959301aa4ca808d00957dff88214e38
Checksum Calculated = Checksum Calculated =
Result = The bitstream could not be found Result = The bitstream could not be found
----------------------------------------------- -----------------------------------------------
</code></pre></li> </code></pre><ul>
<li>The first one seems ok, but I don't know what to make of the second one&hellip;</li>
<li><p>The first one seems ok, but I don&rsquo;t know what to make of the second one&hellip;</p></li> <li>I had a look and there is indeed no file with the second checksum in the assetstore (ie, looking in <code>[dspace-dir]/assetstore/99/59/30/...</code>)</li>
<li>For what it's worth, there is no item on DSpace Test or S3 backups with that checksum either&hellip;</li>
<li><p>I had a look and there is indeed no file with the second checksum in the assetstore (ie, looking in <code>[dspace-dir]/assetstore/99/59/30/...</code>)</p></li> <li>In other news, I'm looking at JVM settings from the Solr 4.10.2 release, from <code>bin/solr.in.sh</code>:</li>
</ul>
<li><p>For what it&rsquo;s worth, there is no item on DSpace Test or S3 backups with that checksum either&hellip;</p></li>
<li><p>In other news, I&rsquo;m looking at JVM settings from the Solr 4.10.2 release, from <code>bin/solr.in.sh</code>:</p>
<pre><code># These GC settings have shown to work well for a number of common Solr workloads <pre><code># These GC settings have shown to work well for a number of common Solr workloads
GC_TUNE=&quot;-XX:-UseSuperWord \ GC_TUNE=&quot;-XX:-UseSuperWord \
-XX:NewRatio=3 \ -XX:NewRatio=3 \
@ -320,36 +289,28 @@ GC_TUNE=&quot;-XX:-UseSuperWord \
-XX:+CMSParallelRemarkEnabled \ -XX:+CMSParallelRemarkEnabled \
-XX:+ParallelRefProcEnabled \ -XX:+ParallelRefProcEnabled \
-XX:+AggressiveOpts&quot; -XX:+AggressiveOpts&quot;
</code></pre></li> </code></pre><ul>
<li>I need to try these because they are recommended by the Solr project itself</li>
<li><p>I need to try these because they are recommended by the Solr project itself</p></li> <li>Also, as always, I need to read <a href="https://wiki.apache.org/solr/ShawnHeisey">Shawn Heisey's wiki page on Solr</a></li>
<li><p>Also, as always, I need to read <a href="https://wiki.apache.org/solr/ShawnHeisey">Shawn Heisey&rsquo;s wiki page on Solr</a></p></li>
</ul> </ul>
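For the checksum checker report earlier in this entry, a manual spot check of a single file might look like the sketch below. It assumes the expected value is an MD5 digest (the 32 hex digits in the report suggest this, but the notes do not state it), and `check_bitstream` is a hypothetical helper, not part of DSpace.

```python
import hashlib
import os


def check_bitstream(path, expected_md5):
    """Return "missing", "mismatch" or "ok" for one file on disk."""
    if not os.path.isfile(path):
        return "missing"
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large bitstreams do not have to fit in memory.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return "ok" if digest.hexdigest() == expected_md5.lower() else "mismatch"
```

A "missing" result corresponds to the "bitstream could not be found" case in the report, while "mismatch" would point to a file that exists but has changed.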
<h2 id="20161205">2016-12-05</h2>
<h2 id="2016-12-05">2016-12-05</h2>
<ul> <ul>
<li>I did some basic benchmarking on a local DSpace before and after the JVM settings above, but there wasn&rsquo;t anything amazingly obvious</li> <li>I did some basic benchmarking on a local DSpace before and after the JVM settings above, but there wasn't anything amazingly obvious</li>
<li>I want to make the changes on DSpace Test and monitor the JVM heap graphs for a few days to see if they change the JVM GC patterns or anything (munin graphs)</li> <li>I want to make the changes on DSpace Test and monitor the JVM heap graphs for a few days to see if they change the JVM GC patterns or anything (munin graphs)</li>
<li>Spin up new CGSpace server on Linode</li> <li>Spin up new CGSpace server on Linode</li>
<li>I did a few traceroutes from Jordan and Kenya and it seems that Linode&rsquo;s Frankfurt datacenter is a few less hops and perhaps less packet loss than the London one, so I put the new server in Frankfurt</li> <li>I did a few traceroutes from Jordan and Kenya and it seems that Linode's Frankfurt datacenter is a few less hops and perhaps less packet loss than the London one, so I put the new server in Frankfurt</li>
<li>Do initial provisioning</li> <li>Do initial provisioning</li>
<li>Atmire responded about the MQM warnings in the DSpace logs</li> <li>Atmire responded about the MQM warnings in the DSpace logs</li>
<li>Apparently we need to change the batch edit consumers in <code>dspace/config/dspace.cfg</code>:</li>
<li><p>Apparently we need to change the batch edit consumers in <code>dspace/config/dspace.cfg</code>:</p>
<pre><code>event.consumer.batchedit.filters = Community|Collection+Create
</code></pre></li>
<li><p>I haven&rsquo;t tested it yet, but I created a pull request: <a href="https://github.com/ilri/DSpace/pull/289">#289</a></p></li>
</ul> </ul>
<pre><code>event.consumer.batchedit.filters = Community|Collection+Create
<h2 id="2016-12-06">2016-12-06</h2> </code></pre><ul>
<li>I haven't tested it yet, but I created a pull request: <a href="https://github.com/ilri/DSpace/pull/289">#289</a></li>
</ul>
<h2 id="20161206">2016-12-06</h2>
<ul> <ul>
<li><p>Some author authority corrections and name standardizations for Peter:</p> <li>Some author authority corrections and name standardizations for Peter:</li>
</ul>
<pre><code>dspace=# update metadatavalue set authority='b041f2f4-19e7-4113-b774-0439baabd197', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Mora Benard%'; <pre><code>dspace=# update metadatavalue set authority='b041f2f4-19e7-4113-b774-0439baabd197', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Mora Benard%';
UPDATE 11 UPDATE 11
dspace=# update metadatavalue set text_value = 'Hoek, Rein van der', authority='4d6cbce2-6fd5-4b43-9363-58d18e7952c9', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Hoek, R%'; dspace=# update metadatavalue set text_value = 'Hoek, Rein van der', authority='4d6cbce2-6fd5-4b43-9363-58d18e7952c9', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Hoek, R%';
@ -362,335 +323,269 @@ dspace=# update metadatavalue set authority='0d8369bb-57f7-4b2f-92aa-af820b183ac
UPDATE 360 UPDATE 360
dspace=# update metadatavalue set text_value='Grace, Delia', authority='0b4fcbc1-d930-4319-9b4d-ea1553cca70b', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%'; dspace=# update metadatavalue set text_value='Grace, Delia', authority='0b4fcbc1-d930-4319-9b4d-ea1553cca70b', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
UPDATE 561 UPDATE 561
</code></pre></li> </code></pre><ul>
<li>Pay attention to the regex to prevent false positives in tricky cases with Dutch names!</li>
<li><p>Pay attention to the regex to prevent false positives in tricky cases with Dutch names!</p></li> <li>I will run these updates on DSpace Test and then force a Discovery reindex, and then run them on CGSpace next week</li>
<li>More work on the KM4Dev Journal article</li>
<li><p>I will run these updates on DSpace Test and then force a Discovery reindex, and then run them on CGSpace next week</p></li> <li>In other news, the batch edit patch seems to be working: there are no more WARN errors in the logs and batch edits work</li>
<li>I need to check the CGSpace logs to see if there are still errors there, and then deploy/monitor it there</li>
<li><p>More work on the KM4Dev Journal article</p></li> <li>Paola from CCAFS mentioned she also has the &ldquo;take task&rdquo; bug on CGSpace</li>
<li>Reading about <a href="https://www.postgresql.org/docs/9.5/static/runtime-config-resource.html"><code>shared_buffers</code> in PostgreSQL configuration</a> (default is 128MB)</li>
<li><p>In other news, the batch edit patch seems to be working: there are no more WARN errors in the logs and batch edits work</p></li> <li>Looks like we have ~5GB of memory used by caches on the test server (after OS and JVM heap!), so we might as well bump up the buffers for Postgres</li>
<li>The docs say a good starting point for a dedicated server is 25% of the system RAM, and our server isn't dedicated (also runs Solr, which can benefit from OS cache) so let's try 1024MB</li>
<li><p>I need to check the CGSpace logs to see if there are still errors there, and then deploy/monitor it there</p></li> <li>In other news, the authority reindexing keeps crashing (I was manually running it after the author updates above):</li>
</ul>
<li><p>Paola from CCAFS mentioned she also has the &ldquo;take task&rdquo; bug on CGSpace</p></li>
<li><p>Reading about <a href="https://www.postgresql.org/docs/9.5/static/runtime-config-resource.html"><code>shared_buffers</code> in PostgreSQL configuration</a> (default is 128MB)</p></li>
<li><p>Looks like we have ~5GB of memory used by caches on the test server (after OS and JVM heap!), so we might as well bump up the buffers for Postgres</p></li>
<li><p>The docs say a good starting point for a dedicated server is 25% of the system RAM, and our server isn&rsquo;t dedicated (also runs Solr, which can benefit from OS cache) so let&rsquo;s try 1024MB</p></li>
<li><p>In other news, the authority reindexing keeps crashing (I was manually running it after the author updates above):</p>
<pre><code>$ time JAVA_OPTS=&quot;-Xms768m -Xmx768m -Dfile.encoding=UTF-8&quot; /home/dspacetest.cgiar.org/bin/dspace index-authority <pre><code>$ time JAVA_OPTS=&quot;-Xms768m -Xmx768m -Dfile.encoding=UTF-8&quot; /home/dspacetest.cgiar.org/bin/dspace index-authority
Retrieving all data Retrieving all data
Initialize org.dspace.authority.indexer.DSpaceAuthorityIndexer Initialize org.dspace.authority.indexer.DSpaceAuthorityIndexer
Exception: null Exception: null
java.lang.NullPointerException java.lang.NullPointerException
at org.dspace.authority.AuthorityValueGenerator.generateRaw(AuthorityValueGenerator.java:82) at org.dspace.authority.AuthorityValueGenerator.generateRaw(AuthorityValueGenerator.java:82)
at org.dspace.authority.AuthorityValueGenerator.generate(AuthorityValueGenerator.java:39) at org.dspace.authority.AuthorityValueGenerator.generate(AuthorityValueGenerator.java:39)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.prepareNextValue(DSpaceAuthorityIndexer.java:201) at org.dspace.authority.indexer.DSpaceAuthorityIndexer.prepareNextValue(DSpaceAuthorityIndexer.java:201)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:132) at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:132)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144) at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144) at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:159) at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:159)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144) at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144) at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
at org.dspace.authority.indexer.AuthorityIndexClient.main(AuthorityIndexClient.java:61) at org.dspace.authority.indexer.AuthorityIndexClient.main(AuthorityIndexClient.java:61)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at java.lang.reflect.Method.invoke(Method.java:498)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226) at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78) at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
real 8m39.913s real 8m39.913s
user 1m54.190s user 1m54.190s
sys 0m22.647s sys 0m22.647s
</code></pre></li> </code></pre><h2 id="20161207">2016-12-07</h2>
</ul>
<h2 id="2016-12-07">2016-12-07</h2>
<ul> <ul>
<li>For what it&rsquo;s worth, after running the same SQL updates on my local test server, <code>index-authority</code> runs and completes just fine</li> <li>For what it's worth, after running the same SQL updates on my local test server, <code>index-authority</code> runs and completes just fine</li>
<li>I will have to test more</li> <li>I will have to test more</li>
<li>Anyway, I noticed that some of the authority values I set actually have versions of author names we don&rsquo;t want, i.e. &ldquo;Grace, D.&rdquo;</li> <li>Anyway, I noticed that some of the authority values I set actually have versions of author names we don't want, i.e. &ldquo;Grace, D.&rdquo;</li>
<li>For example, do a Solr query for &ldquo;first_name:Grace&rdquo; and look at the results</li> <li>For example, do a Solr query for &ldquo;first_name:Grace&rdquo; and look at the results</li>
<li>Querying that ID shows the fields that need to be changed:</li>
<li><p>Querying that ID shows the fields that need to be changed:</p> </ul>
<pre><code>{ <pre><code>{
&quot;responseHeader&quot;: { &quot;responseHeader&quot;: {
&quot;status&quot;: 0, &quot;status&quot;: 0,
&quot;QTime&quot;: 1, &quot;QTime&quot;: 1,
&quot;params&quot;: { &quot;params&quot;: {
&quot;q&quot;: &quot;id:0b4fcbc1-d930-4319-9b4d-ea1553cca70b&quot;, &quot;q&quot;: &quot;id:0b4fcbc1-d930-4319-9b4d-ea1553cca70b&quot;,
&quot;indent&quot;: &quot;true&quot;, &quot;indent&quot;: &quot;true&quot;,
&quot;wt&quot;: &quot;json&quot;, &quot;wt&quot;: &quot;json&quot;,
&quot;_&quot;: &quot;1481102189244&quot; &quot;_&quot;: &quot;1481102189244&quot;
} }
}, },
&quot;response&quot;: { &quot;response&quot;: {
&quot;numFound&quot;: 1, &quot;numFound&quot;: 1,
&quot;start&quot;: 0, &quot;start&quot;: 0,
&quot;docs&quot;: [ &quot;docs&quot;: [
{ {
&quot;id&quot;: &quot;0b4fcbc1-d930-4319-9b4d-ea1553cca70b&quot;, &quot;id&quot;: &quot;0b4fcbc1-d930-4319-9b4d-ea1553cca70b&quot;,
&quot;field&quot;: &quot;dc_contributor_author&quot;, &quot;field&quot;: &quot;dc_contributor_author&quot;,
&quot;value&quot;: &quot;Grace, D.&quot;, &quot;value&quot;: &quot;Grace, D.&quot;,
&quot;deleted&quot;: false, &quot;deleted&quot;: false,
&quot;creation_date&quot;: &quot;2016-11-10T15:13:40.318Z&quot;, &quot;creation_date&quot;: &quot;2016-11-10T15:13:40.318Z&quot;,
&quot;last_modified_date&quot;: &quot;2016-11-10T15:13:40.318Z&quot;, &quot;last_modified_date&quot;: &quot;2016-11-10T15:13:40.318Z&quot;,
&quot;authority_type&quot;: &quot;person&quot;, &quot;authority_type&quot;: &quot;person&quot;,
&quot;first_name&quot;: &quot;D.&quot;, &quot;first_name&quot;: &quot;D.&quot;,
&quot;last_name&quot;: &quot;Grace&quot; &quot;last_name&quot;: &quot;Grace&quot;
}
]
} }
]
} }
} </code></pre><ul>
</code></pre></li> <li>I think I can just update the <code>value</code>, <code>first_name</code>, and <code>last_name</code> fields&hellip;</li>
<li>The update syntax should be something like this, but I'm getting errors from Solr:</li>
<li><p>I think I can just update the <code>value</code>, <code>first_name</code>, and <code>last_name</code> fields&hellip;</p></li> </ul>
<li><p>The update syntax should be something like this, but I&rsquo;m getting errors from Solr:</p>
<pre><code>$ curl 'localhost:8081/solr/authority/update?commit=true&amp;wt=json&amp;indent=true' -H 'Content-type:application/json' -d '[{&quot;id&quot;:&quot;1&quot;,&quot;price&quot;:{&quot;set&quot;:100}}]' <pre><code>$ curl 'localhost:8081/solr/authority/update?commit=true&amp;wt=json&amp;indent=true' -H 'Content-type:application/json' -d '[{&quot;id&quot;:&quot;1&quot;,&quot;price&quot;:{&quot;set&quot;:100}}]'
{ {
&quot;responseHeader&quot;:{ &quot;responseHeader&quot;:{
&quot;status&quot;:400, &quot;status&quot;:400,
&quot;QTime&quot;:0}, &quot;QTime&quot;:0},
&quot;error&quot;:{ &quot;error&quot;:{
&quot;msg&quot;:&quot;Unexpected character '[' (code 91) in prolog; expected '&lt;'\n at [row,col {unknown-source}]: [1,1]&quot;, &quot;msg&quot;:&quot;Unexpected character '[' (code 91) in prolog; expected '&lt;'\n at [row,col {unknown-source}]: [1,1]&quot;,
&quot;code&quot;:400}} &quot;code&quot;:400}}
</code></pre></li> </code></pre><ul>
<li>When I try using the XML format I get an error that the <code>updateLog</code> needs to be configured for that core</li>
<li><p>When I try using the XML format I get an error that the <code>updateLog</code> needs to be configured for that core</p></li> <li>Maybe I can just remove the authority UUID from the records, run the indexing again so it creates a new one for each name variant, then match them correctly?</li>
</ul>
<li><p>Maybe I can just remove the authority UUID from the records, run the indexing again so it creates a new one for each name variant, then match them correctly?</p>
<pre><code>dspace=# update metadatavalue set authority=null, confidence=-1 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%'; <pre><code>dspace=# update metadatavalue set authority=null, confidence=-1 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
UPDATE 561 UPDATE 561
</code></pre></li> </code></pre><ul>
<li>Then I'll reindex discovery and authority and see how the authority Solr core looks</li>
<li><p>Then I&rsquo;ll reindex discovery and authority and see how the authority Solr core looks</p></li> <li>After this, now there are authorities for some of the &ldquo;Grace, D.&rdquo; and &ldquo;Grace, Delia&rdquo; text_values in the database (the first version is actually the same authority that already exists in the core, so it was just added back to some text_values, but the second one is new):</li>
</ul>
<li><p>After this, now there are authorities for some of the &ldquo;Grace, D.&rdquo; and &ldquo;Grace, Delia&rdquo; text_values in the database (the first version is actually the same authority that already exists in the core, so it was just added back to some text_values, but the second one is new):</p>
<pre><code>$ curl 'localhost:8081/solr/authority/select?q=id%3A18ea1525-2513-430a-8817-a834cd733fbc&amp;wt=json&amp;indent=true' <pre><code>$ curl 'localhost:8081/solr/authority/select?q=id%3A18ea1525-2513-430a-8817-a834cd733fbc&amp;wt=json&amp;indent=true'
{ {
&quot;responseHeader&quot;:{ &quot;responseHeader&quot;:{
&quot;status&quot;:0, &quot;status&quot;:0,
&quot;QTime&quot;:0, &quot;QTime&quot;:0,
&quot;params&quot;:{ &quot;params&quot;:{
&quot;q&quot;:&quot;id:18ea1525-2513-430a-8817-a834cd733fbc&quot;, &quot;q&quot;:&quot;id:18ea1525-2513-430a-8817-a834cd733fbc&quot;,
&quot;indent&quot;:&quot;true&quot;, &quot;indent&quot;:&quot;true&quot;,
&quot;wt&quot;:&quot;json&quot;}}, &quot;wt&quot;:&quot;json&quot;}},
&quot;response&quot;:{&quot;numFound&quot;:1,&quot;start&quot;:0,&quot;docs&quot;:[ &quot;response&quot;:{&quot;numFound&quot;:1,&quot;start&quot;:0,&quot;docs&quot;:[
{ {
&quot;id&quot;:&quot;18ea1525-2513-430a-8817-a834cd733fbc&quot;, &quot;id&quot;:&quot;18ea1525-2513-430a-8817-a834cd733fbc&quot;,
&quot;field&quot;:&quot;dc_contributor_author&quot;, &quot;field&quot;:&quot;dc_contributor_author&quot;,
&quot;value&quot;:&quot;Grace, Delia&quot;, &quot;value&quot;:&quot;Grace, Delia&quot;,
&quot;deleted&quot;:false, &quot;deleted&quot;:false,
&quot;creation_date&quot;:&quot;2016-12-07T10:54:34.356Z&quot;, &quot;creation_date&quot;:&quot;2016-12-07T10:54:34.356Z&quot;,
&quot;last_modified_date&quot;:&quot;2016-12-07T10:54:34.356Z&quot;, &quot;last_modified_date&quot;:&quot;2016-12-07T10:54:34.356Z&quot;,
&quot;authority_type&quot;:&quot;person&quot;, &quot;authority_type&quot;:&quot;person&quot;,
&quot;first_name&quot;:&quot;Delia&quot;, &quot;first_name&quot;:&quot;Delia&quot;,
&quot;last_name&quot;:&quot;Grace&quot;}] &quot;last_name&quot;:&quot;Grace&quot;}]
}} }}
</code></pre></li> </code></pre><ul>
<li>So now I could set them all to this ID and the name would be ok, but there has to be a better way!</li>
<li><p>So now I could set them all to this ID and the name would be ok, but there has to be a better way!</p></li> <li>In this case it seems that since there were also two different IDs in the original database, I just picked the wrong one!</li>
<li>Better to use:</li>
<li><p>In this case it seems that since there were also two different IDs in the original database, I just picked the wrong one!</p></li> </ul>
<li><p>Better to use:</p>
<pre><code>dspace=# update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%'; <pre><code>dspace=# update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
</code></pre></li> </code></pre><ul>
<li>This proves that unifying author name varieties in authorities is easy, but fixing the name in the authority is tricky!</li>
<li><p>This proves that unifying author name varieties in authorities is easy, but fixing the name in the authority is tricky!</p></li> <li>Perhaps another way is to just add our own UUID to the authority field for the text_value we like, then re-index authority so they get synced from PostgreSQL to Solr, then set the other text_values to use that authority ID</li>
<li>Deploy MQM WARN fix on CGSpace (<a href="https://github.com/ilri/DSpace/pull/289">#289</a>)</li>
<li><p>Perhaps another way is to just add our own UUID to the authority field for the text_value we like, then re-index authority so they get synced from PostgreSQL to Solr, then set the other text_values to use that authority ID</p></li> <li>Deploy &ldquo;take task&rdquo; hack/fix on CGSpace (<a href="https://github.com/ilri/DSpace/pull/290">#290</a>)</li>
<li>I ran the following author corrections and then reindexed discovery:</li>
<li><p>Deploy MQM WARN fix on CGSpace (<a href="https://github.com/ilri/DSpace/pull/289">#289</a>)</p></li> </ul>
<li><p>Deploy &ldquo;take task&rdquo; hack/fix on CGSpace (<a href="https://github.com/ilri/DSpace/pull/290">#290</a>)</p></li>
<li><p>I ran the following author corrections and then reindexed discovery:</p>
<pre><code>update metadatavalue set authority='b041f2f4-19e7-4113-b774-0439baabd197', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Mora Benard%'; <pre><code>update metadatavalue set authority='b041f2f4-19e7-4113-b774-0439baabd197', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Mora Benard%';
update metadatavalue set text_value = 'Hoek, Rein van der', authority='4d6cbce2-6fd5-4b43-9363-58d18e7952c9', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Hoek, R%'; update metadatavalue set text_value = 'Hoek, Rein van der', authority='4d6cbce2-6fd5-4b43-9363-58d18e7952c9', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Hoek, R%';
update metadatavalue set text_value = 'Hoek, Rein van der', authority='4d6cbce2-6fd5-4b43-9363-58d18e7952c9', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like '%an der Hoek%' and text_value !~ '^.*W\.?$'; update metadatavalue set text_value = 'Hoek, Rein van der', authority='4d6cbce2-6fd5-4b43-9363-58d18e7952c9', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like '%an der Hoek%' and text_value !~ '^.*W\.?$';
update metadatavalue set authority='18349f29-61b1-44d7-ac60-89e55546e812', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thorne, P%'; update metadatavalue set authority='18349f29-61b1-44d7-ac60-89e55546e812', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thorne, P%';
update metadatavalue set authority='0d8369bb-57f7-4b2f-92aa-af820b183aca', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thornton, P%'; update metadatavalue set authority='0d8369bb-57f7-4b2f-92aa-af820b183aca', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thornton, P%';
update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%'; update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
</code></pre></li> </code></pre><h2 id="20161208">2016-12-08</h2>
</ul>
<h2 id="2016-12-08">2016-12-08</h2>
<ul> <ul>
<li><p>Something weird happened and Peter Thorne&rsquo;s names all ended up as &ldquo;Thorne&rdquo;, I guess because the original authority had that as its name value:</p> <li>Something weird happened and Peter Thorne's names all ended up as &ldquo;Thorne&rdquo;, I guess because the original authority had that as its name value:</li>
</ul>
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thorne%'; <pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thorne%';
text_value | authority | confidence text_value | authority | confidence
------------------+--------------------------------------+------------ ------------------+--------------------------------------+------------
Thorne, P.J. | 18349f29-61b1-44d7-ac60-89e55546e812 | 600 Thorne, P.J. | 18349f29-61b1-44d7-ac60-89e55546e812 | 600
Thorne | 18349f29-61b1-44d7-ac60-89e55546e812 | 600 Thorne | 18349f29-61b1-44d7-ac60-89e55546e812 | 600
Thorne-Lyman, A. | 0781e13a-1dc8-4e3f-82e8-5c422b44a344 | -1 Thorne-Lyman, A. | 0781e13a-1dc8-4e3f-82e8-5c422b44a344 | -1
Thorne, M. D. | 54c52649-cefd-438d-893f-3bcef3702f07 | -1 Thorne, M. D. | 54c52649-cefd-438d-893f-3bcef3702f07 | -1
Thorne, P.J | 18349f29-61b1-44d7-ac60-89e55546e812 | 600 Thorne, P.J | 18349f29-61b1-44d7-ac60-89e55546e812 | 600
Thorne, P. | 18349f29-61b1-44d7-ac60-89e55546e812 | 600 Thorne, P. | 18349f29-61b1-44d7-ac60-89e55546e812 | 600
(6 rows) (6 rows)
</code></pre></li> </code></pre><ul>
<li>I generated a new UUID using <code>uuidgen | tr '[A-Z]' '[a-z]'</code> and set it along with the correct name variation for all records:</li>
<li><p>I generated a new UUID using <code>uuidgen | tr '[A-Z]' '[a-z]'</code> and set it along with the correct name variation for all records:</p> </ul>
<pre><code>dspace=# update metadatavalue set authority='b2f7603d-2fb5-4018-923a-c4ec8d85b3bb', text_value='Thorne, P.J.' where resource_type_id=2 and metadata_field_id=3 and authority='18349f29-61b1-44d7-ac60-89e55546e812'; <pre><code>dspace=# update metadatavalue set authority='b2f7603d-2fb5-4018-923a-c4ec8d85b3bb', text_value='Thorne, P.J.' where resource_type_id=2 and metadata_field_id=3 and authority='18349f29-61b1-44d7-ac60-89e55546e812';
UPDATE 43 UPDATE 43
</code></pre></li> </code></pre><ul>
<li>Apparently we also need to normalize Phil Thornton's names to <code>Thornton, Philip K.</code>:</li>
<li><p>Apparently we also need to normalize Phil Thornton&rsquo;s names to <code>Thornton, Philip K.</code>:</p> </ul>
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^Thornton[,\.]? P.*'; <pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^Thornton[,\.]? P.*';
text_value | authority | confidence text_value | authority | confidence
---------------------+--------------------------------------+------------ ---------------------+--------------------------------------+------------
Thornton, P | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600 Thornton, P | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
Thornton, P K. | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600 Thornton, P K. | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
Thornton, P K | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600 Thornton, P K | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
Thornton. P.K. | 3e1e6639-d4fb-449e-9fce-ce06b5b0f702 | -1 Thornton. P.K. | 3e1e6639-d4fb-449e-9fce-ce06b5b0f702 | -1
Thornton, P K . | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600 Thornton, P K . | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
Thornton, P.K. | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600 Thornton, P.K. | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
Thornton, P.K | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600 Thornton, P.K | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
Thornton, Philip K | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600 Thornton, Philip K | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
Thornton, Philip K. | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600 Thornton, Philip K. | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
Thornton, P. K. | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600 Thornton, P. K. | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
(10 rows) (10 rows)
</code></pre></li> </code></pre><ul>
<li>Seems his original authorities are using an incorrect version of the name so I need to generate another UUID and tie it to the correct name, then reindex:</li>
<li><p>Seems his original authorities are using an incorrect version of the name so I need to generate another UUID and tie it to the correct name, then reindex:</p> </ul>
<pre><code>dspace=# update metadatavalue set authority='2df8136e-d8f4-4142-b58c-562337cab764', text_value='Thornton, Philip K.', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^Thornton[,\.]? P.*'; <pre><code>dspace=# update metadatavalue set authority='2df8136e-d8f4-4142-b58c-562337cab764', text_value='Thornton, Philip K.', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^Thornton[,\.]? P.*';
UPDATE 362 UPDATE 362
</code></pre></li> </code></pre><ul>
<li>It seems that, when you are messing with authority and author text values in the database, it is better to run authority reindex first (postgres→solr authority core) and then Discovery reindex (postgres→solr Discovery core)</li>
<li><p>It seems that, when you are messing with authority and author text values in the database, it is better to run authority reindex first (postgres→solr authority core) and then Discovery reindex (postgres→solr Discovery core)</p></li> <li>Everything looks ok after authority and discovery reindex</li>
<li>In other news, I think we should really be using more RAM for PostgreSQL's <code>shared_buffers</code></li>
<li><p>Everything looks ok after authority and discovery reindex</p></li> <li>The <a href="https://www.postgresql.org/docs/9.5/static/runtime-config-resource.html">PostgreSQL documentation</a> recommends using 25% of the system's RAM on dedicated systems, but we should use a bit less since we also have a massive JVM heap and also benefit from some RAM being used by the OS cache</li>
<li><p>In other news, I think we should really be using more RAM for PostgreSQL&rsquo;s <code>shared_buffers</code></p></li>
<li><p>The <a href="https://www.postgresql.org/docs/9.5/static/runtime-config-resource.html">PostgreSQL documentation</a> recommends using 25% of the system&rsquo;s RAM on dedicated systems, but we should use a bit less since we also have a massive JVM heap and also benefit from some RAM being used by the OS cache</p></li>
</ul> </ul>
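<p>Since the Thornton update above rewrites every row matched by <code>^Thornton[,\.]? P.*</code>, it may be worth checking the pattern against known names before running it. The following is only a sketch: it uses Python's <code>re</code> module as a stand-in for PostgreSQL's <code>~</code> operator (an assumption that holds for a pattern this simple, but not for every regex), and the helper name <code>matches_thornton</code> is made up for illustration. The name lists are copied from the query results above.</p>

```python
import re

# Same pattern as in the SQL above; re.search, like PostgreSQL's ~,
# is only anchored where the pattern itself says so.
THORNTON = re.compile(r"^Thornton[,\.]? P.*")

# Variants returned by the Thornton query above: all of these should match.
thornton_variants = [
    "Thornton, P",
    "Thornton, P K.",
    "Thornton, P K",
    "Thornton. P.K.",
    "Thornton, P K .",
    "Thornton, P.K.",
    "Thornton, P.K",
    "Thornton, Philip K",
    "Thornton, Philip K.",
    "Thornton, P. K.",
]

# Names from the earlier Thorne query: none of these should match.
other_authors = ["Thorne, P.J.", "Thorne-Lyman, A.", "Thorne, M. D."]


def matches_thornton(name: str) -> bool:
    """Return True if the name would be caught by the Thornton pattern."""
    return THORNTON.search(name) is not None


missed = [n for n in thornton_variants if not matches_thornton(n)]
false_positives = [n for n in other_authors if matches_thornton(n)]
print("missed:", missed)
print("false positives:", false_positives)
```

<p>Both lists coming back empty suggests the pattern is tight enough for these names, though it says nothing about values that are not in the sample.</p>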
<h2 id="20161209">2016-12-09</h2>
<h2 id="2016-12-09">2016-12-09</h2>
<ul> <ul>
<li>More work on finishing rough draft of KM4Dev article</li> <li>More work on finishing rough draft of KM4Dev article</li>
<li>Set PostgreSQL&rsquo;s <code>shared_buffers</code> on CGSpace to 10% of system RAM (1200MB)</li> <li>Set PostgreSQL's <code>shared_buffers</code> on CGSpace to 10% of system RAM (1200MB)</li>
<li>Run the following author corrections on CGSpace:</li>
<li><p>Run the following author corrections on CGSpace:</p> </ul>
<pre><code>dspace=# update metadatavalue set authority='34df639a-42d8-4867-a3f2-1892075fcb3f', text_value='Thorne, P.J.' where resource_type_id=2 and metadata_field_id=3 and authority='18349f29-61b1-44d7-ac60-89e55546e812' or authority='021cd183-946b-42bb-964e-522ebff02993'; <pre><code>dspace=# update metadatavalue set authority='34df639a-42d8-4867-a3f2-1892075fcb3f', text_value='Thorne, P.J.' where resource_type_id=2 and metadata_field_id=3 and authority='18349f29-61b1-44d7-ac60-89e55546e812' or authority='021cd183-946b-42bb-964e-522ebff02993';
dspace=# update metadatavalue set authority='2df8136e-d8f4-4142-b58c-562337cab764', text_value='Thornton, Philip K.', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^Thornton[,\.]? P.*'; dspace=# update metadatavalue set authority='2df8136e-d8f4-4142-b58c-562337cab764', text_value='Thornton, Philip K.', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^Thornton[,\.]? P.*';
</code></pre></li> </code></pre><ul>
<li>The authority IDs were different now than when I was looking a few days ago so I had to adjust them here</li>
<li><p>The authority IDs were different now than when I was looking a few days ago so I had to adjust them here</p></li>
</ul> </ul>
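<p>One thing to watch in the Thorne update above: in SQL, <code>and</code> binds more tightly than <code>or</code>, so a trailing <code>or authority='...'</code> without parentheses is not restricted by the <code>resource_type_id</code> and <code>metadata_field_id</code> conditions. The sketch below illustrates this with an in-memory SQLite table standing in for PostgreSQL (an assumption; both follow the standard precedence rules). The table is a cut-down, hypothetical version of <code>metadatavalue</code> and the authority values <code>old-a</code> and <code>old-b</code> are placeholders, not real IDs.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table metadatavalue "
    "(resource_type_id int, metadata_field_id int, authority text)"
)
conn.executemany(
    "insert into metadatavalue values (?, ?, ?)",
    [
        (2, 3, "old-a"),  # item author: intended target
        (2, 3, "old-b"),  # item author: intended target
        (4, 3, "old-b"),  # other resource type: should be left alone
        (2, 9, "old-b"),  # other metadata field: should be left alone
    ],
)

# Without parentheses this parses as (a AND b AND c) OR d.
unparenthesized = conn.execute(
    "select count(*) from metadatavalue "
    "where resource_type_id=2 and metadata_field_id=3 "
    "and authority='old-a' or authority='old-b'"
).fetchone()[0]

# With parentheses the OR only chooses between the two authority IDs.
parenthesized = conn.execute(
    "select count(*) from metadatavalue "
    "where resource_type_id=2 and metadata_field_id=3 "
    "and (authority='old-a' or authority='old-b')"
).fetchone()[0]

print(unparenthesized, parenthesized)  # prints "4 2"
```

<p>Whether this mattered on CGSpace depends on whether the second authority ID was ever used outside item author fields, which the notes above do not say. Wrapping the alternatives in parentheses (or using <code>authority in (...)</code>) avoids the question.</p>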
<h2 id="20161211">2016-12-11</h2>
<h2 id="2016-12-11">2016-12-11</h2>
<ul> <ul>
<li>After enabling a sizable <code>shared_buffers</code> for CGSpace&rsquo;s PostgreSQL configuration the number of connections to the database dropped significantly</li> <li>After enabling a sizable <code>shared_buffers</code> for CGSpace's PostgreSQL configuration the number of connections to the database dropped significantly</li>
</ul> </ul>
<p><img src="/cgspace-notes/2016/12/postgres_bgwriter-week.png" alt="postgres_bgwriter-week">
<p><img src="/cgspace-notes/2016/12/postgres_bgwriter-week.png" alt="postgres_bgwriter-week" /> <img src="/cgspace-notes/2016/12/postgres_connections_ALL-week.png" alt="postgres_connections_ALL-week"></p>
<img src="/cgspace-notes/2016/12/postgres_connections_ALL-week.png" alt="postgres_connections_ALL-week" /></p>
<ul> <ul>
<li><p>Looking at CIAT records from last week again, they have a lot of double authors like:</p> <li>Looking at CIAT records from last week again, they have a lot of double authors like:</li>
</ul>
<pre><code>International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::600 <pre><code>International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::600
International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::500 International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::500
International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::0 International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::0
</code></pre></li> </code></pre><ul>
<li>Some in the same <code>dc.contributor.author</code> field, and some in others like <code>dc.contributor.author[en_US]</code> etc</li>
<li><p>Some in the same <code>dc.contributor.author</code> field, and some in others like <code>dc.contributor.author[en_US]</code> etc</p></li> <li>Removing the duplicates in OpenRefine and uploading a CSV to DSpace says &ldquo;no changes detected&rdquo;</li>
<li>Seems like the only way to sort of clean these up would be to start in SQL:</li>
<li><p>Removing the duplicates in OpenRefine and uploading a CSV to DSpace says &ldquo;no changes detected&rdquo;</p></li> </ul>
<li><p>Seems like the only way to sort of clean these up would be to start in SQL:</p>
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like 'International Center for Tropical Agriculture'; <pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like 'International Center for Tropical Agriculture';
text_value | authority | confidence text_value | authority | confidence
-----------------------------------------------+--------------------------------------+------------ -----------------------------------------------+--------------------------------------+------------
International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 | -1 International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 | -1
International Center for Tropical Agriculture | | 600 International Center for Tropical Agriculture | | 600
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | 500 International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | 500
International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 | 600 International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 | 600
International Center for Tropical Agriculture | | -1 International Center for Tropical Agriculture | | -1
International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 | 500 International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 | 500
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | 600 International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | 600
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | -1 International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | -1
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | 0 International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | 0
dspace=# update metadatavalue set authority='3026b1de-9302-4f3e-85ab-ef48da024eb2', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value = 'International Center for Tropical Agriculture'; dspace=# update metadatavalue set authority='3026b1de-9302-4f3e-85ab-ef48da024eb2', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value = 'International Center for Tropical Agriculture';
UPDATE 1693 UPDATE 1693
dspace=# update metadatavalue set authority='3026b1de-9302-4f3e-85ab-ef48da024eb2', text_value='International Center for Tropical Agriculture', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like '%CIAT%'; dspace=# update metadatavalue set authority='3026b1de-9302-4f3e-85ab-ef48da024eb2', text_value='International Center for Tropical Agriculture', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like '%CIAT%';
UPDATE 35 UPDATE 35
</code></pre></li> </code></pre><ul>
<li>Work on article for KM4Dev journal</li>
<li><p>Work on article for KM4Dev journal</p></li>
</ul> </ul>
<h2 id="20161213">2016-12-13</h2>
<h2 id="2016-12-13">2016-12-13</h2>
<ul> <ul>
<li>Checking in on CGSpace postgres stats again, looks like the <code>shared_buffers</code> change from a few days ago really made a big impact:</li> <li>Checking in on CGSpace postgres stats again, looks like the <code>shared_buffers</code> change from a few days ago really made a big impact:</li>
</ul> </ul>
<p><img src="/cgspace-notes/2016/12/postgres_bgwriter-week-2016-12-13.png" alt="postgres_bgwriter-week">
<p><img src="/cgspace-notes/2016/12/postgres_bgwriter-week-2016-12-13.png" alt="postgres_bgwriter-week" /> <img src="/cgspace-notes/2016/12/postgres_connections_ALL-week-2016-12-13.png" alt="postgres_connections_ALL-week"></p>
<img src="/cgspace-notes/2016/12/postgres_connections_ALL-week-2016-12-13.png" alt="postgres_connections_ALL-week" /></p>
<ul> <ul>
<li>Looking at logs, it seems we need to evaluate which logs we keep and for how long</li> <li>Looking at logs, it seems we need to evaluate which logs we keep and for how long</li>
<li>Basically the only ones we <em>need</em> are <code>dspace.log</code> because those are used for legacy statistics (need to keep for 1 month)</li> <li>Basically the only ones we <em>need</em> are <code>dspace.log</code> because those are used for legacy statistics (need to keep for 1 month)</li>
<li>Other logs will be an issue because they don&rsquo;t have date stamps</li> <li>Other logs will be an issue because they don't have date stamps</li>
<li>I will add date stamps to the logs we&rsquo;re storing from the tomcat7 user&rsquo;s cron jobs at least, using: <code>$(date --iso-8601)</code></li> <li>I will add date stamps to the logs we're storing from the tomcat7 user's cron jobs at least, using: <code>$(date --iso-8601)</code></li>
<li>Would probably be better to make custom logrotate files for them in the future</li> <li>Would probably be better to make custom logrotate files for them in the future</li>
<li>Clean up some unneeded log files from 2014 (they weren&rsquo;t large, just don&rsquo;t need them)</li> <li>Clean up some unneeded log files from 2014 (they weren't large, just don't need them)</li>
<li>So basically, new cron jobs for logs should look something like this:</li> <li>So basically, new cron jobs for logs should look something like this:</li>
<li>Find any file named <code>*.log*</code> that isn't <code>dspace.log*</code>, isn't already zipped, and is older than one day, and zip it:</li>
<li><p>Find any file named <code>*.log*</code> that isn&rsquo;t <code>dspace.log*</code>, isn&rsquo;t already zipped, and is older than one day, and zip it:</p> </ul>
<pre><code># find /home/dspacetest.cgiar.org/log -regextype posix-extended -iregex &quot;.*\.log.*&quot; ! -iregex &quot;.*dspace\.log.*&quot; ! -iregex &quot;.*\.(gz|lrz|lzo|xz)&quot; ! -newermt &quot;Yesterday&quot; -exec schedtool -B -e ionice -c2 -n7 xz {} \; <pre><code># find /home/dspacetest.cgiar.org/log -regextype posix-extended -iregex &quot;.*\.log.*&quot; ! -iregex &quot;.*dspace\.log.*&quot; ! -iregex &quot;.*\.(gz|lrz|lzo|xz)&quot; ! -newermt &quot;Yesterday&quot; -exec schedtool -B -e ionice -c2 -n7 xz {} \;
</code></pre></li> </code></pre><ul>
<li>Since there is <code>xzgrep</code> and <code>xzless</code> we can actually just zip them after one day, why not?!</li>
<li><p>Since there is <code>xzgrep</code> and <code>xzless</code> we can actually just zip them after one day, why not?!</p></li> <li>We can keep the zipped ones for two weeks just in case we need to look for errors, etc, and delete them after that</li>
<li>I use <code>schedtool -B</code> and <code>ionice -c2 -n7</code> to set the CPU scheduling to <code>SCHED_BATCH</code> and the IO to best effort which should, in theory, impact important system processes like Tomcat and PostgreSQL less</li>
<li><p>We can keep the zipped ones for two weeks just in case we need to look for errors, etc, and delete them after that</p></li> <li>When the tasks are running you can see that the policies do apply:</li>
</ul>
<li><p>I use <code>schedtool -B</code> and <code>ionice -c2 -n7</code> to set the CPU scheduling to <code>SCHED_BATCH</code> and the IO to best effort which should, in theory, impact important system processes like Tomcat and PostgreSQL less</p></li>
<li><p>When the tasks are running you can see that the policies do apply:</p>
<pre><code>$ schedtool $(ps aux | grep &quot;xz /home&quot; | grep -v grep | awk '{print $2}') &amp;&amp; ionice -p $(ps aux | grep &quot;xz /home&quot; | grep -v grep | awk '{print $2}') <pre><code>$ schedtool $(ps aux | grep &quot;xz /home&quot; | grep -v grep | awk '{print $2}') &amp;&amp; ionice -p $(ps aux | grep &quot;xz /home&quot; | grep -v grep | awk '{print $2}')
PID 17049: PRIO 0, POLICY B: SCHED_BATCH , NICE 0, AFFINITY 0xf PID 17049: PRIO 0, POLICY B: SCHED_BATCH , NICE 0, AFFINITY 0xf
best-effort: prio 7 best-effort: prio 7
</code></pre></li> </code></pre><ul>
<li>All in all this should free up a few gigs (we were at 9.3GB free when I started)</li>
<li><p>All in all this should free up a few gigs (we were at 9.3GB free when I started)</p></li> <li>Next thing to look at is whether we need Tomcat's access logs</li>
<li>I just looked and it seems that we saved 10GB by zipping these logs</li>
<li><p>Next thing to look at is whether we need Tomcat&rsquo;s access logs</p></li> <li>Some users pointed out issues with the &ldquo;most popular&rdquo; stats on a community or collection</li>
<li>This error appears in the logs when you try to view them:</li>
<li><p>I just looked and it seems that we saved 10GB by zipping these logs</p></li> </ul>
<li><p>Some users pointed out issues with the &ldquo;most popular&rdquo; stats on a community or collection</p></li>
<li><p>This error appears in the logs when you try to view them:</p>
<pre><code>2016-12-13 21:17:37,486 ERROR org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter @ Serious Error Occurred Processing Request! <pre><code>2016-12-13 21:17:37,486 ERROR org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter @ Serious Error Occurred Processing Request!
org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceObjectDatasetGenerator.toDatasetQuery(Lorg/dspace/core/Context;)Lcom/atmire/statistics/content/DatasetQuery; org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceObjectDatasetGenerator.toDatasetQuery(Lorg/dspace/core/Context;)Lcom/atmire/statistics/content/DatasetQuery;
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:972) at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:972)
@ -741,69 +636,54 @@ Caused by: java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceOb
at com.atmire.statistics.mostpopular.JSONStatsMostPopularGenerator.generate(SourceFile:246) at com.atmire.statistics.mostpopular.JSONStatsMostPopularGenerator.generate(SourceFile:246)
at com.atmire.app.xmlui.aspect.statistics.JSONStatsMostPopular.generate(JSONStatsMostPopular.java:145) at com.atmire.app.xmlui.aspect.statistics.JSONStatsMostPopular.generate(JSONStatsMostPopular.java:145)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
</code></pre></li> </code></pre><ul>
<li>It happens on development and production, so I will have to ask Atmire</li>
<li><p>It happens on development and production, so I will have to ask Atmire</p></li> <li>Most likely an issue with installation/configuration</li>
<li><p>Most likely an issue with installation/configuration</p></li>
</ul> </ul>
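<p>The two-week retention for compressed logs mentioned in the notes above could be handled by a second <code>find</code> pass. This is only a minimal sketch, not the job that was actually deployed: the function name <code>prune_compressed_logs</code> is hypothetical, the 14-day default is an assumption taken from the "two weeks" in the notes, and it assumes a <code>find</code> that supports <code>-delete</code> (GNU findutils does).</p>

```shell
#!/usr/bin/env bash
# Sketch only: prune_compressed_logs is a hypothetical helper, not part of DSpace.
# Usage: prune_compressed_logs LOG_DIR [DAYS]
prune_compressed_logs() {
    local log_dir="$1"
    local days="${2:-14}"   # assumed retention window ("two weeks" in the notes)

    # Only consider files already compressed with xz, and never touch
    # dspace.log* because those are kept for legacy statistics.
    # -mtime +N matches files whose age, in whole days, is greater than N.
    # Each deleted path is printed so a cron mail or log shows what was removed.
    find "$log_dir" -type f -name '*.xz' ! -name 'dspace.log*' \
        -mtime +"$days" -print -delete
}
```

<p>Since <code>xz</code> normally preserves the modification time of the original file, the age test here refers to when the log was last written, not when it was compressed. It would be called with the log directory, for example <code>prune_compressed_logs /home/dspacetest.cgiar.org/log</code>.</p>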
<h2 id="20161214">2016-12-14</h2>
<h2 id="2016-12-14">2016-12-14</h2>
<ul> <ul>
<li>Atmire sent a quick fix for the <code>last-update.txt</code> file not found error</li> <li>Atmire sent a quick fix for the <code>last-update.txt</code> file not found error</li>
<li>After applying pull request <a href="https://github.com/ilri/DSpace/pull/291">#291</a> on DSpace Test I no longer see the error in the logs after the <code>UpdateSolrStorageReports</code> task runs</li> <li>After applying pull request <a href="https://github.com/ilri/DSpace/pull/291">#291</a> on DSpace Test I no longer see the error in the logs after the <code>UpdateSolrStorageReports</code> task runs</li>
<li>Also, I&rsquo;m toying with the idea of moving the <code>tomcat7</code> user&rsquo;s cron jobs to <code>/etc/cron.d</code> so we can manage them in Ansible</li> <li>Also, I'm toying with the idea of moving the <code>tomcat7</code> user's cron jobs to <code>/etc/cron.d</code> so we can manage them in Ansible</li>
<li>Made a pull request with a template for the cron jobs (<a href="https://github.com/ilri/rmg-ansible-public/pull/75">#75</a>)</li> <li>Made a pull request with a template for the cron jobs (<a href="https://github.com/ilri/rmg-ansible-public/pull/75">#75</a>)</li>
<li>Testing SMTP from the new CGSpace server and it&rsquo;s not working, I&rsquo;ll have to tell James</li> <li>Testing SMTP from the new CGSpace server and it's not working, I'll have to tell James</li>
</ul> </ul>
<h2 id="20161215">2016-12-15</h2>
<h2 id="2016-12-15">2016-12-15</h2>
<ul> <ul>
<li>Start planning for server migration this weekend, letting users know</li> <li>Start planning for server migration this weekend, letting users know</li>
<li>I am trying to figure out what the process is to <a href="http://handle.net/hnr_support.html">update the server&rsquo;s IP in the Handle system</a>, and emailing the hdladmin account bounces(!)</li> <li>I am trying to figure out what the process is to <a href="http://handle.net/hnr_support.html">update the server's IP in the Handle system</a>, and emailing the hdladmin account bounces(!)</li>
<li>I will contact Jane Euler directly as I know I&rsquo;ve corresponded with her in the past</li> <li>I will contact Jane Euler directly as I know I've corresponded with her in the past</li>
<li>She said that I should indeed just re-run the <code>[dspace]/bin/dspace make-handle-config</code> command and submit the new <code>sitebndl.zip</code> file to the CNRI website</li> <li>She said that I should indeed just re-run the <code>[dspace]/bin/dspace make-handle-config</code> command and submit the new <code>sitebndl.zip</code> file to the CNRI website</li>
<li>Also I was troubleshooting some workflow issues from Bizuwork</li> <li>Also I was troubleshooting some workflow issues from Bizuwork</li>
<li>I re-created the same scenario by adding a non-admin account and submitting an item, but I was able to successfully approve and commit it</li> <li>I re-created the same scenario by adding a non-admin account and submitting an item, but I was able to successfully approve and commit it</li>
<li>So it turns out it&rsquo;s not a bug, it&rsquo;s just that Peter was added as a reviewer/admin AFTER the items were submitted</li> <li>So it turns out it's not a bug, it's just that Peter was added as a reviewer/admin AFTER the items were submitted</li>
<li>This is how DSpace works, and I need to ask if there is a way to override someone&rsquo;s submission, as the other reviewer seems to not be paying attention, or has perhaps taken the item from the task pool?</li> <li>This is how DSpace works, and I need to ask if there is a way to override someone's submission, as the other reviewer seems to not be paying attention, or has perhaps taken the item from the task pool?</li>
<li>Run a batch edit to add &ldquo;RANGELANDS&rdquo; ILRI subject to all items containing the word &ldquo;RANGELANDS&rdquo; in their metadata for Peter Ballantyne</li> <li>Run a batch edit to add &ldquo;RANGELANDS&rdquo; ILRI subject to all items containing the word &ldquo;RANGELANDS&rdquo; in their metadata for Peter Ballantyne</li>
</ul> </ul>
<p><img src="/cgspace-notes/2016/12/batch-edit1.png" alt="Select all items with &ldquo;rangelands&rdquo; in metadata">
<p><img src="/cgspace-notes/2016/12/batch-edit1.png" alt="Select all items with &quot;rangelands&quot; in metadata" /> <img src="/cgspace-notes/2016/12/batch-edit2.png" alt="Add RANGELANDS ILRI subject"></p>
<img src="/cgspace-notes/2016/12/batch-edit2.png" alt="Add RANGELANDS ILRI subject" /></p> <h2 id="20161218">2016-12-18</h2>
<h2 id="2016-12-18">2016-12-18</h2>
<ul> <ul>
<li>Add four new CRP subjects for 2017 and sort the input forms alphabetically (<a href="https://github.com/ilri/DSpace/pull/294">#294</a>)</li> <li>Add four new CRP subjects for 2017 and sort the input forms alphabetically (<a href="https://github.com/ilri/DSpace/pull/294">#294</a>)</li>
<li>Test the SMTP on the new server and it&rsquo;s working</li> <li>Test the SMTP on the new server and it's working</li>
<li>Last week, when we asked CGNET to update the DNS records this weekend, they misunderstood and did it immediately</li> <li>Last week, when we asked CGNET to update the DNS records this weekend, they misunderstood and did it immediately</li>
<li>We quickly told them to undo it, but I just realized they didn&rsquo;t undo the IPv6 AAAA record!</li> <li>We quickly told them to undo it, but I just realized they didn't undo the IPv6 AAAA record!</li>
<li>None of our users in African institutes will have IPv6, but some Europeans might, so I need to check if any submissions have been added since then</li> <li>None of our users in African institutes will have IPv6, but some Europeans might, so I need to check if any submissions have been added since then</li>
<li>Update some names and authorities in the database:</li>
<li><p>Update some names and authorities in the database:</p> </ul>
<pre><code>dspace=# update metadatavalue set authority='5ff35043-942e-4d0a-b377-4daed6e3c1a3', confidence=600, text_value='Duncan, Alan' where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^.*Duncan,? A.*'; <pre><code>dspace=# update metadatavalue set authority='5ff35043-942e-4d0a-b377-4daed6e3c1a3', confidence=600, text_value='Duncan, Alan' where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^.*Duncan,? A.*';
UPDATE 204 UPDATE 204
dspace=# update metadatavalue set authority='46804b53-ea30-4a85-9ccf-b79a35816fa9', confidence=600, text_value='Mekonnen, Kindu' where resource_type_id=2 and metadata_field_id=3 and text_value like '%Mekonnen, K%'; dspace=# update metadatavalue set authority='46804b53-ea30-4a85-9ccf-b79a35816fa9', confidence=600, text_value='Mekonnen, Kindu' where resource_type_id=2 and metadata_field_id=3 and text_value like '%Mekonnen, K%';
UPDATE 89 UPDATE 89
dspace=# update metadatavalue set authority='f840da02-26e7-4a74-b7ba-3e2b723f3684', confidence=600, text_value='Lukuyu, Ben A.' where resource_type_id=2 and metadata_field_id=3 and text_value like '%Lukuyu, B%'; dspace=# update metadatavalue set authority='f840da02-26e7-4a74-b7ba-3e2b723f3684', confidence=600, text_value='Lukuyu, Ben A.' where resource_type_id=2 and metadata_field_id=3 and text_value like '%Lukuyu, B%';
UPDATE 140 UPDATE 140
</code></pre></li> </code></pre><ul>
<li>Generated a new UUID for Ben using <code>uuidgen | tr '[A-Z]' '[a-z]'</code> as the one in Solr had his ORCID but the name format was incorrect</li>
<li><p>Generated a new UUID for Ben using <code>uuidgen | tr '[A-Z]' '[a-z]'</code> as the one in Solr had his ORCID but the name format was incorrect</p></li> <li>In theory DSpace should be able to check names from ORCID and update the records in the database, but I find that this doesn't work (see Jira bug <a href="https://jira.duraspace.org/browse/DS-3302">DS-3302</a>)</li>
<li>I need to run these updates along with the other one for CIAT that I found last week</li>
<li><p>In theory DSpace should be able to check names from ORCID and update the records in the database, but I find that this doesn&rsquo;t work (see Jira bug <a href="https://jira.duraspace.org/browse/DS-3302">DS-3302</a>)</p></li> <li>Enable OCSP stapling for hosts &gt;= Ubuntu 16.04 in our Ansible playbooks (<a href="https://github.com/ilri/rmg-ansible-public/pull/76">#76</a>)</li>
<li>Working for DSpace Test on the second response:</li>
<li><p>I need to run these updates along with the other one for CIAT that I found last week</p></li> </ul>
<li><p>Enable OCSP stapling for hosts &gt;= Ubuntu 16.04 in our Ansible playbooks (<a href="https://github.com/ilri/rmg-ansible-public/pull/76">#76</a>)</p></li>
<li><p>Working for DSpace Test on the second response:</p>
<pre><code>$ openssl s_client -connect dspacetest.cgiar.org:443 -servername dspacetest.cgiar.org -tls1_2 -tlsextdebug -status <pre><code>$ openssl s_client -connect dspacetest.cgiar.org:443 -servername dspacetest.cgiar.org -tls1_2 -tlsextdebug -status
... ...
OCSP response: no response sent OCSP response: no response sent
@ -811,19 +691,16 @@ $ openssl s_client -connect dspacetest.cgiar.org:443 -servername dspacetest.cgia
... ...
OCSP Response Data: OCSP Response Data:
... ...
Cert Status: good Cert Status: good
</code></pre></li> </code></pre><ul>
<li>Migrate CGSpace to new server, roughly following these steps:</li>
<li><p>Migrate CGSpace to new server, roughly following these steps:</p></li> <li>On old server:</li>
</ul>
<li><p>On old server:</p>
<pre><code># service tomcat7 stop <pre><code># service tomcat7 stop
# /home/backup/scripts/postgres_backup.sh # /home/backup/scripts/postgres_backup.sh
</code></pre></li> </code></pre><ul>
<li>On new server:</li>
<li><p>On new server:</p> </ul>
<pre><code># systemctl stop tomcat7 <pre><code># systemctl stop tomcat7
# rsync -4 -av --delete 178.79.187.182:/home/cgspace.cgiar.org/assetstore/ /home/cgspace.cgiar.org/assetstore/ # rsync -4 -av --delete 178.79.187.182:/home/cgspace.cgiar.org/assetstore/ /home/cgspace.cgiar.org/assetstore/
# rsync -4 -av --delete 178.79.187.182:/home/backup/ /home/backup/ # rsync -4 -av --delete 178.79.187.182:/home/backup/ /home/backup/
@ -848,44 +725,34 @@ $ cd src/git/DSpace/dspace/target/dspace-installer
$ ant update clean_backups $ ant update clean_backups
$ exit $ exit
# systemctl start tomcat7 # systemctl start tomcat7
</code></pre></li> </code></pre><ul>
<li>It took about twenty minutes and afterwards I had to check a few things, like:
<li><p>It took about twenty minutes and afterwards I had to check a few things, like:</p>
<ul> <ul>
<li>check and enable systemd timer for let&rsquo;s encrypt</li> <li>check and enable systemd timer for let's encrypt</li>
<li>enable root cron jobs</li> <li>enable root cron jobs</li>
<li>disable root cron jobs on old server after!</li> <li>disable root cron jobs on old server after!</li>
<li>enable tomcat7 cron jobs</li> <li>enable tomcat7 cron jobs</li>
<li>disable tomcat7 cron jobs on old server after!</li> <li>disable tomcat7 cron jobs on old server after!</li>
<li>regenerate <code>sitebndl.zip</code> with new IP for handle server and submit it to Handle.net</li> <li>regenerate <code>sitebndl.zip</code> with new IP for handle server and submit it to Handle.net</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2016-12-22">2016-12-22</h2> </ul>
<h2 id="20161222">2016-12-22</h2>
<ul> <ul>
<li>Abenet wanted a CSV of the IITA community, but the web export doesn&rsquo;t include the <code>dc.date.accessioned</code> field</li> <li>Abenet wanted a CSV of the IITA community, but the web export doesn't include the <code>dc.date.accessioned</code> field</li>
<li>I had to export it from the command line using the <code>-a</code> flag:</li>
<li><p>I had to export it from the command line using the <code>-a</code> flag:</p> </ul>
<pre><code>$ [dspace]/bin/dspace metadata-export -a -f /tmp/iita.csv -i 10568/68616 <pre><code>$ [dspace]/bin/dspace metadata-export -a -f /tmp/iita.csv -i 10568/68616
</code></pre></li> </code></pre><h2 id="20161228">2016-12-28</h2>
</ul>
<h2 id="2016-12-28">2016-12-28</h2>
<ul> <ul>
<li>We&rsquo;ve been getting two alerts per day about CPU usage on the new server from Linode</li> <li>We've been getting two alerts per day about CPU usage on the new server from Linode</li>
<li>These are caused by the batch jobs for Solr etc that run in the early morning hours</li> <li>These are caused by the batch jobs for Solr etc that run in the early morning hours</li>
<li>The Linode default is to alert at 90% CPU usage for two hours, but I see the old server was at 150%, so maybe we just need to adjust it</li> <li>The Linode default is to alert at 90% CPU usage for two hours, but I see the old server was at 150%, so maybe we just need to adjust it</li>
<li>Speaking of the old server (linode01), I think we can decommission it now</li> <li>Speaking of the old server (linode01), I think we can decommission it now</li>
<li>I checked the S3 logs on the new server (linode18) to make sure the backups have been running and everything looks good</li> <li>I checked the S3 logs on the new server (linode18) to make sure the backups have been running and everything looks good</li>
<li>In other news, I was looking at the Munin graphs for PostgreSQL on the new server and it looks slightly worrying:</li> <li>In other news, I was looking at the Munin graphs for PostgreSQL on the new server and it looks slightly worrying:</li>
</ul> </ul>
<p><img src="/cgspace-notes/2016/12/postgres_size_ALL-week.png" alt="munin postgres stats"></p>
<p><img src="/cgspace-notes/2016/12/postgres_size_ALL-week.png" alt="munin postgres stats" /></p>
<ul> <ul>
<li>I will have to check later why the size keeps increasing</li> <li>I will have to check later why the size keeps increasing</li>
</ul> </ul>


@ -8,10 +8,9 @@
<meta property="og:title" content="January, 2017" /> <meta property="og:title" content="January, 2017" />
<meta property="og:description" content="2017-01-02 <meta property="og:description" content="2017-01-02
I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error
I tested on DSpace Test as well and it doesn&rsquo;t work there either I tested on DSpace Test as well and it doesn&#39;t work there either
I asked on the dspace-tech mailing list because it seems to be broken, and actually now I&rsquo;m not sure if we&rsquo;ve ever had the sharding task run successfully over all these years I asked on the dspace-tech mailing list because it seems to be broken, and actually now I&#39;m not sure if we&#39;ve ever had the sharding task run successfully over all these years
" /> " />
<meta property="og:type" content="article" /> <meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2017-01/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2017-01/" />
@ -22,12 +21,11 @@ I asked on the dspace-tech mailing list because it seems to be broken, and actua
<meta name="twitter:title" content="January, 2017"/> <meta name="twitter:title" content="January, 2017"/>
<meta name="twitter:description" content="2017-01-02 <meta name="twitter:description" content="2017-01-02
I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error
I tested on DSpace Test as well and it doesn&rsquo;t work there either I tested on DSpace Test as well and it doesn&#39;t work there either
I asked on the dspace-tech mailing list because it seems to be broken, and actually now I&rsquo;m not sure if we&rsquo;ve ever had the sharding task run successfully over all these years I asked on the dspace-tech mailing list because it seems to be broken, and actually now I&#39;m not sure if we&#39;ve ever had the sharding task run successfully over all these years
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -108,77 +106,71 @@ I asked on the dspace-tech mailing list because it seems to be broken, and actua
</p> </p>
</header> </header>
<h2 id="2017-01-02">2017-01-02</h2> <h2 id="20170102">2017-01-02</h2>
<ul> <ul>
<li>I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error</li> <li>I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error</li>
<li>I tested on DSpace Test as well and it doesn&rsquo;t work there either</li> <li>I tested on DSpace Test as well and it doesn't work there either</li>
<li>I asked on the dspace-tech mailing list because it seems to be broken, and actually now I&rsquo;m not sure if we&rsquo;ve ever had the sharding task run successfully over all these years</li> <li>I asked on the dspace-tech mailing list because it seems to be broken, and actually now I'm not sure if we've ever had the sharding task run successfully over all these years</li>
</ul> </ul>
<h2 id="20170104">2017-01-04</h2>
<h2 id="2017-01-04">2017-01-04</h2>
<ul> <ul>
<li><p>I tried to shard my local dev instance and it fails the same way:</p> <li>I tried to shard my local dev instance and it fails the same way:</li>
</ul>
<pre><code>$ JAVA_OPTS=&quot;-Xms768m -Xmx768m -Dfile.encoding=UTF-8&quot; ~/dspace/bin/dspace stats-util -s <pre><code>$ JAVA_OPTS=&quot;-Xms768m -Xmx768m -Dfile.encoding=UTF-8&quot; ~/dspace/bin/dspace stats-util -s
Moving: 9318 into core statistics-2016 Moving: 9318 into core statistics-2016
Exception: IOException occured when talking to server at: http://localhost:8081/solr//statistics-2016 Exception: IOException occured when talking to server at: http://localhost:8081/solr//statistics-2016
org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://localhost:8081/solr//statistics-2016 org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://localhost:8081/solr//statistics-2016
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:566) at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:566)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
at org.dspace.statistics.SolrLogger.shardSolrIndex(SourceFile:2291) at org.dspace.statistics.SolrLogger.shardSolrIndex(SourceFile:2291)
at org.dspace.statistics.util.StatisticsClient.main(StatisticsClient.java:106) at org.dspace.statistics.util.StatisticsClient.main(StatisticsClient.java:106)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at java.lang.reflect.Method.invoke(Method.java:498)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226) at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78) at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
Caused by: org.apache.http.client.ClientProtocolException Caused by: org.apache.http.client.ClientProtocolException
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:867) at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:867)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:448) at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:448)
... 10 more ... 10 more
Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity. The cause lists the reason the original request failed. Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity. The cause lists the reason the original request failed.
at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:659) at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:659)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863) at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
... 14 more ... 14 more
Caused by: java.net.SocketException: Broken pipe (Write failed) Caused by: java.net.SocketException: Broken pipe (Write failed)
at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109) at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
at java.net.SocketOutputStream.write(SocketOutputStream.java:153) at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
at org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:181) at org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:181)
at org.apache.http.impl.io.ChunkedOutputStream.flushCacheWithAppend(ChunkedOutputStream.java:124) at org.apache.http.impl.io.ChunkedOutputStream.flushCacheWithAppend(ChunkedOutputStream.java:124)
at org.apache.http.impl.io.ChunkedOutputStream.write(ChunkedOutputStream.java:181) at org.apache.http.impl.io.ChunkedOutputStream.write(ChunkedOutputStream.java:181)
at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:132) at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:132)
at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:89) at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:89)
at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108) at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:117) at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:117)
at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:265) at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:265)
at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:203) at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:203)
at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:236) at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:236)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:121) at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:121)
at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685) at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
... 16 more ... 16 more
</code></pre></li> </code></pre><ul>
<li>And the DSpace log shows:</li>
<li><p>And the DSpace log shows:</p> </ul>
<pre><code>2017-01-04 22:39:05,412 INFO org.dspace.statistics.SolrLogger @ Created core with name: statistics-2016 <pre><code>2017-01-04 22:39:05,412 INFO org.dspace.statistics.SolrLogger @ Created core with name: statistics-2016
2017-01-04 22:39:05,412 INFO org.dspace.statistics.SolrLogger @ Moving: 9318 records into core statistics-2016 2017-01-04 22:39:05,412 INFO org.dspace.statistics.SolrLogger @ Moving: 9318 records into core statistics-2016
2017-01-04 22:39:07,310 INFO org.apache.http.impl.client.SystemDefaultHttpClient @ I/O exception (java.net.SocketException) caught when processing request to {}-&gt;http://localhost:8081: Broken pipe (Write failed) 2017-01-04 22:39:07,310 INFO org.apache.http.impl.client.SystemDefaultHttpClient @ I/O exception (java.net.SocketException) caught when processing request to {}-&gt;http://localhost:8081: Broken pipe (Write failed)
2017-01-04 22:39:07,310 INFO org.apache.http.impl.client.SystemDefaultHttpClient @ Retrying request to {}-&gt;http://localhost:8081 2017-01-04 22:39:07,310 INFO org.apache.http.impl.client.SystemDefaultHttpClient @ Retrying request to {}-&gt;http://localhost:8081
</code></pre></li> </code></pre><ul>
<li>Despite failing instantly, a <code>statistics-2016</code> directory was created, but it only has a data dir (no conf)</li>
<li><p>Despite failing instantly, a <code>statistics-2016</code> directory was created, but it only has a data dir (no conf)</p></li> <li>The Tomcat access logs show more:</li>
</ul>
<li><p>The Tomcat access logs show more:</p>
<pre><code>127.0.0.1 - - [04/Jan/2017:22:39:05 +0200] &quot;GET /solr/statistics/select?q=type%3A2+AND+id%3A1&amp;wt=javabin&amp;version=2 HTTP/1.1&quot; 200 107 <pre><code>127.0.0.1 - - [04/Jan/2017:22:39:05 +0200] &quot;GET /solr/statistics/select?q=type%3A2+AND+id%3A1&amp;wt=javabin&amp;version=2 HTTP/1.1&quot; 200 107
127.0.0.1 - - [04/Jan/2017:22:39:05 +0200] &quot;GET /solr/statistics/select?q=*%3A*&amp;rows=0&amp;facet=true&amp;facet.range=time&amp;facet.range.start=NOW%2FYEAR-17YEARS&amp;facet.range.end=NOW%2FYEAR%2B0YEARS&amp;facet.range.gap=%2B1YEAR&amp;facet.mincount=1&amp;wt=javabin&amp;version=2 HTTP/1.1&quot; 200 423 127.0.0.1 - - [04/Jan/2017:22:39:05 +0200] &quot;GET /solr/statistics/select?q=*%3A*&amp;rows=0&amp;facet=true&amp;facet.range=time&amp;facet.range.start=NOW%2FYEAR-17YEARS&amp;facet.range.end=NOW%2FYEAR%2B0YEARS&amp;facet.range.gap=%2B1YEAR&amp;facet.mincount=1&amp;wt=javabin&amp;version=2 HTTP/1.1&quot; 200 423
127.0.0.1 - - [04/Jan/2017:22:39:05 +0200] &quot;GET /solr/admin/cores?action=STATUS&amp;core=statistics-2016&amp;indexInfo=true&amp;wt=javabin&amp;version=2 HTTP/1.1&quot; 200 77 127.0.0.1 - - [04/Jan/2017:22:39:05 +0200] &quot;GET /solr/admin/cores?action=STATUS&amp;core=statistics-2016&amp;indexInfo=true&amp;wt=javabin&amp;version=2 HTTP/1.1&quot; 200 77
@ -188,228 +180,163 @@ Caused by: java.net.SocketException: Broken pipe (Write failed)
127.0.0.1 - - [04/Jan/2017:22:39:07 +0200] &quot;POST /solr//statistics-2016/update/csv?commit=true&amp;softCommit=false&amp;waitSearcher=true&amp;f.previousWorkflowStep.split=true&amp;f.previousWorkflowStep.separator=%7C&amp;f.previousWorkflowStep.encapsulator=%22&amp;f.actingGroupId.split=true&amp;f.actingGroupId.separator=%7C&amp;f.actingGroupId.encapsulator=%22&amp;f.containerCommunity.split=true&amp;f.containerCommunity.separator=%7C&amp;f.containerCommunity.encapsulator=%22&amp;f.range.split=true&amp;f.range.separator=%7C&amp;f.range.encapsulator=%22&amp;f.containerItem.split=true&amp;f.containerItem.separator=%7C&amp;f.containerItem.encapsulator=%22&amp;f.p_communities_map.split=true&amp;f.p_communities_map.separator=%7C&amp;f.p_communities_map.encapsulator=%22&amp;f.ngram_query_search.split=true&amp;f.ngram_query_search.separator=%7C&amp;f.ngram_query_search.encapsulator=%22&amp;f.containerBitstream.split=true&amp;f.containerBitstream.separator=%7C&amp;f.containerBitstream.encapsulator=%22&amp;f.owningItem.split=true&amp;f.owningItem.separator=%7C&amp;f.owningItem.encapsulator=%22&amp;f.actingGroupParentId.split=true&amp;f.actingGroupParentId.separator=%7C&amp;f.actingGroupParentId.encapsulator=%22&amp;f.text.split=true&amp;f.text.separator=%7C&amp;f.text.encapsulator=%22&amp;f.simple_query_search.split=true&amp;f.simple_query_search.separator=%7C&amp;f.simple_query_search.encapsulator=%22&amp;f.owningComm.split=true&amp;f.owningComm.separator=%7C&amp;f.owningComm.encapsulator=%22&amp;f.owner.split=true&amp;f.owner.separator=%7C&amp;f.owner.encapsulator=%22&amp;f.filterquery.split=true&amp;f.filterquery.separator=%7C&amp;f.filterquery.encapsulator=%22&amp;f.p_group_map.split=true&amp;f.p_group_map.separator=%7C&amp;f.p_group_map.encapsulator=%22&amp;f.actorMemberGroupId.split=true&amp;f.actorMemberGroupId.separator=%7C&amp;f.actorMemberGroupId.encapsulator=%22&amp;f.bitstreamId.split=true&amp;f.bitstreamId.separator=%7C&amp;f.bitstreamId.encapsulator=%22&amp
;f.group_name.split=true&amp;f.group_name.separator=%7C&amp;f.group_name.encapsulator=%22&amp;f.p_communities_name.split=true&amp;f.p_communities_name.separator=%7C&amp;f.p_communities_name.encapsulator=%22&amp;f.query.split=true&amp;f.query.separator=%7C&amp;f.query.encapsulator=%22&amp;f.workflowStep.split=true&amp;f.workflowStep.separator=%7C&amp;f.workflowStep.encapsulator=%22&amp;f.containerCollection.split=true&amp;f.containerCollection.separator=%7C&amp;f.containerCollection.encapsulator=%22&amp;f.complete_query_search.split=true&amp;f.complete_query_search.separator=%7C&amp;f.complete_query_search.encapsulator=%22&amp;f.p_communities_id.split=true&amp;f.p_communities_id.separator=%7C&amp;f.p_communities_id.encapsulator=%22&amp;f.rangeDescription.split=true&amp;f.rangeDescription.separator=%7C&amp;f.rangeDescription.encapsulator=%22&amp;f.group_id.split=true&amp;f.group_id.separator=%7C&amp;f.group_id.encapsulator=%22&amp;f.bundleName.split=true&amp;f.bundleName.separator=%7C&amp;f.bundleName.encapsulator=%22&amp;f.ngram_simplequery_search.split=true&amp;f.ngram_simplequery_search.separator=%7C&amp;f.ngram_simplequery_search.encapsulator=%22&amp;f.group_map.split=true&amp;f.group_map.separator=%7C&amp;f.group_map.encapsulator=%22&amp;f.owningColl.split=true&amp;f.owningColl.separator=%7C&amp;f.owningColl.encapsulator=%22&amp;f.p_group_id.split=true&amp;f.p_group_id.separator=%7C&amp;f.p_group_id.encapsulator=%22&amp;f.p_group_name.split=true&amp;f.p_group_name.separator=%7C&amp;f.p_group_name.encapsulator=%22&amp;wt=javabin&amp;version=2 HTTP/1.1&quot; 409 156 127.0.0.1 - - [04/Jan/2017:22:39:07 +0200] &quot;POST 
/solr//statistics-2016/update/csv?commit=true&amp;softCommit=false&amp;waitSearcher=true&amp;f.previousWorkflowStep.split=true&amp;f.previousWorkflowStep.separator=%7C&amp;f.previousWorkflowStep.encapsulator=%22&amp;f.actingGroupId.split=true&amp;f.actingGroupId.separator=%7C&amp;f.actingGroupId.encapsulator=%22&amp;f.containerCommunity.split=true&amp;f.containerCommunity.separator=%7C&amp;f.containerCommunity.encapsulator=%22&amp;f.range.split=true&amp;f.range.separator=%7C&amp;f.range.encapsulator=%22&amp;f.containerItem.split=true&amp;f.containerItem.separator=%7C&amp;f.containerItem.encapsulator=%22&amp;f.p_communities_map.split=true&amp;f.p_communities_map.separator=%7C&amp;f.p_communities_map.encapsulator=%22&amp;f.ngram_query_search.split=true&amp;f.ngram_query_search.separator=%7C&amp;f.ngram_query_search.encapsulator=%22&amp;f.containerBitstream.split=true&amp;f.containerBitstream.separator=%7C&amp;f.containerBitstream.encapsulator=%22&amp;f.owningItem.split=true&amp;f.owningItem.separator=%7C&amp;f.owningItem.encapsulator=%22&amp;f.actingGroupParentId.split=true&amp;f.actingGroupParentId.separator=%7C&amp;f.actingGroupParentId.encapsulator=%22&amp;f.text.split=true&amp;f.text.separator=%7C&amp;f.text.encapsulator=%22&amp;f.simple_query_search.split=true&amp;f.simple_query_search.separator=%7C&amp;f.simple_query_search.encapsulator=%22&amp;f.owningComm.split=true&amp;f.owningComm.separator=%7C&amp;f.owningComm.encapsulator=%22&amp;f.owner.split=true&amp;f.owner.separator=%7C&amp;f.owner.encapsulator=%22&amp;f.filterquery.split=true&amp;f.filterquery.separator=%7C&amp;f.filterquery.encapsulator=%22&amp;f.p_group_map.split=true&amp;f.p_group_map.separator=%7C&amp;f.p_group_map.encapsulator=%22&amp;f.actorMemberGroupId.split=true&amp;f.actorMemberGroupId.separator=%7C&amp;f.actorMemberGroupId.encapsulator=%22&amp;f.bitstreamId.split=true&amp;f.bitstreamId.separator=%7C&amp;f.bitstreamId.encapsulator=%22&amp;f.group_name.split=true&amp;f.group_name.separator=%7
C&amp;f.group_name.encapsulator=%22&amp;f.p_communities_name.split=true&amp;f.p_communities_name.separator=%7C&amp;f.p_communities_name.encapsulator=%22&amp;f.query.split=true&amp;f.query.separator=%7C&amp;f.query.encapsulator=%22&amp;f.workflowStep.split=true&amp;f.workflowStep.separator=%7C&amp;f.workflowStep.encapsulator=%22&amp;f.containerCollection.split=true&amp;f.containerCollection.separator=%7C&amp;f.containerCollection.encapsulator=%22&amp;f.complete_query_search.split=true&amp;f.complete_query_search.separator=%7C&amp;f.complete_query_search.encapsulator=%22&amp;f.p_communities_id.split=true&amp;f.p_communities_id.separator=%7C&amp;f.p_communities_id.encapsulator=%22&amp;f.rangeDescription.split=true&amp;f.rangeDescription.separator=%7C&amp;f.rangeDescription.encapsulator=%22&amp;f.group_id.split=true&amp;f.group_id.separator=%7C&amp;f.group_id.encapsulator=%22&amp;f.bundleName.split=true&amp;f.bundleName.separator=%7C&amp;f.bundleName.encapsulator=%22&amp;f.ngram_simplequery_search.split=true&amp;f.ngram_simplequery_search.separator=%7C&amp;f.ngram_simplequery_search.encapsulator=%22&amp;f.group_map.split=true&amp;f.group_map.separator=%7C&amp;f.group_map.encapsulator=%22&amp;f.owningColl.split=true&amp;f.owningColl.separator=%7C&amp;f.owningColl.encapsulator=%22&amp;f.p_group_id.split=true&amp;f.p_group_id.separator=%7C&amp;f.p_group_id.encapsulator=%22&amp;f.p_group_name.split=true&amp;f.p_group_name.separator=%7C&amp;f.p_group_name.encapsulator=%22&amp;wt=javabin&amp;version=2 HTTP/1.1&quot; 409 156
127.0.0.1 - - [04/Jan/2017:22:44:00 +0200] &quot;POST /solr/datatables/update?wt=javabin&amp;version=2 HTTP/1.1&quot; 200 41 127.0.0.1 - - [04/Jan/2017:22:44:00 +0200] &quot;POST /solr/datatables/update?wt=javabin&amp;version=2 HTTP/1.1&quot; 200 41
127.0.0.1 - - [04/Jan/2017:22:44:00 +0200] &quot;POST /solr/datatables/update HTTP/1.1&quot; 200 40 127.0.0.1 - - [04/Jan/2017:22:44:00 +0200] &quot;POST /solr/datatables/update HTTP/1.1&quot; 200 40
</code></pre></li> </code></pre><ul>
<li>Very interesting&hellip; it creates the core and then fails somehow</li>
<li><p>Very interesting&hellip; it creates the core and then fails somehow</p></li>
</ul> </ul>
<h2 id="20170108">2017-01-08</h2>
<h2 id="2017-01-08">2017-01-08</h2>
<ul> <ul>
<li>Put Sisay&rsquo;s <code>item-view.xsl</code> code to show mapped collections on CGSpace (<a href="https://github.com/ilri/DSpace/pull/295">#295</a>)</li> <li>Put Sisay's <code>item-view.xsl</code> code to show mapped collections on CGSpace (<a href="https://github.com/ilri/DSpace/pull/295">#295</a>)</li>
</ul> </ul>
<h2 id="20170109">2017-01-09</h2>
<h2 id="2017-01-09">2017-01-09</h2>
<ul> <ul>
<li>A user wrote to tell me that the new display of an item&rsquo;s mappings had a crazy bug for at least one item: <a href="https://cgspace.cgiar.org/handle/10568/78596">https://cgspace.cgiar.org/handle/10568/78596</a></li> <li>A user wrote to tell me that the new display of an item's mappings had a crazy bug for at least one item: <a href="https://cgspace.cgiar.org/handle/10568/78596">https://cgspace.cgiar.org/handle/10568/78596</a></li>
<li>She said she only mapped it once, but it appears to be mapped 184 times</li> <li>She said she only mapped it once, but it appears to be mapped 184 times</li>
</ul> </ul>
<p><img src="/cgspace-notes/2017/01/mapping-crazy-duplicate.png" alt="Crazy item mapping"></p>
<p><img src="/cgspace-notes/2017/01/mapping-crazy-duplicate.png" alt="Crazy item mapping" /></p> <h2 id="20170110">2017-01-10</h2>
<h2 id="2017-01-10">2017-01-10</h2>
<ul> <ul>
<li>I tried to clean up the duplicate mappings by exporting the item&rsquo;s metadata to CSV, editing, and re-importing, but DSpace said &ldquo;no changes were detected&rdquo;</li> <li>I tried to clean up the duplicate mappings by exporting the item's metadata to CSV, editing, and re-importing, but DSpace said &ldquo;no changes were detected&rdquo;</li>
<li>I&rsquo;ve asked on the dspace-tech mailing list to see if anyone can help</li> <li>I've asked on the dspace-tech mailing list to see if anyone can help</li>
<li>I found an old post on the mailing list discussing a similar issue, and listing some SQL commands that might help</li> <li>I found an old post on the mailing list discussing a similar issue, and listing some SQL commands that might help</li>
<li>For example, this shows 186 mappings for the item, the first three of which are real:</li>
<li><p>For example, this shows 186 mappings for the item, the first three of which are real:</p>
<pre><code>dspace=# select * from collection2item where item_id = '80596';
</code></pre></li>
<li><p>Then I deleted the others:</p>
<pre><code>dspace=# delete from collection2item where item_id = '80596' and id not in (90792, 90806, 90807);
</code></pre></li>
<li><p>And in the item view it now shows the correct mappings</p></li>
<li><p>I will have to ask the DSpace people if this is a valid approach</p></li>
<li><p>Finish looking at the Journal Title corrections of the top 500 Journal Titles so we can make a controlled vocabulary from it</p></li>
</ul> </ul>
<pre><code>dspace=# select * from collection2item where item_id = '80596';
<h2 id="2017-01-11">2017-01-11</h2> </code></pre><ul>
<li>Then I deleted the others:</li>
</ul>
<pre><code>dspace=# delete from collection2item where item_id = '80596' and id not in (90792, 90806, 90807);
</code></pre><ul>
<li>And in the item view it now shows the correct mappings</li>
<li>I will have to ask the DSpace people if this is a valid approach</li>
<li>Finish looking at the Journal Title corrections of the top 500 Journal Titles so we can make a controlled vocabulary from it</li>
</ul>
<h2 id="20170111">2017-01-11</h2>
<ul> <ul>
<li>Maria found another item with duplicate mappings: <a href="https://cgspace.cgiar.org/handle/10568/78658">https://cgspace.cgiar.org/handle/10568/78658</a></li> <li>Maria found another item with duplicate mappings: <a href="https://cgspace.cgiar.org/handle/10568/78658">https://cgspace.cgiar.org/handle/10568/78658</a></li>
<li>Error in <code>fix-metadata-values.py</code> when it tries to print the value for Entwicklung &amp; Ländlicher Raum:</li>
<li><p>Error in <code>fix-metadata-values.py</code> when it tries to print the value for Entwicklung &amp; Ländlicher Raum:</p>
<pre><code>Traceback (most recent call last):
File &quot;./fix-metadata-values.py&quot;, line 80, in &lt;module&gt;
print(&quot;Fixing {} occurences of: {}&quot;.format(records_to_fix, record[0]))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 15: ordinal not in range(128)
</code></pre></li>
<li><p>Seems we need to encode as UTF-8 before printing to screen, ie:</p>
<pre><code>print(&quot;Fixing {} occurences of: {}&quot;.format(records_to_fix, record[0].encode('utf-8')))
</code></pre></li>
<li><p>See: <a href="http://stackoverflow.com/a/36427358/487333">http://stackoverflow.com/a/36427358/487333</a></p></li>
<li><p>I&rsquo;m actually not sure if we need to encode() the strings to UTF-8 before writing them to the database&hellip; I&rsquo;ve never had this issue before</p></li>
<li><p>Now back to cleaning up some journal titles so we can make the controlled vocabulary:</p>
<pre><code>$ ./fix-metadata-values.py -i /tmp/fix-27-journal-titles.csv -f dc.source -t correct -m 55 -d dspace -u dspace -p 'fuuu'
</code></pre></li>
<li><p>Now get the top 500 journal titles:</p>
<pre><code>dspace-# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=55 group by text_value order by count desc limit 500) to /tmp/journal-titles.csv with csv;
</code></pre></li>
<li><p>The values are a bit dirty and outdated, since the file I had given to Abenet and Peter was from November</p></li>
<li><p>I will have to go through these and fix some more before making the controlled vocabulary</p></li>
<li><p>Added 30 more corrections or so, now there are 49 total and I&rsquo;ll have to get the top 500 after applying them</p></li>
</ul> </ul>
<pre><code>Traceback (most recent call last):
<h2 id="2017-01-13">2017-01-13</h2> File &quot;./fix-metadata-values.py&quot;, line 80, in &lt;module&gt;
print(&quot;Fixing {} occurences of: {}&quot;.format(records_to_fix, record[0]))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 15: ordinal not in range(128)
</code></pre><ul>
<li>Seems we need to encode as UTF-8 before printing to screen, ie:</li>
</ul>
<pre><code>print(&quot;Fixing {} occurences of: {}&quot;.format(records_to_fix, record[0].encode('utf-8')))
</code></pre><ul>
<li>See: <a href="http://stackoverflow.com/a/36427358/487333">http://stackoverflow.com/a/36427358/487333</a></li>
<li>I'm actually not sure if we need to encode() the strings to UTF-8 before writing them to the database&hellip; I've never had this issue before</li>
<li>Now back to cleaning up some journal titles so we can make the controlled vocabulary:</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/fix-27-journal-titles.csv -f dc.source -t correct -m 55 -d dspace -u dspace -p 'fuuu'
</code></pre><ul>
<li>Now get the top 500 journal titles:</li>
</ul>
<pre><code>dspace-# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=55 group by text_value order by count desc limit 500) to /tmp/journal-titles.csv with csv;
</code></pre><ul>
<li>The values are a bit dirty and outdated, since the file I had given to Abenet and Peter was from November</li>
<li>I will have to go through these and fix some more before making the controlled vocabulary</li>
<li>Added 30 more corrections or so, now there are 49 total and I'll have to get the top 500 after applying them</li>
</ul>
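<p>A minimal Python 3 sketch of the encoding problem from the traceback above, assuming the journal title is the one that triggered it. The variable names are illustrative and are not taken from <code>fix-metadata-values.py</code>. It only shows that the ASCII codec fails at the <code>ä</code>, while UTF-8 can encode the whole string:</p>

```python
# Hypothetical stand-in for record[0]; "\u00e4" is the character ä.
title = "Entwicklung & L\u00e4ndlicher Raum"

# Encoding to ASCII fails at the first non-ASCII character.
try:
    title.encode("ascii")
    failed_at = None
except UnicodeEncodeError as error:
    failed_at = error.start  # index of the offending character

# Encoding to UTF-8 succeeds; ä becomes a two-byte sequence.
encoded = title.encode("utf-8")

print("ASCII encoding failed at position:", failed_at)
print("UTF-8 byte length:", len(encoded))
```

<p>The failing index matches the &ldquo;position 15&rdquo; reported in the traceback, which suggests the error comes from the title itself rather than from the rest of the message.</p>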
<h2 id="20170113">2017-01-13</h2>
<ul> <ul>
<li>Add <code>FOOD SYSTEMS</code> to CIAT subjects, waiting to merge: <a href="https://github.com/ilri/DSpace/pull/296">https://github.com/ilri/DSpace/pull/296</a></li> <li>Add <code>FOOD SYSTEMS</code> to CIAT subjects, waiting to merge: <a href="https://github.com/ilri/DSpace/pull/296">https://github.com/ilri/DSpace/pull/296</a></li>
</ul> </ul>
<h2 id="20170116">2017-01-16</h2>
<h2 id="2017-01-16">2017-01-16</h2>
<ul> <ul>
<li><p>Fix the two items Maria found with duplicate mappings with this script:</p> <li>Fix the two items Maria found with duplicate mappings with this script:</li>
</ul>
<pre><code>/* 184 incorrect mappings: https://cgspace.cgiar.org/handle/10568/78596 */ <pre><code>/* 184 incorrect mappings: https://cgspace.cgiar.org/handle/10568/78596 */
delete from collection2item where item_id = '80596' and id not in (90792, 90806, 90807); delete from collection2item where item_id = '80596' and id not in (90792, 90806, 90807);
/* 1 incorrect mapping: https://cgspace.cgiar.org/handle/10568/78658 */ /* 1 incorrect mapping: https://cgspace.cgiar.org/handle/10568/78658 */
delete from collection2item where id = '91082'; delete from collection2item where id = '91082';
</code></pre></li> </code></pre><h2 id="20170117">2017-01-17</h2>
</ul>
<h2 id="2017-01-17">2017-01-17</h2>
<ul> <ul>
<li>Helping clean up some file names in the 232 CIAT records that Sisay worked on last week</li> <li>Helping clean up some file names in the 232 CIAT records that Sisay worked on last week</li>
<li>There are about 30 files with <code>%20</code> (space) and Spanish accents in the file name</li> <li>There are about 30 files with <code>%20</code> (space) and Spanish accents in the file name</li>
<li>At first I thought we should fix these, but actually it is <a href="https://www.w3.org/TR/html4/appendix/notes.html#h-B.2.1">prescribed by the W3 working group to convert these to UTF8 and URL encode them</a>!</li> <li>At first I thought we should fix these, but actually it is <a href="https://www.w3.org/TR/html4/appendix/notes.html#h-B.2.1">prescribed by the W3 working group to convert these to UTF8 and URL encode them</a>!</li>
<li>And the file names don&rsquo;t really matter either, as long as the SAF Builder tool can read them—after that DSpace renames them with a hash in the assetstore</li> <li>And the file names don't really matter either, as long as the SAF Builder tool can read them—after that DSpace renames them with a hash in the assetstore</li>
<li>Seems like the only ones I should replace are the <code>'</code> apostrophe characters, as <code>%27</code>:</li>
<li><p>Seems like the only ones I should replace are the <code>'</code> apostrophe characters, as <code>%27</code>:</p> </ul>
<pre><code>value.replace(&quot;'&quot;,'%27') <pre><code>value.replace(&quot;'&quot;,'%27')
</code></pre></li> </code></pre><ul>
<li>Add the item's Type to the filename column as a hint to SAF Builder so it can set a more useful description field:</li>
<li><p>Add the item&rsquo;s Type to the filename column as a hint to SAF Builder so it can set a more useful description field:</p> </ul>
<pre><code>value + &quot;__description:&quot; + cells[&quot;dc.type&quot;].value <pre><code>value + &quot;__description:&quot; + cells[&quot;dc.type&quot;].value
</code></pre></li> </code></pre><ul>
<li>Test importing of the new CIAT records (actually there are 232, not 234):</li>
<li><p>Test importing of the new CIAT records (actually there are 232, not 234):</p> </ul>
<pre><code>$ JAVA_OPTS=&quot;-Xmx512m -Dfile.encoding=UTF-8&quot; /home/dspacetest.cgiar.org/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/79042 --source /home/aorth/CIAT_234/SimpleArchiveFormat/ --mapfile=/tmp/ciat.map &amp;&gt; /tmp/ciat.log <pre><code>$ JAVA_OPTS=&quot;-Xmx512m -Dfile.encoding=UTF-8&quot; /home/dspacetest.cgiar.org/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/79042 --source /home/aorth/CIAT_234/SimpleArchiveFormat/ --mapfile=/tmp/ciat.map &amp;&gt; /tmp/ciat.log
</code></pre></li> </code></pre><ul>
<li>Many of the PDFs are 20, 30, 40, 50+ MB, which makes a total of 4GB</li>
<li><p>Many of the PDFs are 20, 30, 40, 50+ MB, which makes a total of 4GB</p></li> <li>These are scanned from paper and likely have no compression, so we should try to test if these compression techniques help without compromising the quality too much:</li>
</ul>
<li><p>These are scanned from paper and likely have no compression, so we should try to test if these compression techniques help without compromising the quality too much:</p>
<pre><code>$ convert -compress Zip -density 150x150 input.pdf output.pdf <pre><code>$ convert -compress Zip -density 150x150 input.pdf output.pdf
$ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf $ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf
</code></pre></li> </code></pre><ul>
<li>Somewhere on the Internet suggested using a DPI of 144</li>
<li><p>Somewhere on the Internet suggested using a DPI of 144</p></li>
</ul> </ul>
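<p>The two OpenRefine expressions above can be sketched in Python as follows. This is only an illustration: the function name <code>saf_filename</code> and the sample values are hypothetical, and the real transformation was done in OpenRefine, not with this code.</p>

```python
def saf_filename(filename: str, item_type: str) -> str:
    """Mimic the two steps from the notes on one filename cell.

    1. Replace apostrophes with their URL-encoded form (%27).
    2. Append the item's type as a "__description:" hint for SAF Builder.
    """
    escaped = filename.replace("'", "%27")
    return escaped + "__description:" + item_type


# Hypothetical example values, not taken from the CIAT records.
example = saf_filename("L'eau.pdf", "Report")
print(example)
```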
<h2 id="20170119">2017-01-19</h2>
<h2 id="2017-01-19">2017-01-19</h2>
<ul> <ul>
<li>In testing a random sample of CIAT&rsquo;s PDFs for compressibility, it looks like all of these methods generally increase the file size so we will just import them as they are</li> <li>In testing a random sample of CIAT's PDFs for compressibility, it looks like all of these methods generally increase the file size so we will just import them as they are</li>
<li>Import 232 CIAT records into CGSpace:</li>
<li><p>Import 232 CIAT records into CGSpace:</p> </ul>
<pre><code>$ JAVA_OPTS=&quot;-Xmx512m -Dfile.encoding=UTF-8&quot; /home/cgspace.cgiar.org/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/68704 --source /home/aorth/CIAT_232/SimpleArchiveFormat/ --mapfile=/tmp/ciat.map &amp;&gt; /tmp/ciat.log <pre><code>$ JAVA_OPTS=&quot;-Xmx512m -Dfile.encoding=UTF-8&quot; /home/cgspace.cgiar.org/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/68704 --source /home/aorth/CIAT_232/SimpleArchiveFormat/ --mapfile=/tmp/ciat.map &amp;&gt; /tmp/ciat.log
</code></pre></li> </code></pre><h2 id="20170122">2017-01-22</h2>
</ul>
<h2 id="2017-01-22">2017-01-22</h2>
<ul> <ul>
<li>Looking at some records that Sisay is having problems importing into DSpace Test (seems to be because of copious whitespace return characters from Excel&rsquo;s CSV exporter)</li> <li>Looking at some records that Sisay is having problems importing into DSpace Test (seems to be because of copious whitespace return characters from Excel's CSV exporter)</li>
<li>There were also some issues with an invalid dc.date.issued field, and I trimmed leading / trailing whitespace and cleaned up some URLs with unneeded parameters like ?show=full</li> <li>There were also some issues with an invalid dc.date.issued field, and I trimmed leading / trailing whitespace and cleaned up some URLs with unneeded parameters like ?show=full</li>
</ul> </ul>
<h2 id="20170123">2017-01-23</h2>
<h2 id="2017-01-23">2017-01-23</h2>
<ul> <ul>
<li>I merged Atmire&rsquo;s pull request into the development branch so they can deploy it on DSpace Test</li> <li>I merged Atmire's pull request into the development branch so they can deploy it on DSpace Test</li>
<li>Move some old ILRI Program communities to a new subcommunity for former programs (10568/79164):</li>
<li><p>Move some old ILRI Program communities to a new subcommunity for former programs (<sup>10568</sup>&frasl;<sub>79164</sub>):</p> </ul>
<pre><code>$ for community in 10568/171 10568/27868 10568/231 10568/27869 10568/150 10568/230 10568/32724 10568/172; do /home/cgspace.cgiar.org/bin/dspace community-filiator --remove --parent=10568/27866 --child=&quot;$community&quot; &amp;&amp; /home/cgspace.cgiar.org/bin/dspace community-filiator --set --parent=10568/79164 --child=&quot;$community&quot;; done <pre><code>$ for community in 10568/171 10568/27868 10568/231 10568/27869 10568/150 10568/230 10568/32724 10568/172; do /home/cgspace.cgiar.org/bin/dspace community-filiator --remove --parent=10568/27866 --child=&quot;$community&quot; &amp;&amp; /home/cgspace.cgiar.org/bin/dspace community-filiator --set --parent=10568/79164 --child=&quot;$community&quot;; done
</code></pre></li> </code></pre><ul>
<li>Move some collections with <a href="https://gist.github.com/alanorth/e60b530ed4989df0c731afbb0c640515"><code>move-collections.sh</code></a> using the following config:</li>
<li><p>Move some collections with <a href="https://gist.github.com/alanorth/e60b530ed4989df0c731afbb0c640515"><code>move-collections.sh</code></a> using the following config:</p> </ul>
<pre><code>10568/42161 10568/171 10568/79341 <pre><code>10568/42161 10568/171 10568/79341
10568/41914 10568/171 10568/79340 10568/41914 10568/171 10568/79340
</code></pre></li> </code></pre><h2 id="20170124">2017-01-24</h2>
</ul>
<h2 id="2017-01-24">2017-01-24</h2>
<ul> <ul>
<li>Run all updates on DSpace Test and reboot the server</li> <li>Run all updates on DSpace Test and reboot the server</li>
<li>Run fixes for Journal titles on CGSpace:</li>
<li><p>Run fixes for Journal titles on CGSpace:</p>
<pre><code>$ ./fix-metadata-values.py -i /tmp/fix-49-journal-titles.csv -f dc.source -t correct -m 55 -d dspace -u dspace -p 'password'
</code></pre></li>
<li><p>Create a new list of the top 500 journal titles from the database:</p>
<pre><code>dspace-# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=55 group by text_value order by count desc limit 500) to /tmp/journal-titles.csv with csv;
</code></pre></li>
<li><p>Then sort them in OpenRefine and create a controlled vocabulary by manually adding the XML markup, pull request (<a href="https://github.com/ilri/DSpace/pull/298">#298</a>)</p></li>
<li><p>This would be the last issue remaining to close the meta issue about switching to controlled vocabularies (<a href="https://github.com/ilri/DSpace/pull/69">#69</a>)</p></li>
</ul> </ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/fix-49-journal-titles.csv -f dc.source -t correct -m 55 -d dspace -u dspace -p 'password'
<h2 id="2017-01-25">2017-01-25</h2> </code></pre><ul>
<li>Create a new list of the top 500 journal titles from the database:</li>
</ul>
<pre><code>dspace-# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=55 group by text_value order by count desc limit 500) to /tmp/journal-titles.csv with csv;
</code></pre><ul>
<li>Then sort them in OpenRefine and create a controlled vocabulary by manually adding the XML markup, pull request (<a href="https://github.com/ilri/DSpace/pull/298">#298</a>)</li>
<li>This would be the last issue remaining to close the meta issue about switching to controlled vocabularies (<a href="https://github.com/ilri/DSpace/pull/69">#69</a>)</li>
</ul>
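<p>For readers less familiar with SQL, the grouping done by the <code>\copy</code> query above can be approximated in Python. This is a rough sketch under assumptions: the function name <code>top_titles</code> and the sample titles are hypothetical, and the order of titles with equal counts may differ from what PostgreSQL returns.</p>

```python
from collections import Counter


def top_titles(values, limit=500):
    """Count identical titles and return the most frequent ones.

    Roughly equivalent to: group by text_value, order by count desc, limit N.
    """
    return Counter(values).most_common(limit)


# Hypothetical sample data standing in for dc.source values.
sample = ["Journal A", "Journal B", "Journal A", "Journal C", "Journal A", "Journal B"]
for title, count in top_titles(sample, limit=2):
    print(title, count)
```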
<h2 id="20170125">2017-01-25</h2>
<ul> <ul>
<li>Atmire says the <code>com.atmire.statistics.util.UpdateSolrStorageReports</code> and <code>com.atmire.utils.ReportSender</code> are no longer necessary because they are using a Spring scheduler for these tasks now</li> <li>Atmire says the <code>com.atmire.statistics.util.UpdateSolrStorageReports</code> and <code>com.atmire.utils.ReportSender</code> are no longer necessary because they are using a Spring scheduler for these tasks now</li>
<li>Pull request to remove them from the Ansible templates: <a href="https://github.com/ilri/rmg-ansible-public/pull/80">https://github.com/ilri/rmg-ansible-public/pull/80</a></li> <li>Pull request to remove them from the Ansible templates: <a href="https://github.com/ilri/rmg-ansible-public/pull/80">https://github.com/ilri/rmg-ansible-public/pull/80</a></li>
<li>Still testing the Atmire modules on DSpace Test, and it looks like a few issues we had reported are now fixed: <li>Still testing the Atmire modules on DSpace Test, and it looks like a few issues we had reported are now fixed:
<ul> <ul>
<li>XLS Export from Content statistics</li> <li>XLS Export from Content statistics</li>
<li>Most popular items</li> <li>Most popular items</li>
<li>Show statistics on collection pages</li> <li>Show statistics on collection pages</li>
</ul></li> </ul>
</li>
<li>But now we have a new issue with the &ldquo;Types&rdquo; in Content statistics not being respected—we only get the defaults, despite having custom settings in <code>dspace/config/modules/atmire-cua.cfg</code></li> <li>But now we have a new issue with the &ldquo;Types&rdquo; in Content statistics not being respected—we only get the defaults, despite having custom settings in <code>dspace/config/modules/atmire-cua.cfg</code></li>
</ul> </ul>
<h2 id="20170127">2017-01-27</h2>
<h2 id="2017-01-27">2017-01-27</h2>
<ul> <ul>
<li>Magdalena pointed out that somehow the Anonymous group had been added to the Administrators group on CGSpace (!)</li> <li>Magdalena pointed out that somehow the Anonymous group had been added to the Administrators group on CGSpace (!)</li>
<li>Discuss plans to update CCAFS metadata and communities for their new flagships and phase II project identifiers</li> <li>Discuss plans to update CCAFS metadata and communities for their new flagships and phase II project identifiers</li>
<li>The flagships are in <code>cg.subject.ccafs</code>, and we probably need to make a new field for the phase II project identifiers</li> <li>The flagships are in <code>cg.subject.ccafs</code>, and we probably need to make a new field for the phase II project identifiers</li>
</ul> </ul>
<h2 id="20170128">2017-01-28</h2>
<h2 id="2017-01-28">2017-01-28</h2>
<ul> <ul>
<li>Merge controlled vocabulary for journal titles (<code>dc.source</code>) into CGSpace (<a href="https://github.com/ilri/DSpace/pull/298">#298</a>)</li> <li>Merge controlled vocabulary for journal titles (<code>dc.source</code>) into CGSpace (<a href="https://github.com/ilri/DSpace/pull/298">#298</a>)</li>
<li>Merge new CIAT subject into CGSpace (<a href="https://github.com/ilri/DSpace/pull/296">#296</a>)</li> <li>Merge new CIAT subject into CGSpace (<a href="https://github.com/ilri/DSpace/pull/296">#296</a>)</li>
</ul> </ul>
<h2 id="20170129">2017-01-29</h2>
<h2 id="2017-01-29">2017-01-29</h2>
<ul> <ul>
<li>Run all system updates on DSpace Test, redeploy DSpace code, and reboot the server</li> <li>Run all system updates on DSpace Test, redeploy DSpace code, and reboot the server</li>
<li>Run all system updates on CGSpace, redeploy DSpace code, and reboot the server</li> <li>Run all system updates on CGSpace, redeploy DSpace code, and reboot the server</li>

View File

@ -8,23 +8,20 @@
<meta property="og:title" content="February, 2017" /> <meta property="og:title" content="February, 2017" />
<meta property="og:description" content="2017-02-07 <meta property="og:description" content="2017-02-07
An item was mapped twice erroneously again, so I had to remove one of the mappings manually: An item was mapped twice erroneously again, so I had to remove one of the mappings manually:
dspace=# select * from collection2item where item_id = &#39;80278&#39;; dspace=# select * from collection2item where item_id = &#39;80278&#39;;
id | collection_id | item_id id | collection_id | item_id
-------&#43;---------------&#43;--------- -------&#43;---------------&#43;---------
92551 | 313 | 80278 92551 | 313 | 80278
92550 | 313 | 80278 92550 | 313 | 80278
90774 | 1051 | 80278 90774 | 1051 | 80278
(3 rows) (3 rows)
dspace=# delete from collection2item where id = 92551 and item_id = 80278; dspace=# delete from collection2item where id = 92551 and item_id = 80278;
DELETE 1 DELETE 1
Create issue on GitHub to track the addition of CCAFS Phase II project tags (#301) Create issue on GitHub to track the addition of CCAFS Phase II project tags (#301)
Looks like we&#39;ll be using cg.identifier.ccafsprojectpii as the field name
Looks like we&rsquo;ll be using cg.identifier.ccafsprojectpii as the field name
" /> " />
<meta property="og:type" content="article" /> <meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2017-02/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2017-02/" />
@ -35,25 +32,22 @@ Looks like we&rsquo;ll be using cg.identifier.ccafsprojectpii as the field name
<meta name="twitter:title" content="February, 2017"/> <meta name="twitter:title" content="February, 2017"/>
<meta name="twitter:description" content="2017-02-07 <meta name="twitter:description" content="2017-02-07
An item was mapped twice erroneously again, so I had to remove one of the mappings manually: An item was mapped twice erroneously again, so I had to remove one of the mappings manually:
dspace=# select * from collection2item where item_id = &#39;80278&#39;; dspace=# select * from collection2item where item_id = &#39;80278&#39;;
id | collection_id | item_id id | collection_id | item_id
-------&#43;---------------&#43;--------- -------&#43;---------------&#43;---------
92551 | 313 | 80278 92551 | 313 | 80278
92550 | 313 | 80278 92550 | 313 | 80278
90774 | 1051 | 80278 90774 | 1051 | 80278
(3 rows) (3 rows)
dspace=# delete from collection2item where id = 92551 and item_id = 80278; dspace=# delete from collection2item where id = 92551 and item_id = 80278;
DELETE 1 DELETE 1
Create issue on GitHub to track the addition of CCAFS Phase II project tags (#301) Create issue on GitHub to track the addition of CCAFS Phase II project tags (#301)
Looks like we&#39;ll be using cg.identifier.ccafsprojectpii as the field name
Looks like we&rsquo;ll be using cg.identifier.ccafsprojectpii as the field name
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -134,258 +128,198 @@ Looks like we&rsquo;ll be using cg.identifier.ccafsprojectpii as the field name
</p> </p>
</header> </header>
<h2 id="2017-02-07">2017-02-07</h2> <h2 id="20170207">2017-02-07</h2>
<ul> <ul>
<li><p>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</p> <li>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</li>
</ul>
<pre><code>dspace=# select * from collection2item where item_id = '80278'; <pre><code>dspace=# select * from collection2item where item_id = '80278';
id | collection_id | item_id id | collection_id | item_id
-------+---------------+--------- -------+---------------+---------
92551 | 313 | 80278 92551 | 313 | 80278
92550 | 313 | 80278 92550 | 313 | 80278
90774 | 1051 | 80278 90774 | 1051 | 80278
(3 rows) (3 rows)
dspace=# delete from collection2item where id = 92551 and item_id = 80278; dspace=# delete from collection2item where id = 92551 and item_id = 80278;
DELETE 1 DELETE 1
</code></pre></li> </code></pre><ul>
<li>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</li>
<li><p>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</p></li> <li>Looks like we'll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</li>
<li><p>Looks like we&rsquo;ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</p></li>
</ul> </ul>
<h2 id="20170208">2017-02-08</h2>
<h2 id="2017-02-08">2017-02-08</h2>
<ul> <ul>
<li>We also need to rename some of the CCAFS Phase I flagships: <li>We also need to rename some of the CCAFS Phase I flagships:
<ul> <ul>
<li>CLIMATE-SMART AGRICULTURAL PRACTICESCLIMATE-SMART TECHNOLOGIES AND PRACTICES</li> <li>CLIMATE-SMART AGRICULTURAL PRACTICESCLIMATE-SMART TECHNOLOGIES AND PRACTICES</li>
<li>CLIMATE RISK MANAGEMENTCLIMATE SERVICES AND SAFETY NETS</li> <li>CLIMATE RISK MANAGEMENTCLIMATE SERVICES AND SAFETY NETS</li>
<li>LOW EMISSIONS AGRICULTURELOW EMISSIONS DEVELOPMENT</li> <li>LOW EMISSIONS AGRICULTURELOW EMISSIONS DEVELOPMENT</li>
<li>POLICIES AND INSTITUTIONSPRIORITIES AND POLICIES FOR CSA</li> <li>POLICIES AND INSTITUTIONSPRIORITIES AND POLICIES FOR CSA</li>
</ul></li>
<li>The climate risk management one doesn&rsquo;t exist, so I will have to ask Magdalena if they want me to add it to the input forms</li>
<li><p>Start testing some nearly 500 author corrections that CCAFS sent me:</p>
<pre><code>$ ./fix-metadata-values.py -i /tmp/CCAFS-Authors-Feb-7.csv -f dc.contributor.author -t 'correct name' -m 3 -d dspace -u dspace -p fuuu
</code></pre></li>
</ul> </ul>
</li>
<h2 id="2017-02-09">2017-02-09</h2> <li>The climate risk management one doesn't exist, so I will have to ask Magdalena if they want me to add it to the input forms</li>
<li>Start testing some nearly 500 author corrections that CCAFS sent me:</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/CCAFS-Authors-Feb-7.csv -f dc.contributor.author -t 'correct name' -m 3 -d dspace -u dspace -p fuuu
</code></pre><h2 id="20170209">2017-02-09</h2>
<ul> <ul>
<li>More work on CCAFS Phase II stuff</li> <li>More work on CCAFS Phase II stuff</li>
<li>Looks like simply adding a new metadata field to <code>dspace/config/registries/cgiar-types.xml</code> and restarting DSpace causes the field to get added to the registry</li> <li>Looks like simply adding a new metadata field to <code>dspace/config/registries/cgiar-types.xml</code> and restarting DSpace causes the field to get added to the registry</li>
<li>It requires a restart but at least it allows you to manage the registry programmatically</li> <li>It requires a restart but at least it allows you to manage the registry programmatically</li>
<li>It&rsquo;s not a very good way to manage the registry, though, as removing one there doesn&rsquo;t cause it to be removed from the registry, and we always restore from database backups so there would never be a scenario when we needed these to be created</li> <li>It's not a very good way to manage the registry, though, as removing one there doesn't cause it to be removed from the registry, and we always restore from database backups so there would never be a scenario when we needed these to be created</li>
<li>Testing some corrections on CCAFS Phase II flagships (<code>cg.subject.ccafs</code>):</li>
<li><p>Testing some corrections on CCAFS Phase II flagships (<code>cg.subject.ccafs</code>):</p>
<pre><code>$ ./fix-metadata-values.py -i ccafs-flagships-feb7.csv -f cg.subject.ccafs -t correct -m 210 -d dspace -u dspace -p fuuu
</code></pre></li>
</ul> </ul>
<pre><code>$ ./fix-metadata-values.py -i ccafs-flagships-feb7.csv -f cg.subject.ccafs -t correct -m 210 -d dspace -u dspace -p fuuu
<h2 id="2017-02-10">2017-02-10</h2> </code></pre><h2 id="20170210">2017-02-10</h2>
<ul> <ul>
<li>CCAFS said they want to wait on the flagship updates (<code>cg.subject.ccafs</code>) on CGSpace, perhaps for a month or so</li> <li>CCAFS said they want to wait on the flagship updates (<code>cg.subject.ccafs</code>) on CGSpace, perhaps for a month or so</li>
<li>Help Marianne Gadeberg (WLE) with some user permissions as it seems she had previously been using a personal email account, and is now on a CGIAR one</li> <li>Help Marianne Gadeberg (WLE) with some user permissions as it seems she had previously been using a personal email account, and is now on a CGIAR one</li>
<li>I manually added her new account to ~25 authorizations that her old user was on</li> <li>I manually added her new account to ~25 authorizations that her old user was on</li>
</ul> </ul>
<h2 id="20170214">2017-02-14</h2>
<h2 id="2017-02-14">2017-02-14</h2>
<ul> <ul>
<li>Add <code>SCALING</code> to ILRI subjects (<a href="https://github.com/ilri/DSpace/pull/304">#304</a>), as Sisay&rsquo;s attempts were all sloppy</li> <li>Add <code>SCALING</code> to ILRI subjects (<a href="https://github.com/ilri/DSpace/pull/304">#304</a>), as Sisay's attempts were all sloppy</li>
<li>Cherry pick some patches from the DSpace 5.7 branch: <li>Cherry pick some patches from the DSpace 5.7 branch:
<ul> <ul>
<li>DS-3363 CSV import error says &ldquo;row&rdquo;, means &ldquo;column&rdquo;: f7b6c83e991db099003ee4e28ca33d3c7bab48c0</li> <li>DS-3363 CSV import error says &ldquo;row&rdquo;, means &ldquo;column&rdquo;: f7b6c83e991db099003ee4e28ca33d3c7bab48c0</li>
<li>DS-3479 avoid adding empty metadata values during import: 329f3b48a6de7fad074d825fd12118f7e181e151</li> <li>DS-3479 avoid adding empty metadata values during import: 329f3b48a6de7fad074d825fd12118f7e181e151</li>
<li>[DS-3456] 5x Clarify command line options for statisics import/export tools (#1623): 567ec083c8a94eb2bcc1189816eb4f767745b278</li> <li>[DS-3456] 5x Clarify command line options for statisics import/export tools (#1623): 567ec083c8a94eb2bcc1189816eb4f767745b278</li>
<li>[DS-3458]5x Allow Shard Process to Append to an existing repo: 3c8ecb5d1fd69a1dcfee01feed259e80abbb7749</li> <li>[DS-3458]5x Allow Shard Process to Append to an existing repo: 3c8ecb5d1fd69a1dcfee01feed259e80abbb7749</li>
</ul></li> </ul>
</li>
<li>I still need to test these, especially the last two, which change some stuff with Solr maintenance</li> <li>I still need to test these, especially the last two, which change some stuff with Solr maintenance</li>
</ul> </ul>
<h2 id="20170215">2017-02-15</h2>
<h2 id="2017-02-15">2017-02-15</h2>
<ul> <ul>
<li>Update rvm on DSpace Test and CGSpace as there was a <a href="https://github.com/justinsteven/advisories/blob/master/2017_rvm_cd_command_execution.md">security disclosure about versions less than 1.28.0</a></li> <li>Update rvm on DSpace Test and CGSpace as there was a <a href="https://github.com/justinsteven/advisories/blob/master/2017_rvm_cd_command_execution.md">security disclosure about versions less than 1.28.0</a></li>
</ul> </ul>
<h2 id="20170216">2017-02-16</h2>
<h2 id="2017-02-16">2017-02-16</h2>
<ul> <ul>
<li>Looking at memory info from munin on CGSpace:</li> <li>Looking at memory info from munin on CGSpace:</li>
</ul> </ul>
<p><img src="/cgspace-notes/2017/02/meminfo_phisical-week.png" alt="CGSpace meminfo"></p>
<p><img src="/cgspace-notes/2017/02/meminfo_phisical-week.png" alt="CGSpace meminfo" /></p>
<ul> <ul>
<li>We are using only ~8GB of RAM for applications, and 16GB for caches!</li> <li>We are using only ~8GB of RAM for applications, and 16GB for caches!</li>
<li>The Linode machine we&rsquo;re on has 24GB of RAM but only because that&rsquo;s the only instance that had enough disk space for us (384GB)&hellip;</li> <li>The Linode machine we're on has 24GB of RAM but only because that's the only instance that had enough disk space for us (384GB)&hellip;</li>
<li>We should probably look into Google Compute Engine or Digital Ocean where we can get more storage without having to follow a linear increase in instance pricing for CPU/memory as well</li> <li>We should probably look into Google Compute Engine or Digital Ocean where we can get more storage without having to follow a linear increase in instance pricing for CPU/memory as well</li>
<li>Especially because we only use 2 out of 8 CPUs basically:</li> <li>Especially because we only use 2 out of 8 CPUs basically:</li>
</ul> </ul>
<p><img src="/cgspace-notes/2017/02/cpu-week.png" alt="CGSpace CPU"></p>
<p><img src="/cgspace-notes/2017/02/cpu-week.png" alt="CGSpace CPU" /></p>
<ul> <ul>
<li>Fix issue with duplicate declaration of in atmire-dspace-xmlui <code>pom.xml</code> (causing non-fatal warnings during the maven build)</li> <li>Fix issue with duplicate declaration of in atmire-dspace-xmlui <code>pom.xml</code> (causing non-fatal warnings during the maven build)</li>
<li>Experiment with making DSpace generate HTTPS handle links, first a change in dspace.cfg or the site's properties file:</li>
<li><p>Experiment with making DSpace generate HTTPS handle links, first a change in dspace.cfg or the site&rsquo;s properties file:</p> </ul>
<pre><code>handle.canonical.prefix = https://hdl.handle.net/ <pre><code>handle.canonical.prefix = https://hdl.handle.net/
</code></pre></li> </code></pre><ul>
<li>And then a SQL command to update existing records:</li>
<li><p>And then a SQL command to update existing records:</p> </ul>
<pre><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://hdl.handle.net', 'https://hdl.handle.net') where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'uri'); <pre><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://hdl.handle.net', 'https://hdl.handle.net') where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'uri');
UPDATE 58193 UPDATE 58193
</code></pre></li> </code></pre><ul>
<li>Seems to work fine!</li>
<li><p>Seems to work fine!</p></li> <li>I noticed a few items that have incorrect DOI links (<code>dc.identifier.doi</code>), and after looking in the database I see there are over 100 that are missing the scheme or are just plain wrong:</li>
</ul>
<li><p>I noticed a few items that have incorrect DOI links (<code>dc.identifier.doi</code>), and after looking in the database I see there are over 100 that are missing the scheme or are just plain wrong:</p>
<pre><code>dspace=# select distinct text_value from metadatavalue where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value not like 'http%://%'; <pre><code>dspace=# select distinct text_value from metadatavalue where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value not like 'http%://%';
</code></pre></li> </code></pre><ul>
<li>This will replace any that begin with <code>10.</code> and change them to <code>https://dx.doi.org/10.</code>:</li>
<li><p>This will replace any that begin with <code>10.</code> and change them to <code>https://dx.doi.org/10.</code>:</p> </ul>
<pre><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, '(^10\..+$)', 'https://dx.doi.org/\1') where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like '10.%'; <pre><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, '(^10\..+$)', 'https://dx.doi.org/\1') where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like '10.%';
</code></pre></li> </code></pre><ul>
<li>This will get any that begin with <code>doi:10.</code> and change them to <code>https://dx.doi.org/10.x</code>:</li>
<li><p>This will get any that begin with <code>doi:10.</code> and change them to <code>https://dx.doi.org/10.x</code>:</p> </ul>
<pre><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, '^doi:(10\..+$)', 'https://dx.doi.org/\1') where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like 'doi:10%'; <pre><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, '^doi:(10\..+$)', 'https://dx.doi.org/\1') where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like 'doi:10%';
</code></pre></li> </code></pre><ul>
<li>Fix DOIs like <code>dx.doi.org/10.</code> to be <code>https://dx.doi.org/10.</code>:</li>
<li><p>Fix DOIs like <code>dx.doi.org/10.</code> to be <code>https://dx.doi.org/10.</code>:</p> </ul>
<pre><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, '^dx\.doi\.org/(.+)$', 'https://dx.doi.org/\1') where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like 'dx.doi.org/%'; <pre><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, '^dx\.doi\.org/(.+)$', 'https://dx.doi.org/\1') where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like 'dx.doi.org/%';
</code></pre></li> </code></pre><ul>
<li>Fix DOIs like <code>http//</code>:</li>
<li><p>Fix DOIs like <code>http//</code>:</p> </ul>
<pre><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, '^http//dx\.doi\.org/(.+)$', 'https://dx.doi.org/\1') where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like 'http//%'; <pre><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, '^http//dx\.doi\.org/(.+)$', 'https://dx.doi.org/\1') where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like 'http//%';
</code></pre></li> </code></pre><ul>
<li>Fix DOIs like <code>dx.doi.org./</code>:</li>
<li><p>Fix DOIs like <code>dx.doi.org./</code>:</p> </ul>
<pre><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, '^dx\.doi\.org\./(.+)$', 'https://dx.doi.org/\1') where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like 'dx.doi.org./%'; <pre><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, '^dx\.doi\.org\./(.+)$', 'https://dx.doi.org/\1') where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like 'dx.doi.org./%';
</code></pre></li> </code></pre><ul>
<li>Delete some invalid DOIs:</li>
<li><p>Delete some invalid DOIs:</p> </ul>
<pre><code>dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value in ('DOI','CPWF Mekong','Bulawayo, Zimbabwe','bb'); <pre><code>dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value in ('DOI','CPWF Mekong','Bulawayo, Zimbabwe','bb');
</code></pre></li> </code></pre><ul>
<li>Fix some other random outliers:</li>
<li><p>Fix some other random outliers:</p> </ul>
<pre><code>dspace=# update metadatavalue set text_value = 'https://dx.doi.org/10.1016/j.aquaculture.2015.09.003' where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value = 'http:/dx.doi.org/10.1016/j.aquaculture.2015.09.003'; <pre><code>dspace=# update metadatavalue set text_value = 'https://dx.doi.org/10.1016/j.aquaculture.2015.09.003' where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value = 'http:/dx.doi.org/10.1016/j.aquaculture.2015.09.003';
dspace=# update metadatavalue set text_value = 'https://dx.doi.org/10.5337/2016.200' where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value = 'doi: https://dx.doi.org/10.5337/2016.200'; dspace=# update metadatavalue set text_value = 'https://dx.doi.org/10.5337/2016.200' where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value = 'doi: https://dx.doi.org/10.5337/2016.200';
dspace=# update metadatavalue set text_value = 'https://dx.doi.org/doi:10.1371/journal.pone.0062898' where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value = 'Http://dx.doi.org/doi:10.1371/journal.pone.0062898'; dspace=# update metadatavalue set text_value = 'https://dx.doi.org/doi:10.1371/journal.pone.0062898' where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value = 'Http://dx.doi.org/doi:10.1371/journal.pone.0062898';
dspace=# update metadatavalue set text_value = 'https://dx.doi.org/10.1016/j.cosust.2013.11.012' where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value = 'http:dx.doi.10.1016/j.cosust.2013.11.012'; dspace=# update metadatavalue set text_value = 'https://dx.doi.org/10.1016/j.cosust.2013.11.012' where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value = 'http:dx.doi.10.1016/j.cosust.2013.11.012';
dspace=# update metadatavalue set text_value = 'https://dx.doi.org/10.1080/03632415.2014.883570' where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value = 'org/10.1080/03632415.2014.883570'; dspace=# update metadatavalue set text_value = 'https://dx.doi.org/10.1080/03632415.2014.883570' where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value = 'org/10.1080/03632415.2014.883570';
dspace=# update metadatavalue set text_value = 'https://dx.doi.org/10.15446/agron.colomb.v32n3.46052' where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value = 'Doi: 10.15446/agron.colomb.v32n3.46052'; dspace=# update metadatavalue set text_value = 'https://dx.doi.org/10.15446/agron.colomb.v32n3.46052' where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value = 'Doi: 10.15446/agron.colomb.v32n3.46052';
</code></pre></li> </code></pre><ul>
<li>And do another round of <code>http://</code> → <code>https://</code> cleanups:</li>
<li><p>And do another round of <code>http://</code> → <code>https://</code> cleanups:</p>
<pre><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://dx.doi.org', 'https://dx.doi.org') where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like 'http://dx.doi.org%';
</code></pre></li>
<li><p>Run all DOI corrections on CGSpace</p></li>
<li><p>Something to think about here is to write a <a href="https://wiki.duraspace.org/display/DSDOC5x/Curation+System#CurationSystem-ScriptedTasks">Curation Task</a> in Java to do these sanity checks / corrections every night</p></li>
<li><p>Then we could add a cron job for them and run them from the command line like:</p>
<pre><code>[dspace]/bin/dspace curate -t noop -i 10568/79891
</code></pre></li>
</ul> </ul>
<pre><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://dx.doi.org', 'https://dx.doi.org') where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like 'http://dx.doi.org%';
<h2 id="2017-02-20">2017-02-20</h2> </code></pre><ul>
<li>Run all DOI corrections on CGSpace</li>
<li>Something to think about here is to write a <a href="https://wiki.duraspace.org/display/DSDOC5x/Curation+System#CurationSystem-ScriptedTasks">Curation Task</a> in Java to do these sanity checks / corrections every night</li>
<li>Then we could add a cron job for them and run them from the command line like:</li>
</ul>
<pre><code>[dspace]/bin/dspace curate -t noop -i 10568/79891
</code></pre><h2 id="20170220">2017-02-20</h2>
<ul> <ul>
<li>Run all system updates on DSpace Test and reboot the server</li> <li>Run all system updates on DSpace Test and reboot the server</li>
<li>Run CCAFS author corrections on DSpace Test and CGSpace and force a full discovery reindex</li> <li>Run CCAFS author corrections on DSpace Test and CGSpace and force a full discovery reindex</li>
<li>Fix label of CCAFS subjects in Atmire Listings and Reports module</li> <li>Fix label of CCAFS subjects in Atmire Listings and Reports module</li>
<li>Help Sisay with SQL commands</li> <li>Help Sisay with SQL commands</li>
<li>Help Paola from CCAFS with the Atmire Listings and Reports module</li> <li>Help Paola from CCAFS with the Atmire Listings and Reports module</li>
<li>Testing the <code>fix-metadata-values.py</code> script on macOS and it seems like we don&rsquo;t need to use <code>.encode('utf-8')</code> anymore when printing strings to the screen</li> <li>Testing the <code>fix-metadata-values.py</code> script on macOS and it seems like we don't need to use <code>.encode('utf-8')</code> anymore when printing strings to the screen</li>
<li>It seems this might have only been a temporary problem, as both Python 3.5.2 and 3.6.0 are able to print the problematic string &ldquo;Entwicklung &amp; Ländlicher Raum&rdquo; without the <code>encode()</code> call, but print it as a bytes object when it <em>is</em> used:</li>
<li><p>It seems this might have only been a temporary problem, as both Python 3.5.2 and 3.6.0 are able to print the problematic string &ldquo;Entwicklung &amp; Ländlicher Raum&rdquo; without the <code>encode()</code> call, but print it as a bytes object when it <em>is</em> used:</p> </ul>
<pre><code>$ python <pre><code>$ python
Python 3.6.0 (default, Dec 25 2016, 17:30:53) Python 3.6.0 (default, Dec 25 2016, 17:30:53)
&gt;&gt;&gt; print('Entwicklung &amp; Ländlicher Raum') &gt;&gt;&gt; print('Entwicklung &amp; Ländlicher Raum')
Entwicklung &amp; Ländlicher Raum Entwicklung &amp; Ländlicher Raum
&gt;&gt;&gt; print('Entwicklung &amp; Ländlicher Raum'.encode()) &gt;&gt;&gt; print('Entwicklung &amp; Ländlicher Raum'.encode())
b'Entwicklung &amp; L\xc3\xa4ndlicher Raum' b'Entwicklung &amp; L\xc3\xa4ndlicher Raum'
</code></pre></li> </code></pre><ul>
<li>So for now I will remove the encode call from the script (though it was never used on the versions on the Linux hosts), leading me to believe it really <em>was</em> a temporary problem, perhaps due to macOS or the Python build I was using.</li>
<li><p>So for now I will remove the encode call from the script (though it was never used on the versions on the Linux hosts), leading me to believe it really <em>was</em> a temporary problem, perhaps due to macOS or the Python build I was using.</p></li>
</ul> </ul>
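<p>The difference between printing a string and printing its encoded form can be reproduced outside the script. This is a minimal sketch, assuming Python 3; the variable names are illustrative and are not taken from <code>fix-metadata-values.py</code>:</p>

```python
# Python 3 strings are Unicode, so print() can take them directly.
# "\u00e4" is the escape for "ä", written this way so the example does
# not depend on the source file's encoding.
title = 'Entwicklung & L\u00e4ndlicher Raum'

# encode() returns a bytes object; printing it shows the b'...' repr
# with the UTF-8 bytes for "ä" escaped as \xc3\xa4.
encoded = title.encode('utf-8')

print(title)    # the readable string
print(encoded)  # the bytes repr

# Decoding the bytes gives back the original string unchanged.
round_trip = encoded.decode('utf-8')
print(round_trip == title)
```

<p>So in Python 3 the <code>encode()</code> call only makes sense when bytes are actually needed, for example when writing to a binary file, and not for printing to the screen.</p>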
<h2 id="20170221">2017-02-21</h2>
<h2 id="2017-02-21">2017-02-21</h2>
<ul> <ul>
<li>Testing regenerating PDF thumbnails, like I started in 2016-11</li> <li>Testing regenerating PDF thumbnails, like I started in 2016-11</li>
<li>It seems there is a bug in <code>filter-media</code> that causes it to process formats that aren't part of its configuration:</li>
<li><p>It seems there is a bug in <code>filter-media</code> that causes it to process formats that aren&rsquo;t part of its configuration:</p> </ul>
<pre><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16856 -p &quot;ImageMagick PDF Thumbnail&quot; <pre><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16856 -p &quot;ImageMagick PDF Thumbnail&quot;
File: earlywinproposal_esa_postharvest.pdf.jpg File: earlywinproposal_esa_postharvest.pdf.jpg
FILTERED: bitstream 13787 (item: 10568/16881) and created 'earlywinproposal_esa_postharvest.pdf.jpg' FILTERED: bitstream 13787 (item: 10568/16881) and created 'earlywinproposal_esa_postharvest.pdf.jpg'
File: postHarvest.jpg.jpg File: postHarvest.jpg.jpg
FILTERED: bitstream 16524 (item: 10568/24655) and created 'postHarvest.jpg.jpg' FILTERED: bitstream 16524 (item: 10568/24655) and created 'postHarvest.jpg.jpg'
</code></pre></li> </code></pre><ul>
<li>According to <code>dspace.cfg</code> the ImageMagick PDF Thumbnail plugin should only process PDFs:</li>
<li><p>According to <code>dspace.cfg</code> the ImageMagick PDF Thumbnail plugin should only process PDFs:</p> </ul>
<pre><code>filter.org.dspace.app.mediafilter.ImageMagickImageThumbnailFilter.inputFormats = BMP, GIF, image/png, JPG, TIFF, JPEG, JPEG 2000 <pre><code>filter.org.dspace.app.mediafilter.ImageMagickImageThumbnailFilter.inputFormats = BMP, GIF, image/png, JPG, TIFF, JPEG, JPEG 2000
filter.org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter.inputFormats = Adobe PDF filter.org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter.inputFormats = Adobe PDF
</code></pre></li> </code></pre><ul>
<li>I've sent a message to the mailing list and might file a Jira issue</li>
<li><p>I&rsquo;ve sent a message to the mailing list and might file a Jira issue</p></li> <li>Ask Atmire about the failed interpolation of the <code>dspace.internalUrl</code> variable in <code>atmire-cua.cfg</code></li>
<li><p>Ask Atmire about the failed interpolation of the <code>dspace.internalUrl</code> variable in <code>atmire-cua.cfg</code></p></li>
</ul> </ul>
<h2 id="20170222">2017-02-22</h2>
<h2 id="2017-02-22">2017-02-22</h2>
<ul> <ul>
<li>Atmire said I can add <code>dspace.internalUrl</code> to my build properties and the error will go away</li> <li>Atmire said I can add <code>dspace.internalUrl</code> to my build properties and the error will go away</li>
<li>It should be the local URL for accessing Tomcat from the server&rsquo;s own perspective, ie: <a href="http://localhost:8080">http://localhost:8080</a></li> <li>It should be the local URL for accessing Tomcat from the server's own perspective, ie: http://localhost:8080</li>
</ul> </ul>
<h2 id="20170226">2017-02-26</h2>
<h2 id="2017-02-26">2017-02-26</h2>
<ul> <ul>
<li><p>Find all fields with &ldquo;<a href="http://hdl.handle.net&quot;">http://hdl.handle.net&quot;</a> values (most are in <code>dc.identifier.uri</code>, but some are in other URL-related fields like <code>cg.link.reference</code>, <code>cg.identifier.dataurl</code>, and <code>cg.identifier.url</code>):</p> <li>Find all fields with &ldquo;<a href="http://hdl.handle.net">http://hdl.handle.net</a>&rdquo; values (most are in <code>dc.identifier.uri</code>, but some are in other URL-related fields like <code>cg.link.reference</code>, <code>cg.identifier.dataurl</code>, and <code>cg.identifier.url</code>):</li>
</ul>
<pre><code>dspace=# select distinct metadata_field_id from metadatavalue where resource_type_id=2 and text_value like 'http://hdl.handle.net%'; <pre><code>dspace=# select distinct metadata_field_id from metadatavalue where resource_type_id=2 and text_value like 'http://hdl.handle.net%';
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://hdl.handle.net', 'https://hdl.handle.net') where resource_type_id=2 and metadata_field_id IN (25, 113, 179, 219, 220, 223) and text_value like 'http://hdl.handle.net%'; dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://hdl.handle.net', 'https://hdl.handle.net') where resource_type_id=2 and metadata_field_id IN (25, 113, 179, 219, 220, 223) and text_value like 'http://hdl.handle.net%';
UPDATE 58633 UPDATE 58633
</code></pre></li> </code></pre><ul>
<li>This works but I'm thinking I'll wait on the replacement as there are perhaps some other places that rely on <code>http://hdl.handle.net</code> (grep the code, it's scary how many things are hard coded)</li>
<li><p>This works but I&rsquo;m thinking I&rsquo;ll wait on the replacement as there are perhaps some other places that rely on <code>http://hdl.handle.net</code> (grep the code, it&rsquo;s scary how many things are hard coded)</p></li> <li>Send message to dspace-tech mailing list with concerns about this</li>
<li><p>Send message to dspace-tech mailing list with concerns about this</p></li>
</ul> </ul>
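A minimal sketch of the same prefix rewrite in Python, useful for trying the pattern on sample values before running the update. The function name `upgrade_handle_url` is hypothetical. The anchored, escaped pattern is an assumption that is slightly stricter than the `regexp_replace` call above, which relies on the `like 'http://hdl.handle.net%'` condition to restrict matches to the start of the value:

```python
import re

# Anchor to the start of the value so that only a leading handle URL is
# rewritten, mirroring the "like 'http://hdl.handle.net%'" condition in the SQL.
HANDLE_HTTP = re.compile(r"^http://hdl\.handle\.net")


def upgrade_handle_url(text_value):
    """Return text_value with a leading http://hdl.handle.net switched to https."""
    return HANDLE_HTTP.sub("https://hdl.handle.net", text_value, count=1)


if __name__ == "__main__":
    for value in (
        "http://hdl.handle.net/10568/16856",
        "https://hdl.handle.net/10568/16856",
    ):
        print(upgrade_handle_url(value))
```

Values that already use https, or that only mention the handle URL mid-string, are returned unchanged.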
<h2 id="20170227">2017-02-27</h2>
<h2 id="2017-02-27">2017-02-27</h2>
<ul> <ul>
<li><p>LDAP users cannot log in today, looks to be an issue with CGIAR&rsquo;s LDAP server:</p> <li>LDAP users cannot log in today, looks to be an issue with CGIAR's LDAP server:</li>
</ul>
<pre><code>$ openssl s_client -connect svcgroot2.cgiarad.org:3269 <pre><code>$ openssl s_client -connect svcgroot2.cgiarad.org:3269
CONNECTED(00000003) CONNECTED(00000003)
depth=0 CN = SVCGROOT2.CGIARAD.ORG depth=0 CN = SVCGROOT2.CGIARAD.ORG
@ -396,15 +330,13 @@ verify error:num=21:unable to verify the first certificate
verify return:1 verify return:1
--- ---
Certificate chain Certificate chain
0 s:/CN=SVCGROOT2.CGIARAD.ORG 0 s:/CN=SVCGROOT2.CGIARAD.ORG
i:/CN=CGIARAD-RDWA-CA i:/CN=CGIARAD-RDWA-CA
--- ---
</code></pre></li> </code></pre><ul>
<li>For some reason it is now signed by a private certificate authority</li>
<li><p>For some reason it is now signed by a private certificate authority</p></li> <li>This error seems to have started on 2017-02-25:</li>
</ul>
<li><p>This error seems to have started on 2017-02-25:</p>
<pre><code>$ grep -c &quot;unable to find valid certification path&quot; [dspace]/log/dspace.log.2017-02-* <pre><code>$ grep -c &quot;unable to find valid certification path&quot; [dspace]/log/dspace.log.2017-02-*
[dspace]/log/dspace.log.2017-02-01:0 [dspace]/log/dspace.log.2017-02-01:0
[dspace]/log/dspace.log.2017-02-02:0 [dspace]/log/dspace.log.2017-02-02:0
@ -433,52 +365,37 @@ i:/CN=CGIARAD-RDWA-CA
[dspace]/log/dspace.log.2017-02-25:7 [dspace]/log/dspace.log.2017-02-25:7
[dspace]/log/dspace.log.2017-02-26:8 [dspace]/log/dspace.log.2017-02-26:8
[dspace]/log/dspace.log.2017-02-27:90 [dspace]/log/dspace.log.2017-02-27:90
</code></pre></li> </code></pre><ul>
<li>Also, it seems that we need to use a different user for LDAP binds, as we're still using the temporary one from the root migration, so maybe we can go back to the previous user we were using</li>
<li><p>Also, it seems that we need to use a different user for LDAP binds, as we&rsquo;re still using the temporary one from the root migration, so maybe we can go back to the previous user we were using</p></li> <li>So it looks like the certificate is invalid AND the bind users we had been using were deleted</li>
<li>Biruk Debebe recreated the bind user and now we are just waiting for CGNET to update their certificates</li>
<li><p>So it looks like the certificate is invalid AND the bind users we had been using were deleted</p></li> <li>Regarding the <code>filter-media</code> issue I found earlier, it seems that the ImageMagick PDF plugin will also process JPGs if they are in the &ldquo;Content Files&rdquo; (aka <code>ORIGINAL</code>) bundle</li>
<li>The problem likely lies in the logic of <code>ImageMagickThumbnailFilter.java</code>, as <code>ImageMagickPdfThumbnailFilter.java</code> extends it</li>
<li><p>Biruk Debebe recreated the bind user and now we are just waiting for CGNET to update their certificates</p></li> <li>Run CIAT corrections on CGSpace</li>
<li><p>Regarding the <code>filter-media</code> issue I found earlier, it seems that the ImageMagick PDF plugin will also process JPGs if they are in the &ldquo;Content Files&rdquo; (aka <code>ORIGINAL</code>) bundle</p></li>
<li><p>The problem likely lies in the logic of <code>ImageMagickThumbnailFilter.java</code>, as <code>ImageMagickPdfThumbnailFilter.java</code> extends it</p></li>
<li><p>Run CIAT corrections on CGSpace</p>
<pre><code>dspace=# update metadatavalue set authority='3026b1de-9302-4f3e-85ab-ef48da024eb2', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value = 'International Center for Tropical Agriculture';
</code></pre></li>
<li><p>CGNET has fixed the certificate chain on their LDAP server</p></li>
<li><p>Redeploy CGSpace and DSpace Test on the latest <code>5_x-prod</code> branch with fixes for LDAP bind user</p></li>
<li><p>Run all system updates on CGSpace server and reboot</p></li>
</ul> </ul>
<pre><code>dspace=# update metadatavalue set authority='3026b1de-9302-4f3e-85ab-ef48da024eb2', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value = 'International Center for Tropical Agriculture';
<h2 id="2017-02-28">2017-02-28</h2> </code></pre><ul>
<li>CGNET has fixed the certificate chain on their LDAP server</li>
<li>Redeploy CGSpace and DSpace Test on the latest <code>5_x-prod</code> branch with fixes for LDAP bind user</li>
<li>Run all system updates on CGSpace server and reboot</li>
</ul>
<h2 id="20170228">2017-02-28</h2>
<ul> <ul>
<li>After running the CIAT corrections and updating the Discovery and authority indexes, there is still no change in the number of items listed for CIAT in Discovery</li> <li>After running the CIAT corrections and updating the Discovery and authority indexes, there is still no change in the number of items listed for CIAT in Discovery</li>
<li>Ah, this is probably because some items have the <code>International Center for Tropical Agriculture</code> author twice, which I first noticed in 2016-12 but couldn&rsquo;t figure out how to fix</li> <li>Ah, this is probably because some items have the <code>International Center for Tropical Agriculture</code> author twice, which I first noticed in 2016-12 but couldn't figure out how to fix</li>
<li>I think I can do it by first exporting all metadatavalues that have the author <code>International Center for Tropical Agriculture</code></li>
<li><p>I think I can do it by first exporting all metadatavalues that have the author <code>International Center for Tropical Agriculture</code></p> </ul>
<pre><code>dspace=# \copy (select resource_id, metadata_value_id from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value='International Center for Tropical Agriculture') to /tmp/ciat.csv with csv; <pre><code>dspace=# \copy (select resource_id, metadata_value_id from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value='International Center for Tropical Agriculture') to /tmp/ciat.csv with csv;
COPY 1968 COPY 1968
</code></pre></li> </code></pre><ul>
<li>And then use awk to print the duplicate lines to a separate file:</li>
<li><p>And then use awk to print the duplicate lines to a separate file:</p>
<pre><code>$ awk -F',' 'seen[$1]++' /tmp/ciat.csv &gt; /tmp/ciat-dupes.csv
</code></pre></li>
<li><p>From that file I can create a list of 279 deletes and put them in a batch script like:</p>
<pre><code>delete from metadatavalue where resource_type_id=2 and metadata_field_id=3 and metadata_value_id=2742061;
</code></pre></li>
</ul> </ul>
<pre><code>$ awk -F',' 'seen[$1]++' /tmp/ciat.csv &gt; /tmp/ciat-dupes.csv
</code></pre><ul>
<li>From that file I can create a list of 279 deletes and put them in a batch script like:</li>
</ul>
<pre><code>delete from metadatavalue where resource_type_id=2 and metadata_field_id=3 and metadata_value_id=2742061;
</code></pre>
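The awk and batch-script steps above can be sketched in one pass. This is only an illustration under stated assumptions: the helper names `duplicate_value_ids` and `delete_statements` are hypothetical, the CSV is assumed to have exactly the two columns exported by the `\copy` command (`resource_id`, `metadata_value_id`), and the resource IDs in the sample are made up:

```python
import csv
import io


def duplicate_value_ids(csv_text):
    """Return the metadata_value_id of every row whose resource_id was already seen.

    This matches awk's 'seen[$1]++' idiom: the first row for each resource_id
    is kept, and every later row with the same resource_id counts as a duplicate.
    """
    seen = set()
    dupes = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        resource_id, metadata_value_id = row
        if resource_id in seen:
            dupes.append(int(metadata_value_id))
        else:
            seen.add(resource_id)
    return dupes


def delete_statements(value_ids):
    """Build one delete statement per duplicate metadata_value_id."""
    template = (
        "delete from metadatavalue where resource_type_id=2 "
        "and metadata_field_id=3 and metadata_value_id={};"
    )
    return [template.format(value_id) for value_id in value_ids]


if __name__ == "__main__":
    # Made-up sample rows in the same shape as /tmp/ciat.csv
    sample = "101,2742060\n101,2742061\n102,2742070\n"
    for statement in delete_statements(duplicate_value_ids(sample)):
        print(statement)
```

Reviewing the generated statements before running them against the database is probably wise, since which of the duplicate rows gets deleted depends only on export order.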

View File

@ -8,13 +8,10 @@
<meta property="og:title" content="March, 2017" /> <meta property="og:title" content="March, 2017" />
<meta property="og:description" content="2017-03-01 <meta property="og:description" content="2017-03-01
Run the 279 CIAT author corrections on CGSpace Run the 279 CIAT author corrections on CGSpace
2017-03-02 2017-03-02
Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace
CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles
They might come in at the top level in one &ldquo;CGIAR System&rdquo; community, or with several communities They might come in at the top level in one &ldquo;CGIAR System&rdquo; community, or with several communities
@ -23,12 +20,10 @@ Need to send Peter and Michael some notes about this in a few days
Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI
Filed an issue on DSpace issue tracker for the filter-media bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: DS-3516 Filed an issue on DSpace issue tracker for the filter-media bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: DS-3516
Discovered that the ImageMagick filter-media plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK Discovered that the ImageMagick filter-media plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK
Interestingly, it seems DSpace 4.x&#39;s thumbnails were sRGB, but forcing regeneration using DSpace 5.x&#39;s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see 10568/51999):
Interestingly, it seems DSpace 4.x&rsquo;s thumbnails were sRGB, but forcing regeneration using DSpace 5.x&rsquo;s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see 10568&frasl;51999):
$ identify ~/Desktop/alc_contrastes_desafios.jpg $ identify ~/Desktop/alc_contrastes_desafios.jpg
/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600&#43;0&#43;0 8-bit CMYK 168KB 0.000u 0:00.000 /Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600&#43;0&#43;0 8-bit CMYK 168KB 0.000u 0:00.000
" /> " />
<meta property="og:type" content="article" /> <meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2017-03/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2017-03/" />
@ -39,13 +34,10 @@ $ identify ~/Desktop/alc_contrastes_desafios.jpg
<meta name="twitter:title" content="March, 2017"/> <meta name="twitter:title" content="March, 2017"/>
<meta name="twitter:description" content="2017-03-01 <meta name="twitter:description" content="2017-03-01
Run the 279 CIAT author corrections on CGSpace Run the 279 CIAT author corrections on CGSpace
2017-03-02 2017-03-02
Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace
CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles
They might come in at the top level in one &ldquo;CGIAR System&rdquo; community, or with several communities They might come in at the top level in one &ldquo;CGIAR System&rdquo; community, or with several communities
@ -54,14 +46,12 @@ Need to send Peter and Michael some notes about this in a few days
Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI
Filed an issue on DSpace issue tracker for the filter-media bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: DS-3516 Filed an issue on DSpace issue tracker for the filter-media bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: DS-3516
Discovered that the ImageMagick filter-media plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK Discovered that the ImageMagick filter-media plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK
Interestingly, it seems DSpace 4.x&#39;s thumbnails were sRGB, but forcing regeneration using DSpace 5.x&#39;s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see 10568/51999):
Interestingly, it seems DSpace 4.x&rsquo;s thumbnails were sRGB, but forcing regeneration using DSpace 5.x&rsquo;s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see 10568&frasl;51999):
$ identify ~/Desktop/alc_contrastes_desafios.jpg $ identify ~/Desktop/alc_contrastes_desafios.jpg
/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600&#43;0&#43;0 8-bit CMYK 168KB 0.000u 0:00.000 /Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600&#43;0&#43;0 8-bit CMYK 168KB 0.000u 0:00.000
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -142,14 +132,11 @@ $ identify ~/Desktop/alc_contrastes_desafios.jpg
</p> </p>
</header> </header>
<h2 id="2017-03-01">2017-03-01</h2> <h2 id="20170301">2017-03-01</h2>
<ul> <ul>
<li>Run the 279 CIAT author corrections on CGSpace</li> <li>Run the 279 CIAT author corrections on CGSpace</li>
</ul> </ul>
<h2 id="20170302">2017-03-02</h2>
<h2 id="2017-03-02">2017-03-02</h2>
<ul> <ul>
<li>Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace</li> <li>Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace</li>
<li>CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles</li> <li>CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles</li>
@ -159,189 +146,132 @@ $ identify ~/Desktop/alc_contrastes_desafios.jpg
<li>Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI</li> <li>Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI</li>
<li>Filed an issue on DSpace issue tracker for the <code>filter-media</code> bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: <a href="https://jira.duraspace.org/browse/DS-3516">DS-3516</a></li> <li>Filed an issue on DSpace issue tracker for the <code>filter-media</code> bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: <a href="https://jira.duraspace.org/browse/DS-3516">DS-3516</a></li>
<li>Discovered that the ImageMagick <code>filter-media</code> plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK</li> <li>Discovered that the ImageMagick <code>filter-media</code> plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK</li>
<li>Interestingly, it seems DSpace 4.x's thumbnails were sRGB, but forcing regeneration using DSpace 5.x's ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999">10568/51999</a>):</li>
<li><p>Interestingly, it seems DSpace 4.x&rsquo;s thumbnails were sRGB, but forcing regeneration using DSpace 5.x&rsquo;s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999"><sup>10568</sup>&frasl;<sub>51999</sub></a>):</p> </ul>
<pre><code>$ identify ~/Desktop/alc_contrastes_desafios.jpg <pre><code>$ identify ~/Desktop/alc_contrastes_desafios.jpg
/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000 /Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
</code></pre></li> </code></pre><ul>
</ul>
<ul>
<li>This results in discolored thumbnails when compared to the original PDF, for example sRGB and CMYK:</li> <li>This results in discolored thumbnails when compared to the original PDF, for example sRGB and CMYK:</li>
</ul> </ul>
<p><img src="/cgspace-notes/2017/03/thumbnail-srgb.jpg" alt="Thumbnail in sRGB colorspace"></p>
<p><img src="/cgspace-notes/2017/03/thumbnail-srgb.jpg" alt="Thumbnail in sRGB colorspace" /></p> <p><img src="/cgspace-notes/2017/03/thumbnail-cmyk.jpg" alt="Thumbnail in CMYK colorspace"></p>
<p><img src="/cgspace-notes/2017/03/thumbnail-cmyk.jpg" alt="Thumbnail in CMYK colorspace" /></p>
<ul> <ul>
<li>I filed an issue for the color space thing: <a href="https://jira.duraspace.org/browse/DS-3517">DS-3517</a></li> <li>I filed an issue for the color space thing: <a href="https://jira.duraspace.org/browse/DS-3517">DS-3517</a></li>
</ul> </ul>
<h2 id="20170303">2017-03-03</h2>
<h2 id="2017-03-03">2017-03-03</h2>
<ul> <ul>
<li>I created a patch for DS-3517 and made a pull request against upstream <code>dspace-5_x</code>: <a href="https://github.com/DSpace/DSpace/pull/1669">https://github.com/DSpace/DSpace/pull/1669</a></li> <li>I created a patch for DS-3517 and made a pull request against upstream <code>dspace-5_x</code>: <a href="https://github.com/DSpace/DSpace/pull/1669">https://github.com/DSpace/DSpace/pull/1669</a></li>
<li>Looks like <code>-colorspace sRGB</code> alone isn't enough, we need to use profiles:</li>
<li><p>Looks like <code>-colorspace sRGB</code> alone isn&rsquo;t enough, we need to use profiles:</p> </ul>
<pre><code>$ convert alc_contrastes_desafios.pdf\[0\] -profile /opt/brew/Cellar/ghostscript/9.20/share/ghostscript/9.20/iccprofiles/default_cmyk.icc -thumbnail 300x300 -flatten -profile /opt/brew/Cellar/ghostscript/9.20/share/ghostscript/9.20/iccprofiles/default_rgb.icc alc_contrastes_desafios.pdf.jpg <pre><code>$ convert alc_contrastes_desafios.pdf\[0\] -profile /opt/brew/Cellar/ghostscript/9.20/share/ghostscript/9.20/iccprofiles/default_cmyk.icc -thumbnail 300x300 -flatten -profile /opt/brew/Cellar/ghostscript/9.20/share/ghostscript/9.20/iccprofiles/default_rgb.icc alc_contrastes_desafios.pdf.jpg
</code></pre></li> </code></pre><ul>
<li>This reads the input file, applies the CMYK profile, applies the RGB profile, then writes the file</li>
<li><p>This reads the input file, applies the CMYK profile, applies the RGB profile, then writes the file</p></li> <li>Note that you should set the first profile immediately after the input file</li>
<li>Also, it is better to use profiles than setting <code>-colorspace</code></li>
<li><p>Note that you should set the first profile immediately after the input file</p></li> <li>This is a great resource describing the color stuff: <a href="http://www.imagemagick.org/Usage/formats/#profiles">http://www.imagemagick.org/Usage/formats/#profiles</a></li>
<li>Somehow we need to detect the color system being used by the input file and handle each case differently (with profiles)</li>
<li><p>Also, it is better to use profiles than setting <code>-colorspace</code></p></li> <li>This is trivial with <code>identify</code> (even by the <a href="http://im4java.sourceforge.net/api/org/im4java/core/IMOps.html#identify">Java ImageMagick API</a>):</li>
</ul>
<li><p>This is a great resource describing the color stuff: <a href="http://www.imagemagick.org/Usage/formats/#profiles">http://www.imagemagick.org/Usage/formats/#profiles</a></p></li>
<li><p>Somehow we need to detect the color system being used by the input file and handle each case differently (with profiles)</p></li>
<li><p>This is trivial with <code>identify</code> (even by the <a href="http://im4java.sourceforge.net/api/org/im4java/core/IMOps.html#identify">Java ImageMagick API</a>):</p>
<pre><code>$ identify -format '%r\n' alc_contrastes_desafios.pdf\[0\] <pre><code>$ identify -format '%r\n' alc_contrastes_desafios.pdf\[0\]
DirectClass CMYK DirectClass CMYK
$ identify -format '%r\n' Africa\ group\ of\ negotiators.pdf\[0\] $ identify -format '%r\n' Africa\ group\ of\ negotiators.pdf\[0\]
DirectClass sRGB Alpha DirectClass sRGB Alpha
</code></pre></li> </code></pre><h2 id="20170304">2017-03-04</h2>
</ul>
<h2 id="2017-03-04">2017-03-04</h2>
<ul> <ul>
<li>Spent more time looking at the ImageMagick CMYK issue</li> <li>Spent more time looking at the ImageMagick CMYK issue</li>
<li>The <code>default_cmyk.icc</code> and <code>default_rgb.icc</code> files are both part of the Ghostscript GPL distribution, but according to DSpace&rsquo;s <code>LICENSES_THIRD_PARTY</code> file, DSpace doesn&rsquo;t allow distribution of dependencies that are licensed solely under the GPL</li> <li>The <code>default_cmyk.icc</code> and <code>default_rgb.icc</code> files are both part of the Ghostscript GPL distribution, but according to DSpace's <code>LICENSES_THIRD_PARTY</code> file, DSpace doesn't allow distribution of dependencies that are licensed solely under the GPL</li>
<li>So this issue is kinda pointless now, as the ICC profiles are absolutely necessary to make a meaningful CMYK→sRGB conversion</li> <li>So this issue is kinda pointless now, as the ICC profiles are absolutely necessary to make a meaningful CMYK→sRGB conversion</li>
</ul> </ul>
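For the colorspace detection mentioned on 2017-03-03, a rough sketch of how the `identify -format '%r\n'` output could drive the decision to apply ICC profiles. It only parses the text output shown in these notes ("DirectClass CMYK", "DirectClass sRGB Alpha") and does not call ImageMagick itself. The function names are hypothetical, and the assumption that the colorspace is always the second whitespace-separated token is based solely on those two samples:

```python
def colorspace_from_identify(output):
    """Extract the colorspace name from the first line of identify's %r output.

    The first token is the image class (e.g. DirectClass) and the second is
    assumed to be the colorspace; anything after that (e.g. Alpha) is ignored.
    """
    lines = output.strip().splitlines()
    if not lines:
        raise ValueError("empty identify output")
    parts = lines[0].split()
    if len(parts) < 2:
        raise ValueError("unexpected identify output: %r" % lines[0])
    return parts[1]


def needs_profile_conversion(output):
    """Return True when the source is CMYK and would need CMYK-to-sRGB profiles."""
    return colorspace_from_identify(output) == "CMYK"


if __name__ == "__main__":
    for sample in ("DirectClass CMYK\n", "DirectClass sRGB Alpha\n"):
        print(sample.strip(), "->", needs_profile_conversion(sample))
```

Only the CMYK case would then go through the two `-profile` steps; sRGB inputs could keep the existing thumbnail path.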
<h2 id="20170305">2017-03-05</h2>
<h2 id="2017-03-05">2017-03-05</h2>
<ul> <ul>
<li>Look into helping developers from landportal.info with a query for items related to LAND on the REST API</li> <li>Look into helping developers from landportal.info with a query for items related to LAND on the REST API</li>
<li>They want something like the items that are returned by the general &ldquo;LAND&rdquo; query in the search interface, but we cannot do that</li> <li>They want something like the items that are returned by the general &ldquo;LAND&rdquo; query in the search interface, but we cannot do that</li>
<li>We can only return specific results for metadata fields, like:</li>
<li><p>We can only return specific results for metadata fields, like:</p> </ul>
<pre><code>$ curl -s -H &quot;accept: application/json&quot; -H &quot;Content-Type: application/json&quot; -X POST &quot;https://dspacetest.cgiar.org/rest/items/find-by-metadata-field&quot; -d '{&quot;key&quot;: &quot;cg.subject.ilri&quot;,&quot;value&quot;: &quot;LAND REFORM&quot;, &quot;language&quot;: null}' | json_pp <pre><code>$ curl -s -H &quot;accept: application/json&quot; -H &quot;Content-Type: application/json&quot; -X POST &quot;https://dspacetest.cgiar.org/rest/items/find-by-metadata-field&quot; -d '{&quot;key&quot;: &quot;cg.subject.ilri&quot;,&quot;value&quot;: &quot;LAND REFORM&quot;, &quot;language&quot;: null}' | json_pp
</code></pre></li> </code></pre><ul>
<li>But there are hundreds of combinations of fields and values (like <code>dc.subject</code> and all the center subjects), and we can't use wildcards in REST!</li>
<li><p>But there are hundreds of combinations of fields and values (like <code>dc.subject</code> and all the center subjects), and we can&rsquo;t use wildcards in REST!</p></li> <li>Reading about enabling multiple handle prefixes in DSpace</li>
<li>There is a mailing list thread from 2011 about it: <a href="http://dspace.2283337.n4.nabble.com/Multiple-handle-prefixes-merged-DSpace-instances-td3427192.html">http://dspace.2283337.n4.nabble.com/Multiple-handle-prefixes-merged-DSpace-instances-td3427192.html</a></li>
<li><p>Reading about enabling multiple handle prefixes in DSpace</p></li> <li>And a comment from Atmire's Bram about it on the DSpace wiki: <a href="https://wiki.duraspace.org/display/DSDOC5x/Installing+DSpace?focusedCommentId=78163296#comment-78163296">https://wiki.duraspace.org/display/DSDOC5x/Installing+DSpace?focusedCommentId=78163296#comment-78163296</a></li>
<li>Bram mentions an undocumented configuration option <code>handle.plugin.checknameauthority</code>, but I noticed another one in <code>dspace.cfg</code>:</li>
<li><p>There is a mailing list thread from 2011 about it: <a href="http://dspace.2283337.n4.nabble.com/Multiple-handle-prefixes-merged-DSpace-instances-td3427192.html">http://dspace.2283337.n4.nabble.com/Multiple-handle-prefixes-merged-DSpace-instances-td3427192.html</a></p></li> </ul>
<li><p>And a comment from Atmire&rsquo;s Bram about it on the DSpace wiki: <a href="https://wiki.duraspace.org/display/DSDOC5x/Installing+DSpace?focusedCommentId=78163296#comment-78163296">https://wiki.duraspace.org/display/DSDOC5x/Installing+DSpace?focusedCommentId=78163296#comment-78163296</a></p></li>
<li><p>Bram mentions an undocumented configuration option <code>handle.plugin.checknameauthority</code>, but I noticed another one in <code>dspace.cfg</code>:</p>
<pre><code># List any additional prefixes that need to be managed by this handle server <pre><code># List any additional prefixes that need to be managed by this handle server
# (as for examle handle prefix coming from old dspace repository merged in # (as for examle handle prefix coming from old dspace repository merged in
# that repository) # that repository)
# handle.additional.prefixes = prefix1[, prefix2] # handle.additional.prefixes = prefix1[, prefix2]
</code></pre></li> </code></pre><ul>
<li>Because of this I noticed that our Handle server's <code>config.dct</code> was potentially misconfigured!</li>
<li><p>Because of this I noticed that our Handle server&rsquo;s <code>config.dct</code> was potentially misconfigured!</p></li> <li>We had some default values still present:</li>
</ul>
<li><p>We had some default values still present:</p>
<pre><code>&quot;300:0.NA/YOUR_NAMING_AUTHORITY&quot; <pre><code>&quot;300:0.NA/YOUR_NAMING_AUTHORITY&quot;
</code></pre></li> </code></pre><ul>
<li>I've changed them to the following and restarted the handle server:</li>
<li><p>I&rsquo;ve changed them to the following and restarted the handle server:</p> </ul>
<pre><code>&quot;300:0.NA/10568&quot; <pre><code>&quot;300:0.NA/10568&quot;
</code></pre></li> </code></pre><ul>
<li>In looking at all the configs I just noticed that we are not providing a DOI in the Google-specific metadata crosswalk</li>
<li><p>In looking at all the configs I just noticed that we are not providing a DOI in the Google-specific metadata crosswalk</p></li> <li>From <code>dspace/config/crosswalks/google-metadata.properties</code>:</li>
</ul>
<li><p>From <code>dspace/config/crosswalks/google-metadata.properties</code>:</p>
<pre><code>google.citation_doi = cg.identifier.doi <pre><code>google.citation_doi = cg.identifier.doi
</code></pre></li> </code></pre><ul>
<li>This works, and makes DSpace output the following metadata on the item view page:</li>
<li><p>This works, and makes DSpace output the following metadata on the item view page:</p> </ul>
<pre><code>&lt;meta content=&quot;https://dx.doi.org/10.1186/s13059-017-1153-y&quot; name=&quot;citation_doi&quot;&gt; <pre><code>&lt;meta content=&quot;https://dx.doi.org/10.1186/s13059-017-1153-y&quot; name=&quot;citation_doi&quot;&gt;
</code></pre></li> </code></pre><ul>
<li>Submitted and merged pull request for this: <a href="https://github.com/ilri/DSpace/pull/305">https://github.com/ilri/DSpace/pull/305</a></li>
<li><p>Submitted and merged pull request for this: <a href="https://github.com/ilri/DSpace/pull/305">https://github.com/ilri/DSpace/pull/305</a></p></li> <li>Submit pull request to set the author separator for XMLUI item lists to a semicolon instead of &ldquo;,&quot;: <a href="https://github.com/ilri/DSpace/pull/306">https://github.com/ilri/DSpace/pull/306</a></li>
<li>I want to show it briefly to Abenet and Peter to get feedback</li>
<li><p>Submit pull request to set the author separator for XMLUI item lists to a semicolon instead of &ldquo;,&rdquo;: <a href="https://github.com/ilri/DSpace/pull/306">https://github.com/ilri/DSpace/pull/306</a></p></li>
<li><p>I want to show it briefly to Abenet and Peter to get feedback</p></li>
</ul> </ul>
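Since the REST API has no wildcards, one workaround for the landportal.info query would be to issue one `find-by-metadata-field` request per field and value combination. A minimal sketch that only builds the JSON request bodies in the same shape as the curl example above, without making any network calls. The function name `build_payloads` is hypothetical, and the choice of fields and values is left to the caller:

```python
import itertools
import json


def build_payloads(fields, values, language=None):
    """Return one JSON request body per (field, value) combination.

    Each body has the shape used in the curl example:
    {"key": ..., "value": ..., "language": null}
    """
    return [
        json.dumps({"key": field, "value": value, "language": language})
        for field, value in itertools.product(fields, values)
    ]


if __name__ == "__main__":
    for body in build_payloads(["cg.subject.ilri", "dc.subject"], ["LAND REFORM"]):
        print(body)
```

Each body would still need to be sent with its own POST request, so this gets slow quickly with hundreds of combinations.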
<h2 id="20170306">2017-03-06</h2>
<h2 id="2017-03-06">2017-03-06</h2>
<ul> <ul>
<li>Someone on the mailing list said that <code>handle.plugin.checknameauthority</code> should be false if we&rsquo;re using multiple handle prefixes</li> <li>Someone on the mailing list said that <code>handle.plugin.checknameauthority</code> should be false if we're using multiple handle prefixes</li>
</ul> </ul>
<h2 id="20170307">2017-03-07</h2>
<h2 id="2017-03-07">2017-03-07</h2>
<ul> <ul>
<li>I set up a top-level community as a test for the CGIAR Library and imported one item with the 10947 handle prefix</li> <li>I set up a top-level community as a test for the CGIAR Library and imported one item with the 10947 handle prefix</li>
<li>When testing the Handle resolver locally it shows the item to be on the local repository</li> <li>When testing the Handle resolver locally it shows the item to be on the local repository</li>
<li>So this seems to work, with the following caveats: <li>So this seems to work, with the following caveats:
<ul> <ul>
<li>New items will have the default handle</li> <li>New items will have the default handle</li>
<li>Communities and collections will have the default handle</li> <li>Communities and collections will have the default handle</li>
<li>Only items imported manually can have the other handles</li> <li>Only items imported manually can have the other handles</li>
</ul></li> </ul>
</li>
<li>I need to talk to Michael and Peter to share the news, and discuss the structure of their community(s) and try some actual test data</li> <li>I need to talk to Michael and Peter to share the news, and discuss the structure of their community(s) and try some actual test data</li>
<li>We&rsquo;ll need to do some data cleaning to make sure they are using the same fields we are, like <code>dc.type</code> and <code>cg.identifier.status</code></li> <li>We'll need to do some data cleaning to make sure they are using the same fields we are, like <code>dc.type</code> and <code>cg.identifier.status</code></li>
<li>Another thing is that the import process creates new <code>dc.date.accessioned</code> and <code>dc.date.available</code> fields, so we end up with duplicates (is it important to preserve the originals for these?)</li> <li>Another thing is that the import process creates new <code>dc.date.accessioned</code> and <code>dc.date.available</code> fields, so we end up with duplicates (is it important to preserve the originals for these?)</li>
<li>Report DS-3520 issue to Atmire</li> <li>Report DS-3520 issue to Atmire</li>
</ul> </ul>
<h2 id="20170308">2017-03-08</h2>
<h2 id="2017-03-08">2017-03-08</h2>
<ul> <ul>
<li>Merge the author separator changes to <code>5_x-prod</code>, as everyone has responded positively about it, and it&rsquo;s the default in Mirage2 after all!</li> <li>Merge the author separator changes to <code>5_x-prod</code>, as everyone has responded positively about it, and it's the default in Mirage2 after all!</li>
<li>Cherry pick the <code>commons-collections</code> patch from DSpace&rsquo;s <code>dspace-5_x</code> branch to address DS-3520: <a href="https://jira.duraspace.org/browse/DS-3520">https://jira.duraspace.org/browse/DS-3520</a></li> <li>Cherry pick the <code>commons-collections</code> patch from DSpace's <code>dspace-5_x</code> branch to address DS-3520: <a href="https://jira.duraspace.org/browse/DS-3520">https://jira.duraspace.org/browse/DS-3520</a></li>
</ul> </ul>
<h2 id="20170309">2017-03-09</h2>
<h2 id="2017-03-09">2017-03-09</h2>
<ul> <ul>
<li><p>Export list of sponsors so Peter can clean it up:</p> <li>Export list of sponsors so Peter can clean it up:</li>
</ul>
<pre><code>dspace=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'description' and qualifier = 'sponsorship') group by text_value order by count desc) to /tmp/sponsorship.csv with csv; <pre><code>dspace=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'description' and qualifier = 'sponsorship') group by text_value order by count desc) to /tmp/sponsorship.csv with csv;
COPY 285 COPY 285
</code></pre></li> </code></pre><h2 id="20170312">2017-03-12</h2>
</ul>
<h2 id="2017-03-12">2017-03-12</h2>
<ul> <ul>
<li><p>Test the sponsorship fixes and deletes from Peter:</p> <li>Test the sponsorship fixes and deletes from Peter:</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i Investors-Fix-51.csv -f dc.description.sponsorship -t Action -m 29 -d dspace -u dspace -p fuuuu <pre><code>$ ./fix-metadata-values.py -i Investors-Fix-51.csv -f dc.description.sponsorship -t Action -m 29 -d dspace -u dspace -p fuuuu
$ ./delete-metadata-values.py -i Investors-Delete-121.csv -f dc.description.sponsorship -m 29 -d dspace -u dspace -p fuuu $ ./delete-metadata-values.py -i Investors-Delete-121.csv -f dc.description.sponsorship -m 29 -d dspace -u dspace -p fuuu
</code></pre></li> </code></pre><ul>
<li>Generate a new list of unique sponsors so we can update the controlled vocabulary:</li>
<li><p>Generate a new list of unique sponsors so we can update the controlled vocabulary:</p>
<pre><code>dspace=# \copy (select distinct text_value from metadatavalue where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'description' and qualifier = 'sponsorship')) to /tmp/sponsorship.csv with csv;
</code></pre></li>
<li><p>Pull request for controlled vocabulary if Peter approves: <a href="https://github.com/ilri/DSpace/pull/308">https://github.com/ilri/DSpace/pull/308</a></p></li>
<li><p>Review Sisay&rsquo;s roots, tubers, and bananas (RTB) theme, which still needs some fixes to work properly: <a href="https://github.com/ilri/DSpace/pull/307">https://github.com/ilri/DSpace/pull/307</a></p></li>
<li><p>Created an issue to track the progress on the Livestock CRP theme: <a href="https://github.com/ilri/DSpace/issues/309">https://github.com/ilri/DSpace/issues/309</a></p></li>
<li><p>Created a basic theme for the Livestock CRP community</p></li>
</ul> </ul>
<pre><code>dspace=# \copy (select distinct text_value from metadatavalue where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'description' and qualifier = 'sponsorship')) to /tmp/sponsorship.csv with csv;
<p><img src="/cgspace-notes/2017/03/livestock-theme.png" alt="Livestock CRP theme" /></p> </code></pre><ul>
<li>Pull request for controlled vocabulary if Peter approves: <a href="https://github.com/ilri/DSpace/pull/308">https://github.com/ilri/DSpace/pull/308</a></li>
<h2 id="2017-03-15">2017-03-15</h2> <li>Review Sisay's roots, tubers, and bananas (RTB) theme, which still needs some fixes to work properly: <a href="https://github.com/ilri/DSpace/pull/307">https://github.com/ilri/DSpace/pull/307</a></li>
<li>Created an issue to track the progress on the Livestock CRP theme: <a href="https://github.com/ilri/DSpace/issues/309">https://github.com/ilri/DSpace/issues/309</a></li>
<li>Created a basic theme for the Livestock CRP community</li>
</ul>
<p><img src="/cgspace-notes/2017/03/livestock-theme.png" alt="Livestock CRP theme"></p>
<h2 id="20170315">2017-03-15</h2>
<ul> <ul>
<li>Merge pull request for controlled vocabulary updates for sponsor: <a href="https://github.com/ilri/DSpace/pull/308">https://github.com/ilri/DSpace/pull/308</a></li> <li>Merge pull request for controlled vocabulary updates for sponsor: <a href="https://github.com/ilri/DSpace/pull/308">https://github.com/ilri/DSpace/pull/308</a></li>
<li>Merge pull request for Livestock CRP theme: <a href="https://github.com/ilri/DSpace/issues/309">https://github.com/ilri/DSpace/issues/309</a></li> <li>Merge pull request for Livestock CRP theme: <a href="https://github.com/ilri/DSpace/issues/309">https://github.com/ilri/DSpace/issues/309</a></li>
@ -350,9 +280,7 @@ $ ./delete-metadata-values.py -i Investors-Delete-121.csv -f dc.description.spon
<li>I also need to ask if either of these new fields need to be added to Discovery facets, search, and Atmire modules</li> <li>I also need to ask if either of these new fields need to be added to Discovery facets, search, and Atmire modules</li>
<li>Run all system updates on DSpace Test and re-deploy CGSpace</li> <li>Run all system updates on DSpace Test and re-deploy CGSpace</li>
</ul> </ul>
<h2 id="20170316">2017-03-16</h2>
<h2 id="2017-03-16">2017-03-16</h2>
<ul> <ul>
<li>Merge pull request for PABRA subjects: <a href="https://github.com/ilri/DSpace/pull/310">https://github.com/ilri/DSpace/pull/310</a></li> <li>Merge pull request for PABRA subjects: <a href="https://github.com/ilri/DSpace/pull/310">https://github.com/ilri/DSpace/pull/310</a></li>
<li>Abenet and Peter say we can add them to Discovery, Atmire modules, etc, but I might not have time to do it now</li> <li>Abenet and Peter say we can add them to Discovery, Atmire modules, etc, but I might not have time to do it now</li>
@ -363,55 +291,38 @@ $ ./delete-metadata-values.py -i Investors-Delete-121.csv -f dc.description.spon
<li>Deploy latest changes and investor fixes/deletions on CGSpace</li> <li>Deploy latest changes and investor fixes/deletions on CGSpace</li>
<li>Run system updates on CGSpace and reboot server</li> <li>Run system updates on CGSpace and reboot server</li>
</ul> </ul>
<h2 id="20170320">2017-03-20</h2>
<h2 id="2017-03-20">2017-03-20</h2>
<ul> <ul>
<li>Create basic XMLUI theme for PABRA community: <a href="https://github.com/ilri/DSpace/pull/315">https://github.com/ilri/DSpace/pull/315</a></li> <li>Create basic XMLUI theme for PABRA community: <a href="https://github.com/ilri/DSpace/pull/315">https://github.com/ilri/DSpace/pull/315</a></li>
</ul> </ul>
<h2 id="20170324">2017-03-24</h2>
<h2 id="2017-03-24">2017-03-24</h2>
<ul> <ul>
<li>Still helping Sisay try to figure out how to create a theme for the RTB community</li> <li>Still helping Sisay try to figure out how to create a theme for the RTB community</li>
</ul> </ul>
<h2 id="20170328">2017-03-28</h2>
<h2 id="2017-03-28">2017-03-28</h2>
<ul> <ul>
<li><p>CCAFS said they are ready for the flagship updates for Phase II to be run (<code>cg.subject.ccafs</code>), so I ran them on CGSpace:</p> <li>CCAFS said they are ready for the flagship updates for Phase II to be run (<code>cg.subject.ccafs</code>), so I ran them on CGSpace:</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i ccafs-flagships-feb7.csv -f cg.subject.ccafs -t correct -m 210 -d dspace -u dspace -p fuuu <pre><code>$ ./fix-metadata-values.py -i ccafs-flagships-feb7.csv -f cg.subject.ccafs -t correct -m 210 -d dspace -u dspace -p fuuu
</code></pre></li> </code></pre><ul>
<li>We've been waiting since February to run these</li>
<li><p>We&rsquo;ve been waiting since February to run these</p></li> <li>Also, I generated a list of all CCAFS flagships because there are a dozen or so more than there should be:</li>
</ul>
<li><p>Also, I generated a list of all CCAFS flagships because there are a dozen or so more than there should be:</p>
<pre><code>dspace=# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=210 group by text_value order by count desc) to /tmp/ccafs.csv with csv; <pre><code>dspace=# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=210 group by text_value order by count desc) to /tmp/ccafs.csv with csv;
</code></pre></li> </code></pre><ul>
<li>I sent a list to CCAFS people so they can tell me if some should be deleted or moved, etc</li>
<li><p>I sent a list to CCAFS people so they can tell me if some should be deleted or moved, etc</p></li> <li>Test, squash, and merge Sisay's RTB theme into <code>5_x-prod</code>: <a href="https://github.com/ilri/DSpace/pull/316">https://github.com/ilri/DSpace/pull/316</a></li>
<li><p>Test, squash, and merge Sisay&rsquo;s RTB theme into <code>5_x-prod</code>: <a href="https://github.com/ilri/DSpace/pull/316">https://github.com/ilri/DSpace/pull/316</a></p></li>
</ul> </ul>
<h2 id="20170329">2017-03-29</h2>
<h2 id="2017-03-29">2017-03-29</h2>
<ul> <ul>
<li><p>Dump a list of fields in the DC and CG schemas to compare with CG Core:</p> <li>Dump a list of fields in the DC and CG schemas to compare with CG Core:</li>
<pre><code>dspace=# select case when metadata_schema_id=1 then 'dc' else 'cg' end as schema, element, qualifier, scope_note from metadatafieldregistry where metadata_schema_id in (1, 2);
</code></pre></li>
<li><p>Ooh, a better one!</p>
<pre><code>dspace=# select coalesce(case when metadata_schema_id=1 then 'dc.' else 'cg.' end) || concat_ws('.', element, qualifier) as field, scope_note from metadatafieldregistry where metadata_schema_id in (1, 2);
</code></pre></li>
</ul> </ul>
<pre><code>dspace=# select case when metadata_schema_id=1 then 'dc' else 'cg' end as schema, element, qualifier, scope_note from metadatafieldregistry where metadata_schema_id in (1, 2);
<h2 id="2017-03-30">2017-03-30</h2> </code></pre><ul>
<li>Ooh, a better one!</li>
</ul>
<pre><code>dspace=# select coalesce(case when metadata_schema_id=1 then 'dc.' else 'cg.' end) || concat_ws('.', element, qualifier) as field, scope_note from metadatafieldregistry where metadata_schema_id in (1, 2);
</code></pre><h2 id="20170330">2017-03-30</h2>
<ul> <ul>
<li>Adjust the Linode CPU usage alerts for the CGSpace server from 150% to 200%, because the nightly Solr indexing generally pushes usage to around 150–190%; this should make the alerts less frequent</li> <li>Adjust the Linode CPU usage alerts for the CGSpace server from 150% to 200%, because the nightly Solr indexing generally pushes usage to around 150–190%; this should make the alerts less frequent</li>
<li>Adjust the threshold for DSpace Test from 90 to 100%</li> <li>Adjust the threshold for DSpace Test from 90 to 100%</li>

View File

@ -8,20 +8,15 @@
<meta property="og:title" content="April, 2017" /> <meta property="og:title" content="April, 2017" />
<meta property="og:description" content="2017-04-02 <meta property="og:description" content="2017-04-02
Merge one change to CCAFS flagships that I had forgotten to remove last month (&ldquo;MANAGING CLIMATE RISK&rdquo;): https://github.com/ilri/DSpace/pull/317 Merge one change to CCAFS flagships that I had forgotten to remove last month (&ldquo;MANAGING CLIMATE RISK&rdquo;): https://github.com/ilri/DSpace/pull/317
Quick proof-of-concept hack to add dc.rights to the input form, including some inline instructions/hints: Quick proof-of-concept hack to add dc.rights to the input form, including some inline instructions/hints:
Remove redundant/duplicate text in the DSpace submission license Remove redundant/duplicate text in the DSpace submission license
Testing the CMYK patch on a collection with 650 items: Testing the CMYK patch on a collection with 650 items:
$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p &quot;ImageMagick PDF Thumbnail&quot; -v &gt;&amp; /tmp/filter-media-cmyk.txt $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p &quot;ImageMagick PDF Thumbnail&quot; -v &gt;&amp; /tmp/filter-media-cmyk.txt
" /> " />
<meta property="og:type" content="article" /> <meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2017-04/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2017-04/" />
@ -32,22 +27,17 @@ $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p &quot;ImageMagick PDF Th
<meta name="twitter:title" content="April, 2017"/> <meta name="twitter:title" content="April, 2017"/>
<meta name="twitter:description" content="2017-04-02 <meta name="twitter:description" content="2017-04-02
Merge one change to CCAFS flagships that I had forgotten to remove last month (&ldquo;MANAGING CLIMATE RISK&rdquo;): https://github.com/ilri/DSpace/pull/317 Merge one change to CCAFS flagships that I had forgotten to remove last month (&ldquo;MANAGING CLIMATE RISK&rdquo;): https://github.com/ilri/DSpace/pull/317
Quick proof-of-concept hack to add dc.rights to the input form, including some inline instructions/hints: Quick proof-of-concept hack to add dc.rights to the input form, including some inline instructions/hints:
Remove redundant/duplicate text in the DSpace submission license Remove redundant/duplicate text in the DSpace submission license
Testing the CMYK patch on a collection with 650 items: Testing the CMYK patch on a collection with 650 items:
$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p &quot;ImageMagick PDF Thumbnail&quot; -v &gt;&amp; /tmp/filter-media-cmyk.txt $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p &quot;ImageMagick PDF Thumbnail&quot; -v &gt;&amp; /tmp/filter-media-cmyk.txt
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -128,179 +118,140 @@ $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p &quot;ImageMagick PDF Th
</p> </p>
</header> </header>
<h2 id="2017-04-02">2017-04-02</h2> <h2 id="20170402">2017-04-02</h2>
<ul> <ul>
<li>Merge one change to CCAFS flagships that I had forgotten to remove last month (&ldquo;MANAGING CLIMATE RISK&rdquo;): <a href="https://github.com/ilri/DSpace/pull/317">https://github.com/ilri/DSpace/pull/317</a></li> <li>Merge one change to CCAFS flagships that I had forgotten to remove last month (&ldquo;MANAGING CLIMATE RISK&rdquo;): <a href="https://github.com/ilri/DSpace/pull/317">https://github.com/ilri/DSpace/pull/317</a></li>
<li>Quick proof-of-concept hack to add <code>dc.rights</code> to the input form, including some inline instructions/hints:</li> <li>Quick proof-of-concept hack to add <code>dc.rights</code> to the input form, including some inline instructions/hints:</li>
</ul> </ul>
<p><img src="/cgspace-notes/2017/04/dc-rights.png" alt="dc.rights in the submission form"></p>
<p><img src="/cgspace-notes/2017/04/dc-rights.png" alt="dc.rights in the submission form" /></p>
<ul> <ul>
<li>Remove redundant/duplicate text in the DSpace submission license</li> <li>Remove redundant/duplicate text in the DSpace submission license</li>
<li>Testing the CMYK patch on a collection with 650 items:</li>
<li><p>Testing the CMYK patch on a collection with 650 items:</p>
<pre><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p &quot;ImageMagick PDF Thumbnail&quot; -v &gt;&amp; /tmp/filter-media-cmyk.txt
</code></pre></li>
</ul> </ul>
<pre><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p &quot;ImageMagick PDF Thumbnail&quot; -v &gt;&amp; /tmp/filter-media-cmyk.txt
<h2 id="2017-04-03">2017-04-03</h2> </code></pre><h2 id="20170403">2017-04-03</h2>
<ul> <ul>
<li><p>Continue testing the CMYK patch on more communities:</p> <li>Continue testing the CMYK patch on more communities:</li>
</ul>
<pre><code>$ [dspace]/bin/dspace filter-media -f -i 10568/1 -p &quot;ImageMagick PDF Thumbnail&quot; -v &gt;&gt; /tmp/filter-media-cmyk.txt 2&gt;&amp;1 <pre><code>$ [dspace]/bin/dspace filter-media -f -i 10568/1 -p &quot;ImageMagick PDF Thumbnail&quot; -v &gt;&gt; /tmp/filter-media-cmyk.txt 2&gt;&amp;1
</code></pre></li> </code></pre><ul>
<li>So far there are almost 500:</li>
<li><p>So far there are almost 500:</p> </ul>
<pre><code>$ grep -c profile /tmp/filter-media-cmyk.txt <pre><code>$ grep -c profile /tmp/filter-media-cmyk.txt
484 484
</code></pre></li> </code></pre><ul>
<li>Looking at the CG Core document again, I'll send some feedback to Peter and Abenet:
<li><p>Looking at the CG Core document again, I&rsquo;ll send some feedback to Peter and Abenet:</p>
<ul> <ul>
<li>We use cg.contributor.crp to indicate the CRP(s) affiliated with the item</li> <li>We use cg.contributor.crp to indicate the CRP(s) affiliated with the item</li>
<li>DSpace has dc.date.available, but this field isn&rsquo;t particularly meaningful other than as an automatic timestamp at the time of item accession (and is identical to dc.date.accessioned)</li> <li>DSpace has dc.date.available, but this field isn't particularly meaningful other than as an automatic timestamp at the time of item accession (and is identical to dc.date.accessioned)</li>
<li>dc.relation exists in CGSpace, but isn&rsquo;t used; instead we use dc.relation.ispartofseries, which is used ~5,000 times to record the series name and number within that series</li> <li>dc.relation exists in CGSpace, but isn't used; instead we use dc.relation.ispartofseries, which is used ~5,000 times to record the series name and number within that series</li>
</ul></li>
<li><p>Also, I&rsquo;m noticing some weird outliers in <code>cg.coverage.region</code>, need to remember to go correct these later:</p>
<pre><code>dspace=# select text_value from metadatavalue where resource_type_id=2 and metadata_field_id=227;
</code></pre></li>
</ul> </ul>
</li>
<h2 id="2017-04-04">2017-04-04</h2> <li>Also, I'm noticing some weird outliers in <code>cg.coverage.region</code>, need to remember to go correct these later:</li>
</ul>
<pre><code>dspace=# select text_value from metadatavalue where resource_type_id=2 and metadata_field_id=227;
</code></pre><h2 id="20170404">2017-04-04</h2>
<ul> <ul>
<li><p>The <code>filter-media</code> script has been running on more large communities and now there are many more CMYK PDFs that have been fixed:</p> <li>The <code>filter-media</code> script has been running on more large communities and now there are many more CMYK PDFs that have been fixed:</li>
</ul>
<pre><code>$ grep -c profile /tmp/filter-media-cmyk.txt <pre><code>$ grep -c profile /tmp/filter-media-cmyk.txt
1584 1584
</code></pre></li> </code></pre><ul>
<li>Trying to find a way to get the number of items submitted by a certain user in 2016</li>
<li><p>Trying to find a way to get the number of items submitted by a certain user in 2016</p></li> <li>It's not possible in the DSpace search / module interfaces, but might be able to be derived from <code>dc.description.provenance</code>, as that field contains the name and email of the submitter/approver, ie:</li>
</ul>
<li><p>It&rsquo;s not possible in the DSpace search / module interfaces, but might be able to be derived from <code>dc.description.provenance</code>, as that field contains the name and email of the submitter/approver, ie:</p>
<pre><code>Submitted by Francesca Giampieri (fgiampieri) on 2016-01-19T13:56:43Z^M <pre><code>Submitted by Francesca Giampieri (fgiampieri) on 2016-01-19T13:56:43Z^M
No. of bitstreams: 1^M No. of bitstreams: 1^M
ILAC_Brief21_PMCA.pdf: 113462 bytes, checksum: 249fef468f401c066a119f5db687add0 (MD5) ILAC_Brief21_PMCA.pdf: 113462 bytes, checksum: 249fef468f401c066a119f5db687add0 (MD5)
</code></pre></li> </code></pre><ul>
<li>This SQL query returns fields that were submitted or approved by giampieri in 2016 and contain a &ldquo;checksum&rdquo; (ie, there was a bitstream in the submission):</li>
<li><p>This SQL query returns fields that were submitted or approved by giampieri in 2016 and contain a &ldquo;checksum&rdquo; (ie, there was a bitstream in the submission):</p>
<pre><code>dspace=# select * from metadatavalue where resource_type_id=2 and metadata_field_id=28 and text_value ~ '^(Submitted|Approved).*giampieri.*2016-.*checksum.*';
</code></pre></li>
<li><p>Then this one does the same, but for fields that don&rsquo;t contain checksums (ie, there was no bitstream in the submission):</p>
<pre><code>dspace=# select * from metadatavalue where resource_type_id=2 and metadata_field_id=28 and text_value ~ '^(Submitted|Approved).*giampieri.*2016-.*' and text_value !~ '^(Submitted|Approved).*giampieri.*2016-.*checksum.*';
</code></pre></li>
<li><p>For some reason there seem to be way too many fields, for example there are 498 + 13 here, which is 511 items for just this one user.</p></li>
<li><p>It looks like there can be a scenario where the user submitted AND approved it, so some records might be doubled&hellip;</p></li>
<li><p>In that case it might just be better to see how many the user submitted (both <em>with</em> and <em>without</em> bitstreams):</p>
<pre><code>dspace=# select * from metadatavalue where resource_type_id=2 and metadata_field_id=28 and text_value ~ '^Submitted.*giampieri.*2016-.*';
</code></pre></li>
</ul> </ul>
<pre><code>dspace=# select * from metadatavalue where resource_type_id=2 and metadata_field_id=28 and text_value ~ '^(Submitted|Approved).*giampieri.*2016-.*checksum.*';
<h2 id="2017-04-05">2017-04-05</h2> </code></pre><ul>
<li>Then this one does the same, but for fields that don't contain checksums (ie, there was no bitstream in the submission):</li>
</ul>
<pre><code>dspace=# select * from metadatavalue where resource_type_id=2 and metadata_field_id=28 and text_value ~ '^(Submitted|Approved).*giampieri.*2016-.*' and text_value !~ '^(Submitted|Approved).*giampieri.*2016-.*checksum.*';
</code></pre><ul>
<li>For some reason there seem to be way too many fields, for example there are 498 + 13 here, which is 511 items for just this one user.</li>
<li>It looks like there can be a scenario where the user submitted AND approved it, so some records might be doubled&hellip;</li>
<li>In that case it might just be better to see how many the user submitted (both <em>with</em> and <em>without</em> bitstreams):</li>
</ul>
<pre><code>dspace=# select * from metadatavalue where resource_type_id=2 and metadata_field_id=28 and text_value ~ '^Submitted.*giampieri.*2016-.*';
</code></pre><h2 id="20170405">2017-04-05</h2>
<ul> <ul>
<li><p>After doing a few more large communities it seems this is the final count of CMYK PDFs:</p> <li>After doing a few more large communities it seems this is the final count of CMYK PDFs:</li>
</ul>
<pre><code>$ grep -c profile /tmp/filter-media-cmyk.txt <pre><code>$ grep -c profile /tmp/filter-media-cmyk.txt
2505 2505
</code></pre></li> </code></pre><h2 id="20170406">2017-04-06</h2>
</ul>
<h2 id="2017-04-06">2017-04-06</h2>
<ul> <ul>
<li>After reading the <a href="https://wiki.duraspace.org/display/cmtygp/DCAT+Meeting+April+2017">notes for DCAT April 2017</a> I am testing some new settings for PostgreSQL on DSpace Test: <li>After reading the <a href="https://wiki.duraspace.org/display/cmtygp/DCAT+Meeting+April+2017">notes for DCAT April 2017</a> I am testing some new settings for PostgreSQL on DSpace Test:
<ul> <ul>
<li><code>db.maxconnections</code> 30→70 (the default PostgreSQL config allows 100 connections, so DSpace&rsquo;s default of 30 is quite low)</li> <li><code>db.maxconnections</code> 30→70 (the default PostgreSQL config allows 100 connections, so DSpace's default of 30 is quite low)</li>
<li><code>db.maxwait</code> 5000→10000</li> <li><code>db.maxwait</code> 5000→10000</li>
<li><code>db.maxidle</code> 8→20 (DSpace default is -1, unlimited, but we had set it to 8 earlier)</li> <li><code>db.maxidle</code> 8→20 (DSpace default is -1, unlimited, but we had set it to 8 earlier)</li>
</ul></li> </ul>
</li>
<li>I need to look at the Munin graphs after a few days to see if the load has changed</li> <li>I need to look at the Munin graphs after a few days to see if the load has changed</li>
<li>Run system updates on DSpace Test and reboot the server</li> <li>Run system updates on DSpace Test and reboot the server</li>
<li>Discussing harvesting CIFOR&rsquo;s DSpace via OAI</li> <li>Discussing harvesting CIFOR's DSpace via OAI</li>
<li>Sisay added their OAI as a source to a new collection, but using the Simple Dublin Core method, so many fields are unqualified and duplicated</li> <li>Sisay added their OAI as a source to a new collection, but using the Simple Dublin Core method, so many fields are unqualified and duplicated</li>
<li>Looking at the <a href="https://wiki.duraspace.org/display/DSDOC5x/XMLUI+Configuration+and+Customization">documentation</a> it seems that we probably want to be using DSpace Intermediate Metadata</li> <li>Looking at the <a href="https://wiki.duraspace.org/display/DSDOC5x/XMLUI+Configuration+and+Customization">documentation</a> it seems that we probably want to be using DSpace Intermediate Metadata</li>
</ul> </ul>
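<p>The reasoning behind raising <code>db.maxconnections</code> can be checked with a little arithmetic. This is only a rough sketch: it assumes a single DSpace connection pool talking to a PostgreSQL server with the default limit of 100 connections mentioned above. The function name <code>connection_headroom</code> and the <code>pools</code> parameter are hypothetical, introduced here for illustration.</p>

```python
def connection_headroom(pg_max_connections, pool_max, pools=1):
    """Return how many PostgreSQL connections are left if every pool is full.

    `pools` is a hypothetical knob for setups where more than one
    connection pool shares the same database server.
    """
    return pg_max_connections - pool_max * pools


if __name__ == "__main__":
    # 100 is the PostgreSQL default cited in the notes;
    # 30 and 70 are the old and new db.maxconnections values.
    for pool_max in (30, 70):
        print(pool_max, connection_headroom(100, pool_max))
```

<p>A negative result would mean the pools could ask for more connections than PostgreSQL allows, which is worth ruling out before raising the limit further.</p>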
<h2 id="20170410">2017-04-10</h2>
<h2 id="2017-04-10">2017-04-10</h2>
<ul> <ul>
<li>Adjust Linode CPU usage alerts on DSpace servers <li>Adjust Linode CPU usage alerts on DSpace servers
<ul> <ul>
<li>CGSpace from 200 to 250%</li> <li>CGSpace from 200 to 250%</li>
<li>DSpace Test from 100 to 150%</li> <li>DSpace Test from 100 to 150%</li>
</ul></li> </ul>
</li>
<li>Remove James from Linode access</li> <li>Remove James from Linode access</li>
<li>Look into having CIFOR use a sub prefix of 10568 like 10568.01</li> <li>Look into having CIFOR use a sub prefix of 10568 like 10568.01</li>
<li>Handle.net calls this <a href="https://www.handle.net/faq.html#4">&ldquo;derived prefixes&rdquo;</a> and it seems this would work with DSpace if we wanted to go that route</li> <li>Handle.net calls this <a href="https://www.handle.net/faq.html#4">&ldquo;derived prefixes&rdquo;</a> and it seems this would work with DSpace if we wanted to go that route</li>
<li>CIFOR is starting to test aligning their metadata more with CGSpace/CG core</li> <li>CIFOR is starting to test aligning their metadata more with CGSpace/CG core</li>
<li>They shared a <a href="https://data.cifor.org/dspace/xmlui/handle/11463/947?show=full">test item</a> which is using <code>cg.coverage.country</code>, <code>cg.subject.cifor</code>, <code>dc.subject</code>, and <code>dc.date.issued</code></li> <li>They shared a <a href="https://data.cifor.org/dspace/xmlui/handle/11463/947?show=full">test item</a> which is using <code>cg.coverage.country</code>, <code>cg.subject.cifor</code>, <code>dc.subject</code>, and <code>dc.date.issued</code></li>
<li>Looking at their OAI I&rsquo;m not sure it has updated as I don&rsquo;t see the new fields: <a href="https://data.cifor.org/dspace/oai/request?verb=ListRecords&amp;resumptionToken=oai_dc///col_11463_6/900">https://data.cifor.org/dspace/oai/request?verb=ListRecords&amp;resumptionToken=oai_dc///col_11463_6/900</a></li> <li>Looking at their OAI I'm not sure it has updated as I don't see the new fields: <a href="https://data.cifor.org/dspace/oai/request?verb=ListRecords&amp;resumptionToken=oai_dc///col_11463_6/900">https://data.cifor.org/dspace/oai/request?verb=ListRecords&amp;resumptionToken=oai_dc///col_11463_6/900</a></li>
<li>Maybe they need to make sure they are running the OAI cache refresh cron job, or maybe OAI doesn&rsquo;t export these?</li> <li>Maybe they need to make sure they are running the OAI cache refresh cron job, or maybe OAI doesn't export these?</li>
<li>I added <code>cg.subject.cifor</code> to the metadata registry and I&rsquo;m waiting for the harvester to re-harvest to see if it picks up more data now</li> <li>I added <code>cg.subject.cifor</code> to the metadata registry and I'm waiting for the harvester to re-harvest to see if it picks up more data now</li>
<li>Another possibility is that we could use a crosswalk&hellip; but I&rsquo;ve never done it.</li> <li>Another possibility is that we could use a crosswalk&hellip; but I've never done it.</li>
</ul> </ul>
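<p>One way to check whether the new fields are exposed is to request the same collection in a specific metadata format. Below is a minimal Python sketch that only builds the request URL and does not fetch anything. It assumes the standard OAI-PMH <code>verb</code>, <code>metadataPrefix</code> and <code>set</code> parameters; the helper name <code>build_list_records_url</code> is hypothetical, and the set spec <code>col_11463_6</code> is inferred from the resumption token above rather than confirmed.</p>

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix, set_spec=None):
    """Build an OAI-PMH ListRecords request URL (no network access)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Restrict the request to one set, for example a single collection
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Base URL taken from the CIFOR links above; the set spec is an assumption
    base = "https://data.cifor.org/dspace/oai/request"
    print(build_list_records_url(base, "oai_dc", "col_11463_6"))
```

<p>Swapping the metadata prefix makes it easy to compare what each format exports for the same collection, provided the repository actually offers that format.</p>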
<h2 id="20170411">2017-04-11</h2>
<h2 id="2017-04-11">2017-04-11</h2>
<ul> <ul>
<li>Looking at the item from CIFOR it hasn&rsquo;t been updated yet, maybe they aren&rsquo;t running the cron job</li> <li>Looking at the item from CIFOR it hasn't been updated yet, maybe they aren't running the cron job</li>
<li>I emailed Usman from CIFOR to ask if he&rsquo;s running the cron job</li> <li>I emailed Usman from CIFOR to ask if he's running the cron job</li>
</ul> </ul>
<h2 id="20170412">2017-04-12</h2>
<h2 id="2017-04-12">2017-04-12</h2>
<ul> <ul>
<li>CIFOR says they have cleaned their OAI cache and that the cron job for OAI import is enabled</li> <li>CIFOR says they have cleaned their OAI cache and that the cron job for OAI import is enabled</li>
<li>Now I see updated fields, like <code>dc.date.issued</code> but none from the CG or CIFOR namespaces</li> <li>Now I see updated fields, like <code>dc.date.issued</code> but none from the CG or CIFOR namespaces</li>
<li>Also, DSpace Test hasn&rsquo;t re-harvested this item yet, so I will wait one more day before forcing a re-harvest</li> <li>Also, DSpace Test hasn't re-harvested this item yet, so I will wait one more day before forcing a re-harvest</li>
<li>Looking at CIFOR&rsquo;s OAI using different metadata formats, like qualified Dublin Core and DSpace Intermediate Metadata: <li>Looking at CIFOR's OAI using different metadata formats, like qualified Dublin Core and DSpace Intermediate Metadata:
<ul> <ul>
<li>QDC: <a href="https://data.cifor.org/dspace/oai/request?verb=ListRecords&amp;resumptionToken=qdc///col_11463_6/900">https://data.cifor.org/dspace/oai/request?verb=ListRecords&amp;resumptionToken=qdc///col_11463_6/900</a></li> <li>QDC: <a href="https://data.cifor.org/dspace/oai/request?verb=ListRecords&amp;resumptionToken=qdc///col_11463_6/900">https://data.cifor.org/dspace/oai/request?verb=ListRecords&amp;resumptionToken=qdc///col_11463_6/900</a></li>
<li>DIM: <a href="https://data.cifor.org/dspace/oai/request?verb=ListRecords&amp;resumptionToken=dim///col_11463_6/900">https://data.cifor.org/dspace/oai/request?verb=ListRecords&amp;resumptionToken=dim///col_11463_6/900</a></li> <li>DIM: <a href="https://data.cifor.org/dspace/oai/request?verb=ListRecords&amp;resumptionToken=dim///col_11463_6/900">https://data.cifor.org/dspace/oai/request?verb=ListRecords&amp;resumptionToken=dim///col_11463_6/900</a></li>
</ul></li> </ul>
<li>Looking at one of CGSpace&rsquo;s items in OAI it doesn&rsquo;t seem that metadata fields other than those in the DC schema are exported: </li>
<li>Looking at one of CGSpace's items in OAI it doesn't seem that metadata fields other than those in the DC schema are exported:
<ul> <ul>
<li><a href="https://cgspace.cgiar.org/handle/10568/33346?show=full">https://cgspace.cgiar.org/handle/10568/33346?show=full</a></li> <li><a href="https://cgspace.cgiar.org/handle/10568/33346?show=full">https://cgspace.cgiar.org/handle/10568/33346?show=full</a></li>
<li><a href="https://cgspace.cgiar.org/oai/request?verb=ListRecords&amp;metadataPrefix=dim&amp;set=col_10568_68619">https://cgspace.cgiar.org/oai/request?verb=ListRecords&amp;metadataPrefix=dim&amp;set=col_10568_68619</a></li> <li><a href="https://cgspace.cgiar.org/oai/request?verb=ListRecords&amp;metadataPrefix=dim&amp;set=col_10568_68619">https://cgspace.cgiar.org/oai/request?verb=ListRecords&amp;metadataPrefix=dim&amp;set=col_10568_68619</a></li>
</ul></li>
<li>Side note: WTF, I just saw an item on CGSpace&rsquo;s OAI that is using <code>dc.cplace.country</code> and <code>dc.rplace.region</code>, which we stopped using in 2016 after the metadata migrations:</li>
</ul> </ul>
</li>
<p><img src="/cgspace-notes/2017/04/cplace.png" alt="stale metadata in OAI" /></p> <li>Side note: WTF, I just saw an item on CGSpace's OAI that is using <code>dc.cplace.country</code> and <code>dc.rplace.region</code>, which we stopped using in 2016 after the metadata migrations:</li>
</ul>
<p><img src="/cgspace-notes/2017/04/cplace.png" alt="stale metadata in OAI"></p>
<ul> <ul>
<li>The particular item is <a href="http://hdl.handle.net/10568/6"><sup>10568</sup>&frasl;<sub>6</sub></a> and, for what it&rsquo;s worth, the stale metadata only appears in the OAI view: <li>The particular item is <a href="http://hdl.handle.net/10568/6">10568/6</a> and, for what it's worth, the stale metadata only appears in the OAI view:
<ul> <ul>
<li>XMLUI: <a href="https://cgspace.cgiar.org/handle/10568/6?show=full">https://cgspace.cgiar.org/handle/10568/6?show=full</a></li> <li>XMLUI: <a href="https://cgspace.cgiar.org/handle/10568/6?show=full">https://cgspace.cgiar.org/handle/10568/6?show=full</a></li>
<li>OAI: <a href="https://cgspace.cgiar.org/oai/request?verb=GetRecord&amp;metadataPrefix=dim&amp;identifier=oai:cgspace.cgiar.org:10568/6">https://cgspace.cgiar.org/oai/request?verb=GetRecord&amp;metadataPrefix=dim&amp;identifier=oai:cgspace.cgiar.org:10568/6</a></li> <li>OAI: <a href="https://cgspace.cgiar.org/oai/request?verb=GetRecord&amp;metadataPrefix=dim&amp;identifier=oai:cgspace.cgiar.org:10568/6">https://cgspace.cgiar.org/oai/request?verb=GetRecord&amp;metadataPrefix=dim&amp;identifier=oai:cgspace.cgiar.org:10568/6</a></li>
</ul></li> </ul>
<li>I don&rsquo;t see these fields anywhere in our source code or the database&rsquo;s metadata registry, so maybe it&rsquo;s just a cache issue</li> </li>
<li>I don't see these fields anywhere in our source code or the database's metadata registry, so maybe it's just a cache issue</li>
<li>I will have to check the OAI cron scripts on DSpace Test, and then run them on CGSpace</li> <li>I will have to check the OAI cron scripts on DSpace Test, and then run them on CGSpace</li>
<li>Running <code>dspace oai import</code> and <code>dspace oai clean-cache</code> has zero effect, but this seems to rebuild the cache from scratch:</li>
<li><p>Running <code>dspace oai import</code> and <code>dspace oai clean-cache</code> has zero effect, but this seems to rebuild the cache from scratch:</p> </ul>
<pre><code>$ /home/dspacetest.cgiar.org/bin/dspace oai import -c <pre><code>$ /home/dspacetest.cgiar.org/bin/dspace oai import -c
... ...
63900 items imported so far... 63900 items imported so far...
@ -308,16 +259,12 @@ ILAC_Brief21_PMCA.pdf: 113462 bytes, checksum: 249fef468f401c066a119f5db687add0
Total: 64056 items Total: 64056 items
Purging cached OAI responses. Purging cached OAI responses.
OAI 2.0 manager action ended. It took 829 seconds. OAI 2.0 manager action ended. It took 829 seconds.
</code></pre></li> </code></pre><ul>
<li>After reading some threads on the DSpace mailing list, I see that <code>clean-cache</code> is actually only for caching <em>responses</em>, ie to client requests in the OAI web application</li>
<li><p>After reading some threads on the DSpace mailing list, I see that <code>clean-cache</code> is actually only for caching <em>responses</em>, ie to client requests in the OAI web application</p></li> <li>These are stored in <code>[dspace]/var/oai/requests/</code></li>
<li>The import command should theoretically catch situations like this where an item's metadata was updated, but in this case we changed the metadata schema and it doesn't seem to catch it (could be a bug!)</li>
<li><p>These are stored in <code>[dspace]/var/oai/requests/</code></p></li> <li>Attempting a full rebuild of OAI on CGSpace:</li>
</ul>
<li><p>The import command should theoretically catch situations like this where an item&rsquo;s metadata was updated, but in this case we changed the metadata schema and it doesn&rsquo;t seem to catch it (could be a bug!)</p></li>
<li><p>Attempting a full rebuild of OAI on CGSpace:</p>
<pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx1024m&quot; <pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx1024m&quot;
$ time schedtool -D -e ionice -c2 -n7 nice -n19 /home/cgspace.cgiar.org/bin/dspace oai import -c $ time schedtool -D -e ionice -c2 -n7 nice -n19 /home/cgspace.cgiar.org/bin/dspace oai import -c
... ...
@ -329,225 +276,169 @@ OAI 2.0 manager action ended. It took 1032 seconds.
real 17m20.156s real 17m20.156s
user 4m35.293s user 4m35.293s
sys 1m29.310s sys 1m29.310s
</code></pre></li> </code></pre><ul>
<li>Now the data for 10568/6 is correct in OAI: <a href="https://cgspace.cgiar.org/oai/request?verb=GetRecord&amp;metadataPrefix=dim&amp;identifier=oai:cgspace.cgiar.org:10568/6">https://cgspace.cgiar.org/oai/request?verb=GetRecord&amp;metadataPrefix=dim&amp;identifier=oai:cgspace.cgiar.org:10568/6</a></li>
<li><p>Now the data for <sup>10568</sup>&frasl;<sub>6</sub> is correct in OAI: <a href="https://cgspace.cgiar.org/oai/request?verb=GetRecord&amp;metadataPrefix=dim&amp;identifier=oai:cgspace.cgiar.org:10568/6">https://cgspace.cgiar.org/oai/request?verb=GetRecord&amp;metadataPrefix=dim&amp;identifier=oai:cgspace.cgiar.org:10568/6</a></p></li> <li>Perhaps I need to file a bug for this, or at least ask on the dspace-tech mailing list?</li>
<li>I wonder if we could use a crosswalk to convert to a format that CG Core wants, like <code>&lt;date Type=&quot;Available&quot;&gt;</code></li>
<li><p>Perhaps I need to file a bug for this, or at least ask on the dspace-tech mailing list?</p></li>
<li><p>I wonder if we could use a crosswalk to convert to a format that CG Core wants, like <code>&lt;date Type=&quot;Available&quot;&gt;</code></p></li>
</ul> </ul>
<h2 id="20170413">2017-04-13</h2>
<h2 id="2017-04-13">2017-04-13</h2>
<ul> <ul>
<li>Checking the <a href="https://dspacetest.cgiar.org/handle/11463/947?show=full">CIFOR item on DSpace Test</a>, it still doesn&rsquo;t have the new metadata</li> <li>Checking the <a href="https://dspacetest.cgiar.org/handle/11463/947?show=full">CIFOR item on DSpace Test</a>, it still doesn't have the new metadata</li>
<li>The collection status shows this message from the harvester:</li> <li>The collection status shows this message from the harvester:</li>
</ul> </ul>
<blockquote> <blockquote>
<p>Last Harvest Result: OAI server did not contain any updates on 2017-04-13 02:19:47.964</p> <p>Last Harvest Result: OAI server did not contain any updates on 2017-04-13 02:19:47.964</p>
</blockquote> </blockquote>
<ul> <ul>
<li>I don&rsquo;t know why there were no updates detected, so I will reset and reimport the collection</li> <li>I don't know why there were no updates detected, so I will reset and reimport the collection</li>
<li>Usman has set up a custom crosswalk called <code>dimcg</code> that now shows CG and CIFOR metadata namespaces, but we can&rsquo;t use it because DSpace can only harvest DIM by default (from the harvesting user interface)</li> <li>Usman has set up a custom crosswalk called <code>dimcg</code> that now shows CG and CIFOR metadata namespaces, but we can't use it because DSpace can only harvest DIM by default (from the harvesting user interface)</li>
<li>Also worth noting that the REST interface exposes all fields in the item, including CG and CIFOR fields: <a href="https://data.cifor.org/dspace/rest/items/944?expand=metadata">https://data.cifor.org/dspace/rest/items/944?expand=metadata</a></li> <li>Also worth noting that the REST interface exposes all fields in the item, including CG and CIFOR fields: <a href="https://data.cifor.org/dspace/rest/items/944?expand=metadata">https://data.cifor.org/dspace/rest/items/944?expand=metadata</a></li>
<li>After re-importing the CIFOR collection it looks <em>very</em> good!</li> <li>After re-importing the CIFOR collection it looks <em>very</em> good!</li>
<li>It seems like they have done a full metadata migration with <code>dc.date.issued</code> and <code>cg.coverage.country</code> etc</li> <li>It seems like they have done a full metadata migration with <code>dc.date.issued</code> and <code>cg.coverage.country</code> etc</li>
<li>Submit pull request to upstream DSpace for the PDF thumbnail bug (DS-3516): <a href="https://github.com/DSpace/DSpace/pull/1709">https://github.com/DSpace/DSpace/pull/1709</a></li> <li>Submit pull request to upstream DSpace for the PDF thumbnail bug (DS-3516): <a href="https://github.com/DSpace/DSpace/pull/1709">https://github.com/DSpace/DSpace/pull/1709</a></li>
</ul> </ul>
<h2 id="20170414">2017-04-14</h2>
<h2 id="2017-04-14">2017-04-14</h2>
<ul> <ul>
<li>DSpace committers reviewed my patch for DS-3516 and proposed a simpler idea involving incorrect use of <code>SelfRegisteredInputFormats</code></li> <li>DSpace committers reviewed my patch for DS-3516 and proposed a simpler idea involving incorrect use of <code>SelfRegisteredInputFormats</code></li>
<li>I tested the idea and it works, so I made a new patch: <a href="https://github.com/DSpace/DSpace/pull/1709">https://github.com/DSpace/DSpace/pull/1709</a></li> <li>I tested the idea and it works, so I made a new patch: <a href="https://github.com/DSpace/DSpace/pull/1709">https://github.com/DSpace/DSpace/pull/1709</a></li>
<li>I discovered that we can override metadata formats in OAI by creating a new &ldquo;context&rdquo;: <a href="https://wiki.duraspace.org/display/DSDOC5x/OAI+2.0+Server">https://wiki.duraspace.org/display/DSDOC5x/OAI+2.0+Server</a></li> <li>I discovered that we can override metadata formats in OAI by creating a new &ldquo;context&rdquo;: <a href="https://wiki.duraspace.org/display/DSDOC5x/OAI+2.0+Server">https://wiki.duraspace.org/display/DSDOC5x/OAI+2.0+Server</a></li>
<li>This allows us to have, say, a default &ldquo;request&rdquo; context and a &ldquo;cgiar&rdquo; context, both of which implement the DSpace Intermediate Metadata formats, but have the latter use an overridden version that exposes CG metadata</li> <li>This allows us to have, say, a default &ldquo;request&rdquo; context and a &ldquo;cgiar&rdquo; context, both of which implement the DSpace Intermediate Metadata formats, but have the latter use an overridden version that exposes CG metadata</li>
<li>Compare the following results: <li>Compare the following results:
<ul> <ul>
<li><a href="https://dspacetest.cgiar.org/oai/request?verb=GetRecord&amp;metadataPrefix=dim&amp;identifier=oai:dspacetest.cgiar.org:10568/6">https://dspacetest.cgiar.org/oai/request?verb=GetRecord&amp;metadataPrefix=dim&amp;identifier=oai:dspacetest.cgiar.org:10568/6</a></li> <li><a href="https://dspacetest.cgiar.org/oai/request?verb=GetRecord&amp;metadataPrefix=dim&amp;identifier=oai:dspacetest.cgiar.org:10568/6">https://dspacetest.cgiar.org/oai/request?verb=GetRecord&amp;metadataPrefix=dim&amp;identifier=oai:dspacetest.cgiar.org:10568/6</a></li>
<li><a href="https://dspacetest.cgiar.org/oai/cgiar?verb=GetRecord&amp;metadataPrefix=dim&amp;identifier=oai:dspacetest.cgiar.org:10568/6">https://dspacetest.cgiar.org/oai/cgiar?verb=GetRecord&amp;metadataPrefix=dim&amp;identifier=oai:dspacetest.cgiar.org:10568/6</a></li> <li><a href="https://dspacetest.cgiar.org/oai/cgiar?verb=GetRecord&amp;metadataPrefix=dim&amp;identifier=oai:dspacetest.cgiar.org:10568/6">https://dspacetest.cgiar.org/oai/cgiar?verb=GetRecord&amp;metadataPrefix=dim&amp;identifier=oai:dspacetest.cgiar.org:10568/6</a></li>
</ul></li> </ul>
</li>
<li>Reboot DSpace Test server to get new Linode kernel</li> <li>Reboot DSpace Test server to get new Linode kernel</li>
</ul> </ul>
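<p>A minimal sketch, in Python, of how the two comparison URLs above are put together. The function name <code>oai_get_record_url</code> is hypothetical, and the sketch assumes the OAI identifier uses the same hostname as the base URL, as it does in the links above. It only builds the URLs and does not contact any server.</p>

```python
from urllib.parse import urlencode, urlsplit


def oai_get_record_url(base_url: str, context: str, handle: str,
                       metadata_prefix: str = "dim") -> str:
    """Build an OAI-PMH GetRecord URL for a given DSpace OAI context.

    Assumption: the OAI identifier has the form oai:<hostname>:<handle>,
    where <hostname> is taken from base_url.
    """
    host = urlsplit(base_url).netloc
    query = urlencode(
        {
            "verb": "GetRecord",
            "metadataPrefix": metadata_prefix,
            "identifier": f"oai:{host}:{handle}",
        },
        safe=":/",  # keep the identifier readable instead of percent-encoding it
    )
    return f"{base_url}/oai/{context}?{query}"


if __name__ == "__main__":
    # Same item, default "request" context versus the custom "cgiar" context
    for context in ("request", "cgiar"):
        print(oai_get_record_url("https://dspacetest.cgiar.org", context, "10568/6"))
```

<p>Fetching both URLs and comparing the returned fields is one way to check whether the overridden context exposes the CG metadata.</p>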
<h2 id="20170417">2017-04-17</h2>
<h2 id="2017-04-17">2017-04-17</h2>
<ul> <ul>
<li>CIFOR has now implemented a new &ldquo;cgiar&rdquo; context in their OAI that exposes CG fields, so I am re-harvesting that to see how it looks in the Discovery sidebars and searches</li> <li>CIFOR has now implemented a new &ldquo;cgiar&rdquo; context in their OAI that exposes CG fields, so I am re-harvesting that to see how it looks in the Discovery sidebars and searches</li>
<li>See: <a href="https://data.cifor.org/dspace/oai/cgiar?verb=GetRecord&amp;metadataPrefix=dim&amp;identifier=oai:data.cifor.org:11463/947">https://data.cifor.org/dspace/oai/cgiar?verb=GetRecord&amp;metadataPrefix=dim&amp;identifier=oai:data.cifor.org:11463/947</a></li> <li>See: <a href="https://data.cifor.org/dspace/oai/cgiar?verb=GetRecord&amp;metadataPrefix=dim&amp;identifier=oai:data.cifor.org:11463/947">https://data.cifor.org/dspace/oai/cgiar?verb=GetRecord&amp;metadataPrefix=dim&amp;identifier=oai:data.cifor.org:11463/947</a></li>
<li>One thing we need to remember if we start using OAI is to enable the autostart of the harvester process (see <code>harvester.autoStart</code> in <code>dspace/config/modules/oai.cfg</code>)</li> <li>One thing we need to remember if we start using OAI is to enable the autostart of the harvester process (see <code>harvester.autoStart</code> in <code>dspace/config/modules/oai.cfg</code>)</li>
<li>Error when running DSpace cleanup task on DSpace Test and CGSpace (on the same item), I need to look this up:</li>
<li><p>Error when running DSpace cleanup task on DSpace Test and CGSpace (on the same item), I need to look this up:</p>
<pre><code>Error: ERROR: update or delete on table &quot;bitstream&quot; violates foreign key constraint &quot;bundle_primary_bitstream_id_fkey&quot; on table &quot;bundle&quot;
Detail: Key (bitstream_id)=(435) is still referenced from table &quot;bundle&quot;.
</code></pre></li>
</ul> </ul>
<pre><code>Error: ERROR: update or delete on table &quot;bitstream&quot; violates foreign key constraint &quot;bundle_primary_bitstream_id_fkey&quot; on table &quot;bundle&quot;
<h2 id="2017-04-18">2017-04-18</h2> Detail: Key (bitstream_id)=(435) is still referenced from table &quot;bundle&quot;.
</code></pre><h2 id="20170418">2017-04-18</h2>
<ul> <ul>
<li>Helping Tsega test his new <a href="https://github.com/ilri/ckm-cgspace-rest-api">CGSpace REST API Rails app</a> on DSpace Test</li> <li>Helping Tsega test his new <a href="https://github.com/ilri/ckm-cgspace-rest-api">CGSpace REST API Rails app</a> on DSpace Test</li>
<li>Set up and run with:</li>
<li><p>Set up and run with:</p> </ul>
<pre><code>$ git clone https://github.com/ilri/ckm-cgspace-rest-api.git <pre><code>$ git clone https://github.com/ilri/ckm-cgspace-rest-api.git
$ cd ckm-cgspace-rest-api/app $ cd ckm-cgspace-rest-api/app
$ gem install bundler $ gem install bundler
$ bundle $ bundle
$ cd .. $ cd ..
$ rails server $ rails server
</code></pre></li> </code></pre><ul>
<li>I used Ansible to create a PostgreSQL user that only has <code>SELECT</code> privileges on the tables it needs:</li>
<li><p>I used Ansible to create a PostgreSQL user that only has <code>SELECT</code> privileges on the tables it needs:</p>
<pre><code>$ ansible linode02 -u aorth -b --become-user=postgres -K -m postgresql_user -a 'db=database name=username password=password priv=CONNECT/item:SELECT/metadatavalue:SELECT/metadatafieldregistry:SELECT/metadataschemaregistry:SELECT/collection:SELECT/handle:SELECT/bundle2bitstream:SELECT/bitstream:SELECT/bundle:SELECT/item2bundle:SELECT state=present'
</code></pre></li>
<li><p>Need to look into <a href="https://github.com/puma/puma/blob/master/docs/systemd.md">running this via systemd</a></p></li>
<li><p>This is interesting for creating runnable commands from <code>bundle</code>:</p>
<pre><code>$ bundle binstubs puma --path ./sbin
</code></pre></li>
</ul> </ul>
<pre><code>$ ansible linode02 -u aorth -b --become-user=postgres -K -m postgresql_user -a 'db=database name=username password=password priv=CONNECT/item:SELECT/metadatavalue:SELECT/metadatafieldregistry:SELECT/metadataschemaregistry:SELECT/collection:SELECT/handle:SELECT/bundle2bitstream:SELECT/bitstream:SELECT/bundle:SELECT/item2bundle:SELECT state=present'
<h2 id="2017-04-19">2017-04-19</h2> </code></pre><ul>
<li>Need to look into <a href="https://github.com/puma/puma/blob/master/docs/systemd.md">running this via systemd</a></li>
<li>This is interesting for creating runnable commands from <code>bundle</code>:</li>
</ul>
<pre><code>$ bundle binstubs puma --path ./sbin
</code></pre><h2 id="20170419">2017-04-19</h2>
<ul> <ul>
<li>Usman sent another link to their OAI interface, where the country names are now capitalized: <a href="https://data.cifor.org/dspace/oai/cgiar?verb=GetRecord&amp;metadataPrefix=dim&amp;identifier=oai:data.cifor.org:11463/947">https://data.cifor.org/dspace/oai/cgiar?verb=GetRecord&amp;metadataPrefix=dim&amp;identifier=oai:data.cifor.org:11463/947</a></li> <li>Usman sent another link to their OAI interface, where the country names are now capitalized: <a href="https://data.cifor.org/dspace/oai/cgiar?verb=GetRecord&amp;metadataPrefix=dim&amp;identifier=oai:data.cifor.org:11463/947">https://data.cifor.org/dspace/oai/cgiar?verb=GetRecord&amp;metadataPrefix=dim&amp;identifier=oai:data.cifor.org:11463/947</a></li>
<li>Looking at the same item in XMLUI, the countries are not capitalized: <a href="https://data.cifor.org/dspace/xmlui/handle/11463/947?show=full">https://data.cifor.org/dspace/xmlui/handle/11463/947?show=full</a></li> <li>Looking at the same item in XMLUI, the countries are not capitalized: <a href="https://data.cifor.org/dspace/xmlui/handle/11463/947?show=full">https://data.cifor.org/dspace/xmlui/handle/11463/947?show=full</a></li>
<li>So it seems he did it in the crosswalk!</li> <li>So it seems he did it in the crosswalk!</li>
<li>Keep working on Ansible stuff for deploying the CKM REST API</li> <li>Keep working on Ansible stuff for deploying the CKM REST API</li>
<li>We can use systemd&rsquo;s <code>Environment</code> stuff to pass the database parameters to Rails</li> <li>We can use systemd's <code>Environment</code> stuff to pass the database parameters to Rails</li>
<li>Abenet noticed that the &ldquo;Workflow Statistics&rdquo; option is missing now, but we have screenshots from a presentation in 2016 when it was there</li> <li>Abenet noticed that the &ldquo;Workflow Statistics&rdquo; option is missing now, but we have screenshots from a presentation in 2016 when it was there</li>
<li>I filed a ticket with Atmire</li> <li>I filed a ticket with Atmire</li>
<li>Looking at 933 CIAT records from Sisay, he&rsquo;s having problems creating a SAF bundle to import to DSpace Test</li> <li>Looking at 933 CIAT records from Sisay, he's having problems creating a SAF bundle to import to DSpace Test</li>
<li>I started by looking at his CSV in OpenRefine, and I see there are a <em>bunch</em> of fields with whitespace issues that I cleaned up:</li>
<li><p>I started by looking at his CSV in OpenRefine, and I see there are a <em>bunch</em> of fields with whitespace issues that I cleaned up:</p>
<pre><code>value.replace(&quot; ||&quot;,&quot;||&quot;).replace(&quot;|| &quot;,&quot;||&quot;).replace(&quot; || &quot;,&quot;||&quot;)
</code></pre></li>
<li><p>Also, all the filenames have spaces and URL encoded characters in them, so I decoded them from URL encoding:</p>
<pre><code>unescape(value,&quot;url&quot;)
</code></pre></li>
<li><p>Then create the filename column using the following transform from URL:</p>
<pre><code>value.split('/')[-1].replace(/#.*$/,&quot;&quot;)
</code></pre></li>
<li><p>The <code>replace</code> part is because some URLs have an anchor like <code>#page=14</code> which we obviously don&rsquo;t want on the filename</p></li>
<li><p>Also, we need to only use the PDF on the item corresponding with page 1, so we don&rsquo;t end up with literally hundreds of duplicate PDFs</p></li>
<li><p>Alternatively, I could export each page to a standalone PDF&hellip;</p></li>
</ul> </ul>
<pre><code>value.replace(&quot; ||&quot;,&quot;||&quot;).replace(&quot;|| &quot;,&quot;||&quot;).replace(&quot; || &quot;,&quot;||&quot;)
<h2 id="2017-04-20">2017-04-20</h2> </code></pre><ul>
<li>Also, all the filenames have spaces and URL encoded characters in them, so I decoded them from URL encoding:</li>
</ul>
<pre><code>unescape(value,&quot;url&quot;)
</code></pre><ul>
<li>Then create the filename column using the following transform from URL:</li>
</ul>
<pre><code>value.split('/')[-1].replace(/#.*$/,&quot;&quot;)
</code></pre><ul>
<li>The <code>replace</code> part is because some URLs have an anchor like <code>#page=14</code> which we obviously don't want on the filename</li>
<li>Also, we need to only use the PDF on the item corresponding with page 1, so we don't end up with literally hundreds of duplicate PDFs</li>
<li>Alternatively, I could export each page to a standalone PDF&hellip;</li>
</ul>
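<p>For checking the cleanup outside OpenRefine, the three transforms above could be approximated in Python roughly as follows. This is a sketch, not the GREL that was actually run: the function names and sample values are hypothetical, and the separator cleanup uses a regular expression that also collapses runs of whitespace, which the chained <code>replace</code> calls do not.</p>

```python
import re
from urllib.parse import unquote


def normalize_separators(value: str) -> str:
    """Remove whitespace around the '||' multi-value separator."""
    return re.sub(r"\s*\|\|\s*", "||", value)


def filename_from_url(url: str) -> str:
    """Decode URL escapes, keep the last path segment, drop any #anchor."""
    decoded = unquote(url)               # rough equivalent of unescape(value, "url")
    last_segment = decoded.split("/")[-1]
    return re.sub(r"#.*$", "", last_segment)


if __name__ == "__main__":
    # Hypothetical sample values, not taken from the CIAT spreadsheet
    print(normalize_separators("A || B|| C ||D"))
    print(filename_from_url("https://example.org/books/My%20Book.pdf#page=14"))
```

<p>As in the original workflow, decoding happens before the split, so an encoded <code>#</code> in a file name would also be treated as the start of an anchor.</p>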
<h2 id="20170420">2017-04-20</h2>
<ul> <ul>
<li>Atmire responded about the Workflow Statistics, saying that it had been disabled because many environments needed customization to be useful</li> <li>Atmire responded about the Workflow Statistics, saying that it had been disabled because many environments needed customization to be useful</li>
<li>I re-enabled it with a hidden config key <code>workflow.stats.enabled = true</code> on DSpace Test and will evaluate adding it on CGSpace</li> <li>I re-enabled it with a hidden config key <code>workflow.stats.enabled = true</code> on DSpace Test and will evaluate adding it on CGSpace</li>
<li>Looking at the CIAT data again, a bunch of items have metadata values ending in <code>||</code>, which might cause blank fields to be added at import time</li> <li>Looking at the CIAT data again, a bunch of items have metadata values ending in <code>||</code>, which might cause blank fields to be added at import time</li>
<li>Cleaning them up with OpenRefine:</li>
<li><p>Cleaning them up with OpenRefine:</p>
<pre><code>value.replace(/\|\|$/,&quot;&quot;)
</code></pre></li>
<li><p>Working with the CIAT data in OpenRefine to remove the filename column from all but the first item which requires a particular PDF, as there are many items pointing to the same PDF, which would cause hundreds of duplicates to be added if we included them in the SAF bundle</p></li>
<li><p>I did some massaging in OpenRefine, flagging duplicates with stars and flags, then filtering and removing the filenames of those items</p></li>
</ul> </ul>
<pre><code>value.replace(/\|\|$/,&quot;&quot;)
<p><img src="/cgspace-notes/2017/04/openrefine-flagging-duplicates.png" alt="Flagging and filtering duplicates in OpenRefine" /></p> </code></pre><ul>
<li>Working with the CIAT data in OpenRefine to remove the filename column from all but the first item which requires a particular PDF, as there are many items pointing to the same PDF, which would cause hundreds of duplicates to be added if we included them in the SAF bundle</li>
<li>I did some massaging in OpenRefine, flagging duplicates with stars and flags, then filtering and removing the filenames of those items</li>
</ul>
<p><img src="/cgspace-notes/2017/04/openrefine-flagging-duplicates.png" alt="Flagging and filtering duplicates in OpenRefine"></p>
<ul> <ul>
<li>Also there are loads of whitespace errors in almost every field, so I trimmed leading/trailing whitespace</li> <li>Also there are loads of whitespace errors in almost every field, so I trimmed leading/trailing whitespace</li>
<li>Unbelievable, there are also metadata values like:</li>
<li><p>Unbelievable, there are also metadata values like:</p> </ul>
<pre><code>COLLETOTRICHUM LINDEMUTHIANUM|| FUSARIUM||GERMPLASM <pre><code>COLLETOTRICHUM LINDEMUTHIANUM|| FUSARIUM||GERMPLASM
</code></pre></li> </code></pre><ul>
<li>Add a description to the file names using:</li>
<li><p>Add a description to the file names using:</p> </ul>
<pre><code>value + &quot;__description:&quot; + cells[&quot;dc.type&quot;].value <pre><code>value + &quot;__description:&quot; + cells[&quot;dc.type&quot;].value
</code></pre></li> </code></pre><ul>
<li>Test import of 933 records:</li>
<li><p>Test import of 933 records:</p> </ul>
<pre><code>$ [dspace]/bin/dspace import -a -e aorth@mjanja.ch -c 10568/87193 -s /home/aorth/src/CIAT-Books/SimpleArchiveFormat/ -m /tmp/ciat <pre><code>$ [dspace]/bin/dspace import -a -e aorth@mjanja.ch -c 10568/87193 -s /home/aorth/src/CIAT-Books/SimpleArchiveFormat/ -m /tmp/ciat
$ wc -l /tmp/ciat $ wc -l /tmp/ciat
933 /tmp/ciat 933 /tmp/ciat
</code></pre></li> </code></pre><ul>
<li>Run system updates on CGSpace and reboot server</li>
<li><p>Run system updates on CGSpace and reboot server</p></li> <li>This includes switching nginx to using upstream with keepalive instead of direct <code>proxy_pass</code></li>
<li>Re-deploy CGSpace to latest <code>5_x-prod</code>, including the PABRA and RTB XMLUI themes, as well as the PDF processing and CMYK changes</li>
<li><p>This includes switching nginx to using upstream with keepalive instead of direct <code>proxy_pass</code></p></li> <li>More work on Ansible infrastructure stuff for Tsega's CKM DSpace REST API</li>
<li>I'm going to start re-processing all the PDF thumbnails on CGSpace, one community at a time:</li>
<li><p>Re-deploy CGSpace to latest <code>5_x-prod</code>, including the PABRA and RTB XMLUI themes, as well as the PDF processing and CMYK changes</p></li> </ul>
<li><p>More work on Ansible infrastructure stuff for Tsega&rsquo;s CKM DSpace REST API</p></li>
<li><p>I&rsquo;m going to start re-processing all the PDF thumbnails on CGSpace, one community at a time:</p>
<pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx1024m&quot; <pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx1024m&quot;
$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace filter-media -f -v -i 10568/71249 -p &quot;ImageMagick PDF Thumbnail&quot; -v &gt;&amp; /tmp/filter-media-cmyk.txt $ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace filter-media -f -v -i 10568/71249 -p &quot;ImageMagick PDF Thumbnail&quot; -v &gt;&amp; /tmp/filter-media-cmyk.txt
</code></pre></li> </code></pre><h2 id="20170422">2017-04-22</h2>
</ul>
<h2 id="2017-04-22">2017-04-22</h2>
<ul> <ul>
<li>Someone on the dspace-tech mailing list responded with a suggestion about the foreign key violation in the <code>cleanup</code> task</li> <li>Someone on the dspace-tech mailing list responded with a suggestion about the foreign key violation in the <code>cleanup</code> task</li>
<li>The solution is to remove the ID (ie set to NULL) from the <code>primary_bitstream_id</code> column in the <code>bundle</code> table</li> <li>The solution is to remove the ID (ie set to NULL) from the <code>primary_bitstream_id</code> column in the <code>bundle</code> table</li>
<li>After doing that and running the <code>cleanup</code> task again I find more bitstreams that are affected and end up with a long list of IDs that need to be fixed:</li>
<li><p>After doing that and running the <code>cleanup</code> task again I find more bitstreams that are affected and end up with a long list of IDs that need to be fixed:</p>
<pre><code>dspace=# update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (435, 1136, 1132, 1220, 1236, 3002, 3255, 5322);
</code></pre></li>
</ul> </ul>
<pre><code>dspace=# update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (435, 1136, 1132, 1220, 1236, 3002, 3255, 5322);
<h2 id="2017-04-24">2017-04-24</h2> </code></pre><h2 id="20170424">2017-04-24</h2>
<ul> <ul>
<li>Two users mentioned some items they recently approved not showing up in the search / XMLUI</li> <li>Two users mentioned some items they recently approved not showing up in the search / XMLUI</li>
<li>I looked at the logs from yesterday and it seems the Discovery indexing has been crashing:</li>
<li><p>I looked at the logs from yesterday and it seems the Discovery indexing has been crashing:</p> </ul>
<pre><code>2017-04-24 00:00:15,578 INFO com.atmire.dspace.discovery.AtmireSolrService @ Processing (55 of 58853): 70590 <pre><code>2017-04-24 00:00:15,578 INFO com.atmire.dspace.discovery.AtmireSolrService @ Processing (55 of 58853): 70590
2017-04-24 00:00:15,586 INFO com.atmire.dspace.discovery.AtmireSolrService @ Processing (56 of 58853): 74507 2017-04-24 00:00:15,586 INFO com.atmire.dspace.discovery.AtmireSolrService @ Processing (56 of 58853): 74507
2017-04-24 00:00:15,614 ERROR com.atmire.dspace.discovery.AtmireSolrService @ this IndexWriter is closed 2017-04-24 00:00:15,614 ERROR com.atmire.dspace.discovery.AtmireSolrService @ this IndexWriter is closed
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: this IndexWriter is closed org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: this IndexWriter is closed
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:552) at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:552)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124) at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
at org.apache.solr.client.solrj.SolrServer.deleteByQuery(SolrServer.java:285) at org.apache.solr.client.solrj.SolrServer.deleteByQuery(SolrServer.java:285)
at org.apache.solr.client.solrj.SolrServer.deleteByQuery(SolrServer.java:271) at org.apache.solr.client.solrj.SolrServer.deleteByQuery(SolrServer.java:271)
at org.dspace.discovery.SolrServiceImpl.unIndexContent(SolrServiceImpl.java:331) at org.dspace.discovery.SolrServiceImpl.unIndexContent(SolrServiceImpl.java:331)
at org.dspace.discovery.SolrServiceImpl.unIndexContent(SolrServiceImpl.java:315) at org.dspace.discovery.SolrServiceImpl.unIndexContent(SolrServiceImpl.java:315)
at com.atmire.dspace.discovery.AtmireSolrService.indexContent(AtmireSolrService.java:803) at com.atmire.dspace.discovery.AtmireSolrService.indexContent(AtmireSolrService.java:803)
at com.atmire.dspace.discovery.AtmireSolrService.updateIndex(AtmireSolrService.java:876) at com.atmire.dspace.discovery.AtmireSolrService.updateIndex(AtmireSolrService.java:876)
at org.dspace.discovery.IndexClient.main(IndexClient.java:127) at org.dspace.discovery.IndexClient.main(IndexClient.java:127)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at java.lang.reflect.Method.invoke(Method.java:498)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226) at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78) at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
</code></pre></li> </code></pre><ul>
<li>Looking at the past few days of logs, it looks like the indexing process started crashing on 2017-04-20:</li>
<li><p>Looking at the past few days of logs, it looks like the indexing process started crashing on 2017-04-20:</p> </ul>
<pre><code># grep -c 'IndexWriter is closed' [dspace]/log/dspace.log.2017-04-* <pre><code># grep -c 'IndexWriter is closed' [dspace]/log/dspace.log.2017-04-*
[dspace]/log/dspace.log.2017-04-01:0 [dspace]/log/dspace.log.2017-04-01:0
[dspace]/log/dspace.log.2017-04-02:0 [dspace]/log/dspace.log.2017-04-02:0
@ -573,36 +464,28 @@ org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: this Index
[dspace]/log/dspace.log.2017-04-22:13278 [dspace]/log/dspace.log.2017-04-22:13278
[dspace]/log/dspace.log.2017-04-23:22720 [dspace]/log/dspace.log.2017-04-23:22720
[dspace]/log/dspace.log.2017-04-24:21422 [dspace]/log/dspace.log.2017-04-24:21422
</code></pre></li> </code></pre><ul>
<li>I restarted Tomcat and re-ran the discovery process manually:</li>
<li><p>I restarted Tomcat and re-ran the discovery process manually:</p>
<pre><code>[dspace]/bin/dspace index-discovery
</code></pre></li>
<li><p>Now everything is ok</p></li>
<li><p>Finally finished manually running the cleanup task over and over and null&rsquo;ing the conflicting IDs:</p>
<pre><code>dspace=# update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (435, 1132, 1136, 1220, 1236, 3002, 3255, 5322, 5098, 5982, 5897, 6245, 6184, 4927, 6070, 4925, 6888, 7368, 7136, 7294, 7698, 7864, 10799, 10839, 11765, 13241, 13634, 13642, 14127, 14146, 15582, 16116, 16254, 17136, 17486, 17824, 18098, 22091, 22149, 22206, 22449, 22548, 22559, 22454, 22253, 22553, 22897, 22941, 30262, 33657, 39796, 46943, 56561, 58237, 58739, 58734, 62020, 62535, 64149, 64672, 66988, 66919, 76005, 79780, 78545, 81078, 83620, 84492, 92513, 93915);
</code></pre></li>
<li><p>Now running the cleanup script on DSpace Test and already seeing 11GB freed from the assetstore—it&rsquo;s likely we haven&rsquo;t had a cleanup task complete successfully in years&hellip;</p></li>
</ul> </ul>
<pre><code>[dspace]/bin/dspace index-discovery
<h2 id="2017-04-25">2017-04-25</h2> </code></pre><ul>
<li>Now everything is ok</li>
<li>Finally finished manually running the cleanup task over and over and null'ing the conflicting IDs:</li>
</ul>
<pre><code>dspace=# update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (435, 1132, 1136, 1220, 1236, 3002, 3255, 5322, 5098, 5982, 5897, 6245, 6184, 4927, 6070, 4925, 6888, 7368, 7136, 7294, 7698, 7864, 10799, 10839, 11765, 13241, 13634, 13642, 14127, 14146, 15582, 16116, 16254, 17136, 17486, 17824, 18098, 22091, 22149, 22206, 22449, 22548, 22559, 22454, 22253, 22553, 22897, 22941, 30262, 33657, 39796, 46943, 56561, 58237, 58739, 58734, 62020, 62535, 64149, 64672, 66988, 66919, 76005, 79780, 78545, 81078, 83620, 84492, 92513, 93915);
</code></pre><ul>
<li>Now running the cleanup script on DSpace Test and already seeing 11GB freed from the assetstore—it's likely we haven't had a cleanup task complete successfully in years&hellip;</li>
</ul>
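<p>The per-day <code>grep -c</code> counts earlier in this entry can also be scanned programmatically to find where the errors begin. The following is only a minimal sketch: the function name <code>first_day_with_errors</code> is hypothetical, and the sample holds just the lines visible in these notes, so the days elided from the output above are not reproduced.</p>

```python
def first_day_with_errors(grep_output):
    """Return the first log file whose match count is non-zero, or None.

    Expects `grep -c` style lines of the form "path:count", already in
    date order (which is how the shell expands the log file glob).
    """
    for line in grep_output.splitlines():
        path, _, count = line.rpartition(":")
        if path and int(count) > 0:
            return path
    return None


# Sample: only the lines shown in the notes; the elided days are missing here.
sample = """\
[dspace]/log/dspace.log.2017-04-01:0
[dspace]/log/dspace.log.2017-04-02:0
[dspace]/log/dspace.log.2017-04-22:13278
[dspace]/log/dspace.log.2017-04-23:22720
[dspace]/log/dspace.log.2017-04-24:21422
"""

print(first_day_with_errors(sample))
```

<p>On the full output this would be expected to point at the 2017-04-20 log noted above; on the truncated sample it can only report the first non-zero day that is present.</p>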
<h2 id="20170425">2017-04-25</h2>
<ul> <ul>
<li>Finally finished running the PDF thumbnail re-processing on CGSpace, the final count of CMYK PDFs is about 2751</li> <li>Finally finished running the PDF thumbnail re-processing on CGSpace, the final count of CMYK PDFs is about 2751</li>
<li>Preparing to run the cleanup task on CGSpace, I want to see how many files are in the assetstore:</li>
<li><p>Preparing to run the cleanup task on CGSpace, I want to see how many files are in the assetstore:</p> </ul>
<pre><code># find [dspace]/assetstore/ -type f | wc -l <pre><code># find [dspace]/assetstore/ -type f | wc -l
113104 113104
</code></pre></li> </code></pre><ul>
<li>Troubleshooting the Atmire Solr update process that runs at 3:00 AM every morning, after finishing at 100% it has this error:</li>
<li><p>Troubleshooting the Atmire Solr update process that runs at 3:00 AM every morning, after finishing at 100% it has this error:</p> </ul>
<pre><code>[=================================================&gt; ]99% time remaining: 0 seconds. timestamp: 2017-04-25 09:07:12 <pre><code>[=================================================&gt; ]99% time remaining: 0 seconds. timestamp: 2017-04-25 09:07:12
[=================================================&gt; ]99% time remaining: 0 seconds. timestamp: 2017-04-25 09:07:12 [=================================================&gt; ]99% time remaining: 0 seconds. timestamp: 2017-04-25 09:07:12
[=================================================&gt; ]99% time remaining: 0 seconds. timestamp: 2017-04-25 09:07:12 [=================================================&gt; ]99% time remaining: 0 seconds. timestamp: 2017-04-25 09:07:12
@ -653,36 +536,26 @@ Caused by: java.lang.ClassNotFoundException: org.dspace.statistics.content.DSpac
at java.lang.Class.forName(Class.java:264) at java.lang.Class.forName(Class.java:264)
at com.atmire.statistics.statlet.XmlParser.parsedatasetGenerator(SourceFile:299) at com.atmire.statistics.statlet.XmlParser.parsedatasetGenerator(SourceFile:299)
at com.atmire.statistics.display.StatisticsGraph.parseDatasetGenerators(SourceFile:250) at com.atmire.statistics.display.StatisticsGraph.parseDatasetGenerators(SourceFile:250)
</code></pre></li> </code></pre><ul>
<li>Run system updates on DSpace Test and reboot the server (new Java 8 131)</li>
<li><p>Run system updates on DSpace Test and reboot the server (new Java 8 131)</p></li> <li>Run the SQL cleanups on the bundle table on CGSpace and run the <code>[dspace]/bin/dspace cleanup</code> task</li>
<li>I will be interested to see the file count in the assetstore as well as the database size after the next backup (last backup size is 111M)</li>
<li><p>Run the SQL cleanups on the bundle table on CGSpace and run the <code>[dspace]/bin/dspace cleanup</code> task</p></li> <li>Final file count after the cleanup task finished: 77843</li>
<li>So that is roughly 35,000 files, and about 7GB</li>
<li><p>I will be interested to see the file count in the assetstore as well as the database size after the next backup (last backup size is 111M)</p></li> <li>Add logging to the cleanup cron task</li>
<li><p>Final file count after the cleanup task finished: 77843</p></li>
<li><p>So that is roughly 35,000 files, and about 7GB</p></li>
<li><p>Add logging to the cleanup cron task</p></li>
</ul> </ul>
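<p>As a quick check of the cleanup numbers in this entry, the difference between the two assetstore file counts can be worked out directly. This is just arithmetic on the figures recorded above; the variable names are illustrative.</p>

```python
# File counts from `find [dspace]/assetstore/ -type f | wc -l` in the notes
files_before = 113104  # before running the cleanup task
files_after = 77843    # after the cleanup task finished

removed = files_before - files_after
share = removed / files_before

print(removed)          # 35261
print(f"{share:.1%}")   # 31.2%
```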
<h2 id="20170426">2017-04-26</h2>
<h2 id="2017-04-26">2017-04-26</h2>
<ul> <ul>
<li>The size of the CGSpace database dump went from 111MB to 96MB, not sure about actual database size though</li> <li>The size of the CGSpace database dump went from 111MB to 96MB, not sure about actual database size though</li>
<li>Update RVM's Ruby from 2.3.0 to 2.4.0 on DSpace Test:</li>
<li><p>Update RVM&rsquo;s Ruby from 2.3.0 to 2.4.0 on DSpace Test:</p> </ul>
<pre><code>$ gpg --keyserver hkp://keys.gnupg.net --recv-keys 409B6B1796C275462A1703113804BB82D39DC0E3 <pre><code>$ gpg --keyserver hkp://keys.gnupg.net --recv-keys 409B6B1796C275462A1703113804BB82D39DC0E3
$ \curl -sSL https://raw.githubusercontent.com/wayneeseguin/rvm/master/binscripts/rvm-installer | bash -s stable --ruby $ \curl -sSL https://raw.githubusercontent.com/wayneeseguin/rvm/master/binscripts/rvm-installer | bash -s stable --ruby
... reload shell to get new Ruby ... reload shell to get new Ruby
$ gem install sass -v 3.3.14 $ gem install sass -v 3.3.14
$ gem install compass -v 1.0.3 $ gem install compass -v 1.0.3
</code></pre></li> </code></pre><ul>
<li>Help Tsega re-deploy the ckm-cgspace-rest-api on DSpace Test</li>
<li><p>Help Tsega re-deploy the ckm-cgspace-rest-api on DSpace Test</p></li>
</ul> </ul>

View File

@ -6,7 +6,7 @@
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no"> <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="May, 2017" /> <meta property="og:title" content="May, 2017" />
<meta property="og:description" content="2017-05-01 ICARDA apparently started working on CG Core on their MEL repository They have done a few cg.* fields, but not very consistent and even copy some of CGSpace items: https://mel.cgiar.org/xmlui/handle/20.500.11766/6911?show=full https://cgspace.cgiar.org/handle/10568/73683 2017-05-02 Atmire got back about the Workflow Statistics issue, and apparently it&rsquo;s a bug in the CUA module so they will send us a pull request 2017-05-04 Sync DSpace Test with database and assetstore from CGSpace Re-deploy DSpace Test with Atmire&rsquo;s CUA patch for workflow statistics, run system updates, and restart the server Now I can see the workflow statistics and am able to select users, but everything returns 0 items Megan says there are still some mapped items are not appearing since last week, so I forced a full index-discovery -b Need to remember to check if the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSPace Test) tomorrow: https://cgspace." 
/> <meta property="og:description" content="2017-05-01 ICARDA apparently started working on CG Core on their MEL repository They have done a few cg.* fields, but not very consistent and even copy some of CGSpace items: https://mel.cgiar.org/xmlui/handle/20.500.11766/6911?show=full https://cgspace.cgiar.org/handle/10568/73683 2017-05-02 Atmire got back about the Workflow Statistics issue, and apparently it&#39;s a bug in the CUA module so they will send us a pull request 2017-05-04 Sync DSpace Test with database and assetstore from CGSpace Re-deploy DSpace Test with Atmire&#39;s CUA patch for workflow statistics, run system updates, and restart the server Now I can see the workflow statistics and am able to select users, but everything returns 0 items Megan says there are still some mapped items are not appearing since last week, so I forced a full index-discovery -b Need to remember to check if the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSPace Test) tomorrow: https://cgspace." />
<meta property="og:type" content="article" /> <meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2017-05/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2017-05/" />
<meta property="article:published_time" content="2017-05-01T16:21:52+02:00" /> <meta property="article:published_time" content="2017-05-01T16:21:52+02:00" />
@ -14,8 +14,8 @@
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="May, 2017"/> <meta name="twitter:title" content="May, 2017"/>
<meta name="twitter:description" content="2017-05-01 ICARDA apparently started working on CG Core on their MEL repository They have done a few cg.* fields, but not very consistent and even copy some of CGSpace items: https://mel.cgiar.org/xmlui/handle/20.500.11766/6911?show=full https://cgspace.cgiar.org/handle/10568/73683 2017-05-02 Atmire got back about the Workflow Statistics issue, and apparently it&rsquo;s a bug in the CUA module so they will send us a pull request 2017-05-04 Sync DSpace Test with database and assetstore from CGSpace Re-deploy DSpace Test with Atmire&rsquo;s CUA patch for workflow statistics, run system updates, and restart the server Now I can see the workflow statistics and am able to select users, but everything returns 0 items Megan says there are still some mapped items are not appearing since last week, so I forced a full index-discovery -b Need to remember to check if the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSPace Test) tomorrow: https://cgspace."/> <meta name="twitter:description" content="2017-05-01 ICARDA apparently started working on CG Core on their MEL repository They have done a few cg.* fields, but not very consistent and even copy some of CGSpace items: https://mel.cgiar.org/xmlui/handle/20.500.11766/6911?show=full https://cgspace.cgiar.org/handle/10568/73683 2017-05-02 Atmire got back about the Workflow Statistics issue, and apparently it&#39;s a bug in the CUA module so they will send us a pull request 2017-05-04 Sync DSpace Test with database and assetstore from CGSpace Re-deploy DSpace Test with Atmire&#39;s CUA patch for workflow statistics, run system updates, and restart the server Now I can see the workflow statistics and am able to select users, but everything returns 0 items Megan says there are still some mapped items are not appearing since last week, so I forced a full index-discovery -b Need to remember to check if the collection has more items (currently 39 on 
CGSpace, but 118 on the freshly reindexed DSPace Test) tomorrow: https://cgspace."/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -96,141 +96,104 @@
</p> </p>
</header> </header>
<h2 id="20170501">2017-05-01</h2>
<h2 id="2017-05-01">2017-05-01</h2>
<ul> <ul>
<li>ICARDA apparently started working on CG Core on their MEL repository</li> <li>ICARDA apparently started working on CG Core on their MEL repository</li>
<li>They have implemented a few <code>cg.*</code> fields, but not very consistently, and they even copy some CGSpace items: <li>They have implemented a few <code>cg.*</code> fields, but not very consistently, and they even copy some CGSpace items:
<ul> <ul>
<li><a href="https://mel.cgiar.org/xmlui/handle/20.500.11766/6911?show=full">https://mel.cgiar.org/xmlui/handle/20.500.11766/6911?show=full</a></li> <li><a href="https://mel.cgiar.org/xmlui/handle/20.500.11766/6911?show=full">https://mel.cgiar.org/xmlui/handle/20.500.11766/6911?show=full</a></li>
<li><a href="https://cgspace.cgiar.org/handle/10568/73683">https://cgspace.cgiar.org/handle/10568/73683</a></li> <li><a href="https://cgspace.cgiar.org/handle/10568/73683">https://cgspace.cgiar.org/handle/10568/73683</a></li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2017-05-02">2017-05-02</h2> </ul>
<h2 id="20170502">2017-05-02</h2>
<ul> <ul>
<li>Atmire got back about the Workflow Statistics issue, and apparently it&rsquo;s a bug in the CUA module so they will send us a pull request</li> <li>Atmire got back about the Workflow Statistics issue, and apparently it's a bug in the CUA module so they will send us a pull request</li>
</ul> </ul>
<h2 id="20170504">2017-05-04</h2>
<h2 id="2017-05-04">2017-05-04</h2>
<ul> <ul>
<li>Sync DSpace Test with database and assetstore from CGSpace</li> <li>Sync DSpace Test with database and assetstore from CGSpace</li>
<li>Re-deploy DSpace Test with Atmire&rsquo;s CUA patch for workflow statistics, run system updates, and restart the server</li> <li>Re-deploy DSpace Test with Atmire's CUA patch for workflow statistics, run system updates, and restart the server</li>
<li>Now I can see the workflow statistics and am able to select users, but everything returns 0 items</li> <li>Now I can see the workflow statistics and am able to select users, but everything returns 0 items</li>
<li>Megan says that some mapped items have still not appeared since last week, so I forced a full <code>index-discovery -b</code></li> <li>Megan says that some mapped items have still not appeared since last week, so I forced a full <code>index-discovery -b</code></li>
<li>Need to remember to check if the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSpace Test) tomorrow: <a href="https://cgspace.cgiar.org/handle/10568/80731">https://cgspace.cgiar.org/handle/10568/80731</a></li> <li>Need to remember to check if the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSpace Test) tomorrow: <a href="https://cgspace.cgiar.org/handle/10568/80731">https://cgspace.cgiar.org/handle/10568/80731</a></li>
</ul> </ul>
<h2 id="20170505">2017-05-05</h2>
<h2 id="2017-05-05">2017-05-05</h2>
<ul> <ul>
<li>Discovered that CGSpace has ~700 items that are missing the <code>cg.identifier.status</code> field</li> <li>Discovered that CGSpace has ~700 items that are missing the <code>cg.identifier.status</code> field</li>
<li>I should perhaps try using the &ldquo;required metadata&rdquo; curation task to find the items missing this field:</li>
<li><p>I should perhaps try using the &ldquo;required metadata&rdquo; curation task to find the items missing this field:</p>
<pre><code>$ [dspace]/bin/dspace curate -t requiredmetadata -i 10568/1 -r - &gt; /tmp/curation.out
</code></pre></li>
<li><p>It seems the curation task dies when it finds an item which has missing metadata</p></li>
</ul> </ul>
<pre><code>$ [dspace]/bin/dspace curate -t requiredmetadata -i 10568/1 -r - &gt; /tmp/curation.out
<h2 id="2017-05-06">2017-05-06</h2> </code></pre><ul>
<li>It seems the curation task dies when it finds an item which has missing metadata</li>
</ul>
<h2 id="20170506">2017-05-06</h2>
<ul> <ul>
<li>Add &ldquo;Blog Post&rdquo; to <code>dc.type</code></li> <li>Add &ldquo;Blog Post&rdquo; to <code>dc.type</code></li>
<li>Create ticket on Atmire tracker to ask about commissioning them to develop the feature to expose ORCID via REST/OAI: <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=510">https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=510</a></li> <li>Create ticket on Atmire tracker to ask about commissioning them to develop the feature to expose ORCID via REST/OAI: <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=510">https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=510</a></li>
<li>According to the <a href="https://wiki.duraspace.org/display/DSDOC5x/Curation+System">DSpace curation docs</a> the fact that the <code>requiredmetadata</code> curation task stops when it finds a missing metadata field is by design</li> <li>According to the <a href="https://wiki.duraspace.org/display/DSDOC5x/Curation+System">DSpace curation docs</a> the fact that the <code>requiredmetadata</code> curation task stops when it finds a missing metadata field is by design</li>
</ul> </ul>
<h2 id="20170507">2017-05-07</h2>
<h2 id="2017-05-07">2017-05-07</h2>
<ul> <ul>
<li><p>Testing one replacement for CCAFS Flagships (<code>cg.subject.ccafs</code>), first changed in the submission forms, and then in the database:</p> <li>Testing one replacement for CCAFS Flagships (<code>cg.subject.ccafs</code>), first changed in the submission forms, and then in the database:</li>
<pre><code>$ ./fix-metadata-values.py -i ccafs-flagships-may7.csv -f cg.subject.ccafs -t correct -m 210 -d dspace -u dspace -p fuuu
</code></pre></li>
<li><p>Also, CCAFS wants to re-order their flagships to prioritize the Phase II ones</p></li>
<li><p>Waiting for feedback from CCAFS, then I can merge <a href="https://github.com/ilri/DSpace/pull/320">#320</a></p></li>
</ul> </ul>
<pre><code>$ ./fix-metadata-values.py -i ccafs-flagships-may7.csv -f cg.subject.ccafs -t correct -m 210 -d dspace -u dspace -p fuuu
<h2 id="2017-05-08">2017-05-08</h2> </code></pre><ul>
<li>Also, CCAFS wants to re-order their flagships to prioritize the Phase II ones</li>
<li>Waiting for feedback from CCAFS, then I can merge <a href="https://github.com/ilri/DSpace/pull/320">#320</a></li>
</ul>
<h2 id="20170508">2017-05-08</h2>
<ul> <ul>
<li>Start working on CGIAR Library migration</li> <li>Start working on CGIAR Library migration</li>
<li>We decided to use AIP export to preserve the hierarchies and handles of communities and collections</li> <li>We decided to use AIP export to preserve the hierarchies and handles of communities and collections</li>
<li>When ingesting some collections I was getting <code>java.lang.OutOfMemoryError: GC overhead limit exceeded</code>, which can be solved by disabling the GC timeout with <code>-XX:-UseGCOverheadLimit</code></li> <li>When ingesting some collections I was getting <code>java.lang.OutOfMemoryError: GC overhead limit exceeded</code>, which can be solved by disabling the GC timeout with <code>-XX:-UseGCOverheadLimit</code></li>
<li>Other times I was getting an error about heap space, so I kept bumping the RAM allocation by 512MB each time it crashed (up to 4096m!)</li> <li>Other times I was getting an error about heap space, so I kept bumping the RAM allocation by 512MB each time it crashed (up to 4096m!)</li>
<li>This leads to tens of thousands of abandoned files in the assetstore, which need to be cleaned up using <code>dspace cleanup -v</code>, or else you&rsquo;ll run out of disk space</li> <li>This leads to tens of thousands of abandoned files in the assetstore, which need to be cleaned up using <code>dspace cleanup -v</code>, or else you'll run out of disk space</li>
<li>In the end I realized it's better to use submission mode (<code>-s</code>) to ingest the community object as a single AIP without its children, followed by each of the collections:</li>
<li><p>In the end I realized it&rsquo;s better to use submission mode (<code>-s</code>) to ingest the community object as a single AIP without its children, followed by each of the collections:</p> </ul>
<pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx2048m -XX:-UseGCOverheadLimit&quot; <pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx2048m -XX:-UseGCOverheadLimit&quot;
$ [dspace]/bin/dspace packager -s -o ignoreHandle=false -t AIP -e some@user.com -p 10568/87775 /home/aorth/10947-1/10947-1.zip $ [dspace]/bin/dspace packager -s -o ignoreHandle=false -t AIP -e some@user.com -p 10568/87775 /home/aorth/10947-1/10947-1.zip
$ for collection in /home/aorth/10947-1/COLLECTION@10947-*; do [dspace]/bin/dspace packager -s -o ignoreHandle=false -t AIP -e some@user.com -p 10947/1 $collection; done $ for collection in /home/aorth/10947-1/COLLECTION@10947-*; do [dspace]/bin/dspace packager -s -o ignoreHandle=false -t AIP -e some@user.com -p 10947/1 $collection; done
$ for item in /home/aorth/10947-1/ITEM@10947-*; do [dspace]/bin/dspace packager -r -f -u -t AIP -e some@user.com $item; done $ for item in /home/aorth/10947-1/ITEM@10947-*; do [dspace]/bin/dspace packager -r -f -u -t AIP -e some@user.com $item; done
</code></pre></li> </code></pre><ul>
<li>Note that in submission mode DSpace ignores the handle specified in <code>mets.xml</code> in the zip file, so you need to turn that off with <code>-o ignoreHandle=false</code></li>
<li><p>Note that in submission mode DSpace ignores the handle specified in <code>mets.xml</code> in the zip file, so you need to turn that off with <code>-o ignoreHandle=false</code></p></li> <li>The <code>-u</code> option suppresses prompts, to allow the process to run without user input</li>
<li>Give feedback to CIFOR about their data quality:
<li><p>The <code>-u</code> option suppresses prompts, to allow the process to run without user input</p></li>
<li><p>Give feedback to CIFOR about their data quality:</p>
<ul> <ul>
<li>Suggestion: uppercase dc.subject, cg.coverage.region, and cg.coverage.subregion in your crosswalk so they match CGSpace and therefore can be faceted / reported on easier</li> <li>Suggestion: uppercase dc.subject, cg.coverage.region, and cg.coverage.subregion in your crosswalk so they match CGSpace and therefore can be faceted / reported on easier</li>
<li>Suggestion: use CGSpace&rsquo;s CRP names (cg.contributor.crp), see: dspace/config/input-forms.xml</li> <li>Suggestion: use CGSpace's CRP names (cg.contributor.crp), see: dspace/config/input-forms.xml</li>
<li>Suggestion: clean up duplicates and errors in funders, perhaps use a controlled vocabulary like ours, see: dspace/config/controlled-vocabularies/dc-description-sponsorship.xml</li> <li>Suggestion: clean up duplicates and errors in funders, perhaps use a controlled vocabulary like ours, see: dspace/config/controlled-vocabularies/dc-description-sponsorship.xml</li>
<li>Suggestion: use dc.type &ldquo;Blog Post&rdquo; instead of &ldquo;Blog&rdquo; for your blog post items (we are also adding a &ldquo;Blog Post&rdquo; type to CGSpace soon)</li> <li>Suggestion: use dc.type &ldquo;Blog Post&rdquo; instead of &ldquo;Blog&rdquo; for your blog post items (we are also adding a &ldquo;Blog Post&rdquo; type to CGSpace soon)</li>
<li>Question: many of your items use dc.document.uri AND cg.identifier.url with the same text value?</li> <li>Question: many of your items use dc.document.uri AND cg.identifier.url with the same text value?</li>
</ul></li>
<li><p>Help Marianne from WLE with an Open Search query to show the latest WLE CRP outputs: <a href="https://cgspace.cgiar.org/open-search/discover?query=crpsubject:WATER%2C+LAND+AND+ECOSYSTEMS&amp;sort_by=2&amp;order=DESC">https://cgspace.cgiar.org/open-search/discover?query=crpsubject:WATER%2C+LAND+AND+ECOSYSTEMS&amp;sort_by=2&amp;order=DESC</a></p></li>
<li><p>This uses the webui&rsquo;s item list sort options, see <code>webui.itemlist.sort-option</code> in <code>dspace.cfg</code></p></li>
<li><p>The equivalent Discovery search would be: <a href="https://cgspace.cgiar.org/discover?filtertype_1=crpsubject&amp;filter_relational_operator_1=equals&amp;filter_1=WATER%2C+LAND+AND+ECOSYSTEMS&amp;submit_apply_filter=&amp;query=&amp;rpp=10&amp;sort_by=dc.date.issued_dt&amp;order=desc">https://cgspace.cgiar.org/discover?filtertype_1=crpsubject&amp;filter_relational_operator_1=equals&amp;filter_1=WATER%2C+LAND+AND+ECOSYSTEMS&amp;submit_apply_filter=&amp;query=&amp;rpp=10&amp;sort_by=dc.date.issued_dt&amp;order=desc</a></p></li>
</ul> </ul>
</li>
<h2 id="2017-05-09">2017-05-09</h2> <li>Help Marianne from WLE with an Open Search query to show the latest WLE CRP outputs: <a href="https://cgspace.cgiar.org/open-search/discover?query=crpsubject:WATER%2C+LAND+AND+ECOSYSTEMS&amp;sort_by=2&amp;order=DESC">https://cgspace.cgiar.org/open-search/discover?query=crpsubject:WATER%2C+LAND+AND+ECOSYSTEMS&amp;sort_by=2&amp;order=DESC</a></li>
<li>This uses the webui's item list sort options, see <code>webui.itemlist.sort-option</code> in <code>dspace.cfg</code></li>
<li>The equivalent Discovery search would be: <a href="https://cgspace.cgiar.org/discover?filtertype_1=crpsubject&amp;filter_relational_operator_1=equals&amp;filter_1=WATER%2C+LAND+AND+ECOSYSTEMS&amp;submit_apply_filter=&amp;query=&amp;rpp=10&amp;sort_by=dc.date.issued_dt&amp;order=desc">https://cgspace.cgiar.org/discover?filtertype_1=crpsubject&amp;filter_relational_operator_1=equals&amp;filter_1=WATER%2C+LAND+AND+ECOSYSTEMS&amp;submit_apply_filter=&amp;query=&amp;rpp=10&amp;sort_by=dc.date.issued_dt&amp;order=desc</a></li>
</ul>
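<p>The Open Search link shared with Marianne in this entry can also be generated rather than typed by hand. The sketch below assumes only the query parameters visible in that link (<code>query</code>, <code>sort_by</code>, <code>order</code>); the helper name <code>open_search_url</code> is hypothetical and not part of DSpace.</p>

```python
from urllib.parse import urlencode

# Endpoint taken from the link in the notes
BASE = "https://cgspace.cgiar.org/open-search/discover"


def open_search_url(crp, sort_by=2, order="DESC"):
    """Build an Open Search URL for one CRP subject.

    sort_by is the index of a webui.itemlist.sort-option entry in
    dspace.cfg; 2 is the value used in the notes.
    """
    params = {"query": f"crpsubject:{crp}", "sort_by": sort_by, "order": order}
    # safe=":" leaves the field separator unescaped; spaces become "+"
    # and the comma is percent-encoded as %2C.
    return BASE + "?" + urlencode(params, safe=":")


url = open_search_url("WATER, LAND AND ECOSYSTEMS")
print(url)
```

<p>For the WLE subject this yields the same query string as the link above, and other CRP names can be passed in the same way.</p>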
<h2 id="20170509">2017-05-09</h2>
<ul> <ul>
<li>The CGIAR Library metadata has some blank metadata values, which leads to <code>|||</code> in the Discovery facets</li> <li>The CGIAR Library metadata has some blank metadata values, which leads to <code>|||</code> in the Discovery facets</li>
<li>Clean these up in the database using:</li>
<li><p>Clean these up in the database using:</p>
<pre><code>dspace=# delete from metadatavalue where resource_type_id=2 and text_value='';
</code></pre></li>
<li><p>I ended up running into issues during data cleaning and decided to wipe out the entire community and re-sync DSpace Test assetstore and database from CGSpace rather than waiting for the cleanup task to clean up</p></li>
<li><p>Hours into the re-ingestion I ran into more errors, and had to erase everything and start over <em>again</em>!</p></li>
<li><p>Now, no matter what I do I keep getting duplicate key errors&hellip;</p>
<pre><code>Caused by: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint &quot;handle_pkey&quot;
Detail: Key (handle_id)=(80928) already exists.
</code></pre></li>
<li><p>I think those errors actually come from me running the <code>update-sequences.sql</code> script while Tomcat/DSpace are running</p></li>
<li><p>Apparently you need to stop Tomcat!</p></li>
</ul> </ul>
<pre><code>dspace=# delete from metadatavalue where resource_type_id=2 and text_value='';
<h2 id="2017-05-10">2017-05-10</h2> </code></pre><ul>
<li>I ended up running into issues during data cleaning and decided to wipe out the entire community and re-sync DSpace Test assetstore and database from CGSpace rather than waiting for the cleanup task to clean up</li>
<li>Hours into the re-ingestion I ran into more errors, and had to erase everything and start over <em>again</em>!</li>
<li>Now, no matter what I do I keep getting duplicate key errors&hellip;</li>
</ul>
<pre><code>Caused by: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint &quot;handle_pkey&quot;
Detail: Key (handle_id)=(80928) already exists.
</code></pre><ul>
<li>I think those errors actually come from me running the <code>update-sequences.sql</code> script while Tomcat/DSpace are running</li>
<li>Apparently you need to stop Tomcat!</li>
</ul>
<h2 id="20170510">2017-05-10</h2>
<ul> <ul>
<li>Atmire says they are willing to extend the ORCID implementation, and I&rsquo;ve asked them to provide a quote</li> <li>Atmire says they are willing to extend the ORCID implementation, and I've asked them to provide a quote</li>
<li>I clarified that the scope of the implementation should be that ORCIDs are stored in the database and exposed via REST / API like other fields</li> <li>I clarified that the scope of the implementation should be that ORCIDs are stored in the database and exposed via REST / API like other fields</li>
<li>Finally finished importing all the CGIAR Library content, final method was:</li>
<li><p>Finally finished importing all the CGIAR Library content, final method was:</p> </ul>
<pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx3072m -XX:-UseGCOverheadLimit&quot; <pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx3072m -XX:-UseGCOverheadLimit&quot;
$ [dspace]/bin/dspace packager -r -a -t AIP -o skipIfParentMissing=true -e some@user.com -p 10568/80923 /home/aorth/10947-2517/10947-2517.zip $ [dspace]/bin/dspace packager -r -a -t AIP -o skipIfParentMissing=true -e some@user.com -p 10568/80923 /home/aorth/10947-2517/10947-2517.zip
$ [dspace]/bin/dspace packager -r -a -t AIP -o skipIfParentMissing=true -e some@user.com -p 10568/80923 /home/aorth/10947-2515/10947-2515.zip $ [dspace]/bin/dspace packager -r -a -t AIP -o skipIfParentMissing=true -e some@user.com -p 10568/80923 /home/aorth/10947-2515/10947-2515.zip
@ -238,119 +201,95 @@ $ [dspace]/bin/dspace packager -r -a -t AIP -o skipIfParentMissing=true -e some@
$ [dspace]/bin/dspace packager -s -t AIP -o ignoreHandle=false -e some@user.com -p 10568/80923 /home/aorth/10947-1/10947-1.zip $ [dspace]/bin/dspace packager -s -t AIP -o ignoreHandle=false -e some@user.com -p 10568/80923 /home/aorth/10947-1/10947-1.zip
$ for collection in /home/aorth/10947-1/COLLECTION@10947-*; do [dspace]/bin/dspace packager -s -o ignoreHandle=false -t AIP -e some@user.com -p 10947/1 $collection; done $ for collection in /home/aorth/10947-1/COLLECTION@10947-*; do [dspace]/bin/dspace packager -s -o ignoreHandle=false -t AIP -e some@user.com -p 10947/1 $collection; done
$ for item in /home/aorth/10947-1/ITEM@10947-*; do [dspace]/bin/dspace packager -r -f -u -t AIP -e some@user.com $item; done $ for item in /home/aorth/10947-1/ITEM@10947-*; do [dspace]/bin/dspace packager -r -f -u -t AIP -e some@user.com $item; done
</code></pre></li> </code></pre><ul>
<li>Basically, import the smaller communities using recursive AIP import (with <code>skipIfParentMissing</code>)</li>
<li><p>Basically, import the smaller communities using recursive AIP import (with <code>skipIfParentMissing</code>)</p></li> <li>Then, for the larger collection, create the community, collections, and items separately, ingesting the items one by one</li>
<li>The <code>-XX:-UseGCOverheadLimit</code> JVM option helps with some issues in large imports</li>
<li><p>Then, for the larger collection, create the community, collections, and items separately, ingesting the items one by one</p></li> <li>After this I ran the <code>update-sequences.sql</code> script (with Tomcat shut down), and cleaned up the 200+ blank metadata records:</li>
<li><p>The <code>-XX:-UseGCOverheadLimit</code> JVM option helps with some issues in large imports</p></li>
<li><p>After this I ran the <code>update-sequences.sql</code> script (with Tomcat shut down), and cleaned up the 200+ blank metadata records:</p>
<pre><code>dspace=# delete from metadatavalue where resource_type_id=2 and text_value='';
</code></pre></li>
</ul> </ul>
<pre><code>dspace=# delete from metadatavalue where resource_type_id=2 and text_value='';
<h2 id="2017-05-13">2017-05-13</h2> </code></pre><h2 id="20170513">2017-05-13</h2>
<ul> <ul>
<li>After quite a bit of troubleshooting with importing cleaned up data as CSV, it seems that there are actually <a href="https://en.wikipedia.org/wiki/Null_character">NUL</a> characters in the <code>dc.description.abstract</code> field (at least) on the lines where CSV importing was failing</li> <li>After quite a bit of troubleshooting with importing cleaned up data as CSV, it seems that there are actually <a href="https://en.wikipedia.org/wiki/Null_character">NUL</a> characters in the <code>dc.description.abstract</code> field (at least) on the lines where CSV importing was failing</li>
<li>I tried to find a way to remove the characters in vim or Open Refine, but decided it was quicker to just remove the column temporarily and import it</li> <li>I tried to find a way to remove the characters in vim or Open Refine, but decided it was quicker to just remove the column temporarily and import it</li>
<li>The import was successful and detected 2022 changes, which should likely be the rest that were failing to import before</li> <li>The import was successful and detected 2022 changes, which should likely be the rest that were failing to import before</li>
</ul> </ul>
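<p>A minimal sketch of an alternative to dropping the column: strip the NUL characters from the exported CSV before importing it. The function names and file paths here are hypothetical and not part of the original workflow, and this assumes the file is UTF-8 encoded.</p>

```python
def strip_nul(text: str) -> str:
    """Return text with every NUL (U+0000) character removed."""
    return text.replace("\x00", "")


def clean_csv_file(src_path, dst_path, encoding="utf-8"):
    """Copy src_path to dst_path without NUL characters.

    Returns the number of NUL characters removed. Both paths are
    hypothetical; newline="" keeps the original line endings intact.
    """
    removed = 0
    with open(src_path, encoding=encoding, newline="") as src, \
            open(dst_path, "w", encoding=encoding, newline="") as dst:
        for line in src:
            cleaned = strip_nul(line)
            removed += len(line) - len(cleaned)
            dst.write(cleaned)
    return removed
```

<p>A non-zero return value confirms that the file really did contain NUL characters, which is worth checking before blaming them for a failed import.</p>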
<h2 id="20170515">2017-05-15</h2>
<h2 id="2017-05-15">2017-05-15</h2>
<ul> <ul>
<li>To delete the blank lines that cause issues during import we need to use a regex in vim <code>g/^$/d</code></li> <li>To delete the blank lines that cause issues during import we need to use a regex in vim <code>g/^$/d</code></li>
<li>After that I started looking in the <code>dc.subject</code> field to try to pull countries and regions out, but there are too many values in there</li> <li>After that I started looking in the <code>dc.subject</code> field to try to pull countries and regions out, but there are too many values in there</li>
<li>Bump the Academicons dependency of the Mirage 2 themes from 1.6.0 to 1.8.0 because the upstream deleted the old tag and now the build is failing: <a href="https://github.com/ilri/DSpace/pull/321">#321</a></li> <li>Bump the Academicons dependency of the Mirage 2 themes from 1.6.0 to 1.8.0 because the upstream deleted the old tag and now the build is failing: <a href="https://github.com/ilri/DSpace/pull/321">#321</a></li>
<li>Merge changes to CCAFS project identifiers and flagships: <a href="https://github.com/ilri/DSpace/pull/320">#320</a></li> <li>Merge changes to CCAFS project identifiers and flagships: <a href="https://github.com/ilri/DSpace/pull/320">#320</a></li>
<li>Run updates for CCAFS flagships on CGSpace:</li>
<li><p>Run updates for CCAFS flagships on CGSpace:</p> </ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/ccafs-flagships-may7.csv -f cg.subject.ccafs -t correct -m 210 -d dspace -u dspace -p 'fuuu' <pre><code>$ ./fix-metadata-values.py -i /tmp/ccafs-flagships-may7.csv -f cg.subject.ccafs -t correct -m 210 -d dspace -u dspace -p 'fuuu'
</code></pre></li> </code></pre><ul>
<li>
<li><p>These include:</p> <p>These include:</p>
<ul> <ul>
<li>GENDER AND SOCIAL DIFFERENTIATION→GENDER AND SOCIAL INCLUSION</li> <li>GENDER AND SOCIAL DIFFERENTIATION→GENDER AND SOCIAL INCLUSION</li>
<li>MANAGING CLIMATE RISK→CLIMATE SERVICES AND SAFETY NETS</li> <li>MANAGING CLIMATE RISK→CLIMATE SERVICES AND SAFETY NETS</li>
</ul></li>
<li><p>Re-deploy CGSpace and DSpace Test and run system updates</p></li>
<li><p>Reboot DSpace Test</p></li>
<li><p>Fix cron jobs for log management on DSpace Test, as they weren&rsquo;t catching <code>dspace.log.*</code> files correctly and we had over six months of them and they were taking up many gigs of disk space</p></li>
</ul> </ul>
</li>
<h2 id="2017-05-16">2017-05-16</h2> <li>
<p>Re-deploy CGSpace and DSpace Test and run system updates</p>
</li>
<li>
<p>Reboot DSpace Test</p>
</li>
<li>
<p>Fix cron jobs for log management on DSpace Test, as they weren't catching <code>dspace.log.*</code> files correctly and we had over six months of them and they were taking up many gigs of disk space</p>
</li>
</ul>
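<p>The vim command <code>g/^$/d</code> mentioned above deletes every line that contains no characters at all. A rough Python equivalent, as a sketch only (the function name is hypothetical): like the vim regex, it keeps lines that contain only whitespace, and it would also remove empty lines that sit inside quoted multi-line CSV fields.</p>

```python
def drop_empty_lines(lines):
    """Return only the lines that contain something besides a line ending.

    Lines made up solely of spaces or tabs are kept, matching the
    behaviour of the ^$ pattern.
    """
    return [line for line in lines if line.rstrip("\r\n")]
```

<p>It can be applied to a file's contents with something like <code>drop_empty_lines(open(path, newline=""))</code>, writing the result to a new file rather than editing in place.</p>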
<h2 id="20170516">2017-05-16</h2>
<ul> <ul>
<li>Discuss updates to WLE themes for their Phase II</li> <li>Discuss updates to WLE themes for their Phase II</li>
<li>Make an issue to track the changes to <code>cg.subject.wle</code>: <a href="https://github.com/ilri/DSpace/issues/322">#322</a></li> <li>Make an issue to track the changes to <code>cg.subject.wle</code>: <a href="https://github.com/ilri/DSpace/issues/322">#322</a></li>
</ul> </ul>
<h2 id="20170517">2017-05-17</h2>
<h2 id="2017-05-17">2017-05-17</h2>
<ul> <ul>
<li><p>Looking into the error I get when trying to create a new collection on DSpace Test:</p> <li>Looking into the error I get when trying to create a new collection on DSpace Test:</li>
<pre><code>ERROR: duplicate key value violates unique constraint &quot;handle_pkey&quot; Detail: Key (handle_id)=(84834) already exists.
</code></pre></li>
<li><p>I tried updating the sequences a few times, with Tomcat running and stopped, but it hasn&rsquo;t helped</p></li>
<li><p>It appears item with <code>handle_id</code> 84834 is one of the imported CGIAR Library items:</p>
<pre><code>dspace=# select * from handle where handle_id=84834;
handle_id | handle | resource_type_id | resource_id
-----------+------------+------------------+-------------
84834 | 10947/1332 | 2 | 87113
</code></pre></li>
<li><p>Looks like the max <code>handle_id</code> is actually much higher:</p>
<pre><code>dspace=# select * from handle where handle_id=(select max(handle_id) from handle);
handle_id | handle | resource_type_id | resource_id
-----------+----------+------------------+-------------
86873 | 10947/99 | 2 | 89153
(1 row)
</code></pre></li>
<li><p>I&rsquo;ve posted on the dspace-tech mailing list to see if I can just manually set the <code>handle_seq</code> to that value</p></li>
<li><p>Actually, it seems I can manually set the handle sequence using:</p>
<pre><code>dspace=# select setval('handle_seq',86873);
</code></pre></li>
<li><p>After that I can create collections just fine, though I&rsquo;m not sure if it has other side effects</p></li>
</ul> </ul>
<pre><code>ERROR: duplicate key value violates unique constraint &quot;handle_pkey&quot; Detail: Key (handle_id)=(84834) already exists.
<h2 id="2017-05-21">2017-05-21</h2> </code></pre><ul>
<li>I tried updating the sequences a few times, with Tomcat running and stopped, but it hasn't helped</li>
<li>It appears item with <code>handle_id</code> 84834 is one of the imported CGIAR Library items:</li>
</ul>
<pre><code>dspace=# select * from handle where handle_id=84834;
handle_id | handle | resource_type_id | resource_id
-----------+------------+------------------+-------------
84834 | 10947/1332 | 2 | 87113
</code></pre><ul>
<li>Looks like the max <code>handle_id</code> is actually much higher:</li>
</ul>
<pre><code>dspace=# select * from handle where handle_id=(select max(handle_id) from handle);
handle_id | handle | resource_type_id | resource_id
-----------+----------+------------------+-------------
86873 | 10947/99 | 2 | 89153
(1 row)
</code></pre><ul>
<li>I've posted on the dspace-tech mailing list to see if I can just manually set the <code>handle_seq</code> to that value</li>
<li>Actually, it seems I can manually set the handle sequence using:</li>
</ul>
<pre><code>dspace=# select setval('handle_seq',86873);
</code></pre><ul>
<li>After that I can create collections just fine, though I'm not sure if it has other side effects</li>
</ul>
<h2 id="20170521">2017-05-21</h2>
<ul> <ul>
<li>Start creating a basic theme for the CGIAR System Organization&rsquo;s community on CGSpace</li> <li>Start creating a basic theme for the CGIAR System Organization's community on CGSpace</li>
<li>Using colors from the <a href="http://library.cgiar.org/handle/10947/2699">CGIAR Branding guidelines (2014)</a></li> <li>Using colors from the <a href="http://library.cgiar.org/handle/10947/2699">CGIAR Branding guidelines (2014)</a></li>
<li>Make a GitHub issue to track this work: <a href="https://github.com/ilri/DSpace/issues/324">#324</a></li> <li>Make a GitHub issue to track this work: <a href="https://github.com/ilri/DSpace/issues/324">#324</a></li>
</ul> </ul>
<h2 id="20170522">2017-05-22</h2>
<h2 id="2017-05-22">2017-05-22</h2>
<ul> <ul>
<li>Do some cleanups of community and collection names in CGIAR System Management Office community on DSpace Test, as well as move some items as Peter requested</li> <li>Do some cleanups of community and collection names in CGIAR System Management Office community on DSpace Test, as well as move some items as Peter requested</li>
<li>Peter wanted a list of authors in here, so I generated a list of collections using the &ldquo;View Source&rdquo; on each community and this hacky awk:</li>
<li><p>Peter wanted a list of authors in here, so I generated a list of collections using the &ldquo;View Source&rdquo; on each community and this hacky awk:</p> </ul>
<pre><code>$ grep 10947/ /tmp/collections | grep -v cocoon | awk -F/ '{print $3&quot;/&quot;$4}' | awk -F\&quot; '{print $1}' | vim - <pre><code>$ grep 10947/ /tmp/collections | grep -v cocoon | awk -F/ '{print $3&quot;/&quot;$4}' | awk -F\&quot; '{print $1}' | vim -
</code></pre></li> </code></pre><ul>
<li>Then I joined them together and ran this old SQL query from the dspace-tech mailing list which gives you authors for items in those collections:</li>
<li><p>Then I joined them together and ran this old SQL query from the dspace-tech mailing list which gives you authors for items in those collections:</p> </ul>
<pre><code>dspace=# select distinct text_value <pre><code>dspace=# select distinct text_value
from metadatavalue from metadatavalue
where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author')
@ -364,82 +303,62 @@ AND resource_id IN (select item_id from collection2item where collection_id IN (
47/2519', '10947/2708', '10947/2526', '10947/2871', '10947/2527', '10947/4467', '10947/3457', '10947/2528', '10947/2529', '10947/2533', '10947/2530', '10947/2 47/2519', '10947/2708', '10947/2526', '10947/2871', '10947/2527', '10947/4467', '10947/3457', '10947/2528', '10947/2529', '10947/2533', '10947/2530', '10947/2
531', '10947/2532', '10947/2538', '10947/2534', '10947/2540', '10947/2900', '10947/2539', '10947/2784', '10947/2536', '10947/2805', '10947/2541', '10947/2535' 531', '10947/2532', '10947/2538', '10947/2534', '10947/2540', '10947/2900', '10947/2539', '10947/2784', '10947/2536', '10947/2805', '10947/2541', '10947/2535'
, '10947/2537', '10568/93761'))); , '10947/2537', '10568/93761')));
</code></pre></li> </code></pre><ul>
<li>To get a CSV (with counts) from that:</li>
<li><p>To get a CSV (with counts) from that:</p> </ul>
<pre><code>dspace=# \copy (select distinct text_value, count(*) <pre><code>dspace=# \copy (select distinct text_value, count(*)
from metadatavalue from metadatavalue
where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author')
AND resource_type_id = 2 AND resource_type_id = 2
AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10947/2', '10947/3', '10947/10', '10947/4', '10947/5', '10947/6', '10947/7', '10947/8', '10947/9', '10947/11', '10947/25', '10947/12', '10947/26', '10947/27', '10947/28', '10947/29', '10947/30', '10947/13', '10947/14', '10947/15', '10947/16', '10947/31', '10947/32', '10947/33', '10947/34', '10947/35', '10947/36', '10947/37', '10947/17', '10947/18', '10947/38', '10947/19', '10947/39', '10947/40', '10947/41', '10947/42', '10947/43', '10947/2512', '10947/44', '10947/20', '10947/21', '10947/45', '10947/46', '10947/47', '10947/48', '10947/49', '10947/22', '10947/23', '10947/24', '10947/50', '10947/51', '10947/2518', '10947/2776', '10947/2790', '10947/2521', '10947/2522', '10947/2782', '10947/2525', '10947/2836', '10947/2524', '10947/2878', '10947/2520', '10947/2523', '10947/2786', '10947/2631', '10947/2589', '10947/2519', '10947/2708', '10947/2526', '10947/2871', '10947/2527', '10947/4467', '10947/3457', '10947/2528', '10947/2529', '10947/2533', '10947/2530', '10947/2531', '10947/2532', '10947/2538', '10947/2534', '10947/2540', '10947/2900', '10947/2539', '10947/2784', '10947/2536', '10947/2805', '10947/2541', '10947/2535', '10947/2537', '10568/93761'))) group by text_value order by count desc) to /tmp/cgiar-librar-authors.csv with csv; AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10947/2', '10947/3', '10947/10', '10947/4', '10947/5', '10947/6', '10947/7', '10947/8', '10947/9', '10947/11', '10947/25', '10947/12', '10947/26', '10947/27', '10947/28', '10947/29', '10947/30', '10947/13', '10947/14', '10947/15', '10947/16', '10947/31', '10947/32', '10947/33', '10947/34', '10947/35', '10947/36', '10947/37', '10947/17', '10947/18', '10947/38', '10947/19', '10947/39', '10947/40', '10947/41', '10947/42', '10947/43', '10947/2512', '10947/44', '10947/20', '10947/21', 
'10947/45', '10947/46', '10947/47', '10947/48', '10947/49', '10947/22', '10947/23', '10947/24', '10947/50', '10947/51', '10947/2518', '10947/2776', '10947/2790', '10947/2521', '10947/2522', '10947/2782', '10947/2525', '10947/2836', '10947/2524', '10947/2878', '10947/2520', '10947/2523', '10947/2786', '10947/2631', '10947/2589', '10947/2519', '10947/2708', '10947/2526', '10947/2871', '10947/2527', '10947/4467', '10947/3457', '10947/2528', '10947/2529', '10947/2533', '10947/2530', '10947/2531', '10947/2532', '10947/2538', '10947/2534', '10947/2540', '10947/2900', '10947/2539', '10947/2784', '10947/2536', '10947/2805', '10947/2541', '10947/2535', '10947/2537', '10568/93761'))) group by text_value order by count desc) to /tmp/cgiar-librar-authors.csv with csv;
</code></pre></li> </code></pre><h2 id="20170523">2017-05-23</h2>
</ul>
<h2 id="2017-05-23">2017-05-23</h2>
<ul> <ul>
<li>Add Affiliation to filters on Listing and Reports module (<a href="https://github.com/ilri/DSpace/pull/325">#325</a>)</li> <li>Add Affiliation to filters on Listing and Reports module (<a href="https://github.com/ilri/DSpace/pull/325">#325</a>)</li>
<li>Start looking at WLE&rsquo;s Phase II metadata updates but it seems they are not tagging their items properly, as their website importer infers which theme to use based on the name of the CGSpace collection!</li> <li>Start looking at WLE's Phase II metadata updates but it seems they are not tagging their items properly, as their website importer infers which theme to use based on the name of the CGSpace collection!</li>
<li>For now I&rsquo;ve suggested that they just change the collection names and that we fix their metadata manually afterwards</li> <li>For now I've suggested that they just change the collection names and that we fix their metadata manually afterwards</li>
<li>Also, they have a lot of messed up values in their <code>cg.subject.wle</code> field so I will clean up some of those first:</li>
<li><p>Also, they have a lot of messed up values in their <code>cg.subject.wle</code> field so I will clean up some of those first:</p> </ul>
<pre><code>dspace=# \copy (select distinct text_value from metadatavalue where resource_type_id=2 and metadata_field_id=119) to /tmp/wle.csv with csv; <pre><code>dspace=# \copy (select distinct text_value from metadatavalue where resource_type_id=2 and metadata_field_id=119) to /tmp/wle.csv with csv;
COPY 111 COPY 111
</code></pre></li> </code></pre><ul>
<li>Respond to Atmire message about ORCIDs, saying that right now we'd prefer to just have them available via REST API like any other metadata field, and that I'm available for a Skype</li>
<li><p>Respond to Atmire message about ORCIDs, saying that right now we&rsquo;d prefer to just have them available via REST API like any other metadata field, and that I&rsquo;m available for a Skype</p></li>
</ul> </ul>
<h2 id="20170526">2017-05-26</h2>
<h2 id="2017-05-26">2017-05-26</h2>
<ul> <ul>
<li>Increase max file size in nginx so that CIP can upload some larger PDFs</li> <li>Increase max file size in nginx so that CIP can upload some larger PDFs</li>
<li>Agree to talk with Atmire after the June DSpace developers meeting where they will be discussing exposing ORCIDs via REST/OAI</li> <li>Agree to talk with Atmire after the June DSpace developers meeting where they will be discussing exposing ORCIDs via REST/OAI</li>
</ul> </ul>
<h2 id="20170528">2017-05-28</h2>
<h2 id="2017-05-28">2017-05-28</h2>
<ul> <ul>
<li>File an issue on GitHub to explore/track migration to proper country/region codes (ISO <sup>2</sup>&frasl;<sub>3</sub> and UN M.49): <a href="https://github.com/ilri/DSpace/issues/326">#326</a></li> <li>File an issue on GitHub to explore/track migration to proper country/region codes (ISO 2/3 and UN M.49): <a href="https://github.com/ilri/DSpace/issues/326">#326</a></li>
<li>Ask Peter how the Landportal.info people should acknowledge us as the source of data on their website</li> <li>Ask Peter how the Landportal.info people should acknowledge us as the source of data on their website</li>
<li>Communicate with MARLO people about progress on exposing ORCIDs via the REST API, as it is set to be discussed in the <a href="https://wiki.duraspace.org/display/cmtygp/DCAT+Meeting+June+2017">June, 2017 DCAT meeting</a></li> <li>Communicate with MARLO people about progress on exposing ORCIDs via the REST API, as it is set to be discussed in the <a href="https://wiki.duraspace.org/display/cmtygp/DCAT+Meeting+June+2017">June, 2017 DCAT meeting</a></li>
<li>Find all of Amos Omore's author name variations so I can link them to his authority entry that has an ORCID:</li>
<li><p>Find all of Amos Omore&rsquo;s author name variations so I can link them to his authority entry that has an ORCID:</p> </ul>
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and text_value like 'Omore, A%'; <pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and text_value like 'Omore, A%';
</code></pre></li> </code></pre><ul>
<li>Set the authority for all variations to one containing an ORCID:</li>
<li><p>Set the authority for all variations to one containing an ORCID:</p> </ul>
<pre><code>dspace=# update metadatavalue set authority='4428ee88-90ef-4107-b837-3c0ec988520b', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Omore, A%'; <pre><code>dspace=# update metadatavalue set authority='4428ee88-90ef-4107-b837-3c0ec988520b', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Omore, A%';
UPDATE 187 UPDATE 187
</code></pre></li> </code></pre><ul>
<li>Next I need to do Edgar Twine:</li>
<li><p>Next I need to do Edgar Twine:</p>
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and text_value like 'Twine, E%';
</code></pre></li>
<li><p>But it doesn&rsquo;t look like any of his existing entries are linked to an authority which has an ORCID, so I edited the metadata via &ldquo;Edit this Item&rdquo; and looked up his ORCID and linked it there</p></li>
<li><p>Now I should be able to set his name variations to the new authority:</p>
<pre><code>dspace=# update metadatavalue set authority='f70d0a01-d562-45b8-bca3-9cf7f249bc8b', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Twine, E%';
</code></pre></li>
<li><p>Run the corrections on CGSpace and then update discovery / authority</p></li>
<li><p>I notice that there are a handful of <code>java.lang.OutOfMemoryError: Java heap space</code> errors in the Catalina logs on CGSpace, I should go look into that&hellip;</p></li>
</ul> </ul>
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and text_value like 'Twine, E%';
<h2 id="2017-05-29">2017-05-29</h2> </code></pre><ul>
<li>But it doesn't look like any of his existing entries are linked to an authority which has an ORCID, so I edited the metadata via &ldquo;Edit this Item&rdquo; and looked up his ORCID and linked it there</li>
<li>Now I should be able to set his name variations to the new authority:</li>
</ul>
<pre><code>dspace=# update metadatavalue set authority='f70d0a01-d562-45b8-bca3-9cf7f249bc8b', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Twine, E%';
</code></pre><ul>
<li>Run the corrections on CGSpace and then update discovery / authority</li>
<li>I notice that there are a handful of <code>java.lang.OutOfMemoryError: Java heap space</code> errors in the Catalina logs on CGSpace, I should go look into that&hellip;</li>
</ul>
<h2 id="20170529">2017-05-29</h2>
<ul> <ul>
<li>Discuss WLE themes and subjects with Mia and Macaroni Bros</li> <li>Discuss WLE themes and subjects with Mia and Macaroni Bros</li>
<li>We decided we need to create metadata fields for Phase I and II themes</li> <li>We decided we need to create metadata fields for Phase I and II themes</li>
<li>I&rsquo;ve updated the existing GitHub issue for Phase II (<a href="https://github.com/ilri/DSpace/issues/322">#322</a>) and created a new one to track the changes for Phase I themes (<a href="https://github.com/ilri/DSpace/issues/327">#327</a>)</li> <li>I've updated the existing GitHub issue for Phase II (<a href="https://github.com/ilri/DSpace/issues/322">#322</a>) and created a new one to track the changes for Phase I themes (<a href="https://github.com/ilri/DSpace/issues/327">#327</a>)</li>
<li>After Macaroni Bros update the WLE website importer we will rename the WLE collections to reflect Phase II</li> <li>After Macaroni Bros update the WLE website importer we will rename the WLE collections to reflect Phase II</li>
<li>Also, we need to have Mia and Udana look through the existing metadata in <code>cg.subject.wle</code> as it is quite a mess</li> <li>Also, we need to have Mia and Udana look through the existing metadata in <code>cg.subject.wle</code> as it is quite a mess</li>
</ul> </ul>

View File

@ -6,7 +6,7 @@
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no"> <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="June, 2017" /> <meta property="og:title" content="June, 2017" />
<meta property="og:description" content="2017-06-01 After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes The cg.identifier.wletheme field will be used for both Phase I and Phase II Research Themes Then we&rsquo;ll create a new sub-community for Phase II and create collections for the research themes there The current &ldquo;Research Themes&rdquo; community will be renamed to &ldquo;WLE Phase I Research Themes&rdquo; Tagged all items in the current Phase I collections with their appropriate themes Create pull request to add Phase II research themes to the submission form: #328 Add cg." /> <meta property="og:description" content="2017-06-01 After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes The cg.identifier.wletheme field will be used for both Phase I and Phase II Research Themes Then we&#39;ll create a new sub-community for Phase II and create collections for the research themes there The current &ldquo;Research Themes&rdquo; community will be renamed to &ldquo;WLE Phase I Research Themes&rdquo; Tagged all items in the current Phase I collections with their appropriate themes Create pull request to add Phase II research themes to the submission form: #328 Add cg." />
<meta property="og:type" content="article" /> <meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2017-06/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2017-06/" />
<meta property="article:published_time" content="2017-06-01T10:14:52+03:00" /> <meta property="article:published_time" content="2017-06-01T10:14:52+03:00" />
@ -14,8 +14,8 @@
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="June, 2017"/> <meta name="twitter:title" content="June, 2017"/>
<meta name="twitter:description" content="2017-06-01 After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes The cg.identifier.wletheme field will be used for both Phase I and Phase II Research Themes Then we&rsquo;ll create a new sub-community for Phase II and create collections for the research themes there The current &ldquo;Research Themes&rdquo; community will be renamed to &ldquo;WLE Phase I Research Themes&rdquo; Tagged all items in the current Phase I collections with their appropriate themes Create pull request to add Phase II research themes to the submission form: #328 Add cg."/> <meta name="twitter:description" content="2017-06-01 After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes The cg.identifier.wletheme field will be used for both Phase I and Phase II Research Themes Then we&#39;ll create a new sub-community for Phase II and create collections for the research themes there The current &ldquo;Research Themes&rdquo; community will be renamed to &ldquo;WLE Phase I Research Themes&rdquo; Tagged all items in the current Phase I collections with their appropriate themes Create pull request to add Phase II research themes to the submission form: #328 Add cg."/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -96,83 +96,69 @@
</p> </p>
</header> </header>
<h2 id="20170601">2017-06-01</h2>
<h2 id="2017-06-01">2017-06-01</h2>
<ul> <ul>
<li>After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes</li> <li>After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes</li>
<li>The <code>cg.identifier.wletheme</code> field will be used for both Phase I and Phase II Research Themes</li> <li>The <code>cg.identifier.wletheme</code> field will be used for both Phase I and Phase II Research Themes</li>
<li>Then we&rsquo;ll create a new sub-community for Phase II and create collections for the research themes there</li> <li>Then we'll create a new sub-community for Phase II and create collections for the research themes there</li>
<li>The current &ldquo;Research Themes&rdquo; community will be renamed to &ldquo;WLE Phase I Research Themes&rdquo;</li> <li>The current &ldquo;Research Themes&rdquo; community will be renamed to &ldquo;WLE Phase I Research Themes&rdquo;</li>
<li>Tagged all items in the current Phase I collections with their appropriate themes</li> <li>Tagged all items in the current Phase I collections with their appropriate themes</li>
<li>Create pull request to add Phase II research themes to the submission form: <a href="https://github.com/ilri/DSpace/pull/328">#328</a></li> <li>Create pull request to add Phase II research themes to the submission form: <a href="https://github.com/ilri/DSpace/pull/328">#328</a></li>
<li>Add <code>cg.subject.system</code> to CGSpace metadata registry, for subject from the upcoming CGIAR Library migration</li> <li>Add <code>cg.subject.system</code> to CGSpace metadata registry, for subject from the upcoming CGIAR Library migration</li>
</ul> </ul>
<h2 id="20170604">2017-06-04</h2>
<h2 id="2017-06-04">2017-06-04</h2>
<ul> <ul>
<li>After adding <code>cg.identifier.wletheme</code> to 1106 WLE items I can see the field on XMLUI but not in REST!</li> <li>After adding <code>cg.identifier.wletheme</code> to 1106 WLE items I can see the field on XMLUI but not in REST!</li>
<li>Strangely it happens on DSpace Test AND on CGSpace!</li> <li>Strangely it happens on DSpace Test AND on CGSpace!</li>
<li>I tried to re-index Discovery but it didn&rsquo;t fix it</li> <li>I tried to re-index Discovery but it didn't fix it</li>
<li>Run all system updates on DSpace Test and reboot the server</li> <li>Run all system updates on DSpace Test and reboot the server</li>
<li>After rebooting the server (and therefore restarting Tomcat) the new metadata field is available</li> <li>After rebooting the server (and therefore restarting Tomcat) the new metadata field is available</li>
<li>I&rsquo;ve sent a message to the dspace-tech mailing list to ask if this is a bug and whether I should file a Jira ticket</li> <li>I've sent a message to the dspace-tech mailing list to ask if this is a bug and whether I should file a Jira ticket</li>
</ul> </ul>
<h2 id="20160605">2016-06-05</h2>
<h2 id="2016-06-05">2016-06-05</h2>
<ul> <ul>
<li>Rename WLE&rsquo;s &ldquo;Research Themes&rdquo; sub-community to &ldquo;WLE Phase I Research Themes&rdquo; on DSpace Test so Macaroni Bros can continue their testing</li> <li>Rename WLE's &ldquo;Research Themes&rdquo; sub-community to &ldquo;WLE Phase I Research Themes&rdquo; on DSpace Test so Macaroni Bros can continue their testing</li>
<li>Macaroni Bros tested it and said it&rsquo;s fine, so I renamed it on CGSpace as well</li> <li>Macaroni Bros tested it and said it's fine, so I renamed it on CGSpace as well</li>
<li>Working on how to automate the extraction of the CIAT Book chapters, doing some magic in OpenRefine to extract page from→to from cg.identifier.url and dc.format.extent, respectively: <li>Working on how to automate the extraction of the CIAT Book chapters, doing some magic in OpenRefine to extract page from→to from cg.identifier.url and dc.format.extent, respectively:
<ul> <ul>
<li>cg.identifier.url: <code>value.split(&quot;page=&quot;, &quot;&quot;)[1]</code></li> <li>cg.identifier.url: <code>value.split(&quot;page=&quot;, &quot;&quot;)[1]</code></li>
<li>dc.format.extent: <code>value.replace(&quot;p. &quot;, &quot;&quot;).split(&quot;-&quot;)[1].toNumber() - value.replace(&quot;p. &quot;, &quot;&quot;).split(&quot;-&quot;)[0].toNumber()</code></li> <li>dc.format.extent: <code>value.replace(&quot;p. &quot;, &quot;&quot;).split(&quot;-&quot;)[1].toNumber() - value.replace(&quot;p. &quot;, &quot;&quot;).split(&quot;-&quot;)[0].toNumber()</code></li>
</ul></li> </ul>
<li>Finally, after some filtering to see which small outliers there were (based on dc.format.extent using &ldquo;p. 1-14&rdquo; vs &ldquo;29 p.&rdquo;), create a new column with last page number: </li>
<li>Finally, after some filtering to see which small outliers there were (based on dc.format.extent using &ldquo;p. 1-14&rdquo; vs &ldquo;29 p.&quot;), create a new column with last page number:
<ul> <ul>
<li><code>cells[&quot;dc.page.from&quot;].value.toNumber() + cells[&quot;dc.format.pages&quot;].value.toNumber()</code></li> <li><code>cells[&quot;dc.page.from&quot;].value.toNumber() + cells[&quot;dc.format.pages&quot;].value.toNumber()</code></li>
</ul></li> </ul>
</li>
<li>Then create a new, unique file name to be used in the output, based on a SHA1 of the dc.title and with a description: <li>Then create a new, unique file name to be used in the output, based on a SHA1 of the dc.title and with a description:
<ul> <ul>
<li>dc.page.to: <code>value.split(&quot; &quot;)[0].replace(&quot;,&quot;,&quot;&quot;).toLowercase() + &quot;-&quot; + sha1(value).get(1,9) + &quot;.pdf__description:&quot; + cells[&quot;dc.type&quot;].value</code></li> <li>dc.page.to: <code>value.split(&quot; &quot;)[0].replace(&quot;,&quot;,&quot;&quot;).toLowercase() + &quot;-&quot; + sha1(value).get(1,9) + &quot;.pdf__description:&quot; + cells[&quot;dc.type&quot;].value</code></li>
</ul></li> </ul>
</li>
<li>Start processing 769 records after filtering the following (there are another 159 records that have some other format, or for example they have their own PDF which I will process later), using a modified <code>generate-thumbnails.py</code> script to read certain fields and then pass to GhostScript: <li>Start processing 769 records after filtering the following (there are another 159 records that have some other format, or for example they have their own PDF which I will process later), using a modified <code>generate-thumbnails.py</code> script to read certain fields and then pass to GhostScript:
<ul> <ul>
<li>cg.identifier.url: <code>value.contains(&quot;page=&quot;)</code></li> <li>cg.identifier.url: <code>value.contains(&quot;page=&quot;)</code></li>
<li>dc.format.extent: <code>or(value.contains(&quot;p. &quot;),value.contains(&quot; p.&quot;))</code></li> <li>dc.format.extent: <code>or(value.contains(&quot;p. &quot;),value.contains(&quot; p.&quot;))</code></li>
<li>Command like: <code>$ gs -dNOPAUSE -dBATCH -dFirstPage=14 -dLastPage=27 -sDEVICE=pdfwrite -sOutputFile=beans.pdf -f 12605-1.pdf</code></li> <li>Command like: <code>$ gs -dNOPAUSE -dBATCH -dFirstPage=14 -dLastPage=27 -sDEVICE=pdfwrite -sOutputFile=beans.pdf -f 12605-1.pdf</code></li>
</ul></li>
<li>17 of the items have issues with incorrect page number ranges, and upon closer inspection they do not appear in the referenced PDF</li>
<li><p>I&rsquo;ve flagged them and proceeded without them (752 total) on DSpace Test:</p>
<pre><code>$ JAVA_OPTS=&quot;-Xmx1024m -Dfile.encoding=UTF-8&quot; [dspace]/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/93843 --source /home/aorth/src/CIAT-Books/SimpleArchiveFormat/ --mapfile=/tmp/ciat-books.map &amp;&gt; /tmp/ciat-books.log
</code></pre></li>
<li><p>I went and did some basic sanity checks on the remaining items in the CIAT Book Chapters and decided they are mostly fine (except one duplicate and the flagged ones), so I imported them to DSpace Test too (162 items)</p></li>
<li><p>Total items in CIAT Book Chapters is 914, with the others being flagged for some reason, and we should send that back to CIAT</p></li>
<li><p>Restart Tomcat on CGSpace so that the <code>cg.identifier.wletheme</code> field is available on REST API for Macaroni Bros</p></li>
</ul> </ul>
</li>
<h2 id="2017-06-07">2017-06-07</h2> <li>17 of the items have issues with incorrect page number ranges, and upon closer inspection they do not appear in the referenced PDF</li>
<li>I've flagged them and proceeded without them (752 total) on DSpace Test:</li>
</ul>
<pre><code>$ JAVA_OPTS=&quot;-Xmx1024m -Dfile.encoding=UTF-8&quot; [dspace]/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/93843 --source /home/aorth/src/CIAT-Books/SimpleArchiveFormat/ --mapfile=/tmp/ciat-books.map &amp;&gt; /tmp/ciat-books.log
</code></pre><ul>
<li>I went and did some basic sanity checks on the remaining items in the CIAT Book Chapters and decided they are mostly fine (except one duplicate and the flagged ones), so I imported them to DSpace Test too (162 items)</li>
<li>Total items in CIAT Book Chapters is 914, with the others being flagged for some reason, and we should send that back to CIAT</li>
<li>Restart Tomcat on CGSpace so that the <code>cg.identifier.wletheme</code> field is available on REST API for Macaroni Bros</li>
</ul>
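The page arithmetic and file naming from the 2016-06-05 OpenRefine steps can be sketched in Python. This is only an illustration under stated assumptions, not the modified `generate-thumbnails.py` script itself: the function names `page_range`, `output_name`, and `gs_command` are hypothetical, and the example URL and title are made up. Only the `gs` flags are taken from the command shown in the notes.

```python
import hashlib
import re
from typing import List, Tuple


def page_range(url: str, extent: str) -> Tuple[int, int]:
    """Return (first, last) PDF pages from a URL containing "page=N" and an
    extent like "p. 1-14" (hypothetical helper mirroring the GREL steps)."""
    page = re.search(r"page=(\d+)", url)
    span = re.search(r"p\.\s*(\d+)-(\d+)", extent)
    if page is None or span is None:
        # Extents such as "29 p." are the outliers that need separate handling.
        raise ValueError("unsupported url or extent format")
    first = int(page.group(1))
    length = int(span.group(2)) - int(span.group(1))
    return first, first + length


def output_name(title: str, item_type: str) -> str:
    """First word of the title plus part of its SHA1, with a description suffix."""
    first_word = title.split(" ")[0].replace(",", "").lower()
    digest = hashlib.sha1(title.encode("utf-8")).hexdigest()
    # GREL's get(1,9) takes characters 1..8, which is digest[1:9] in Python.
    return f"{first_word}-{digest[1:9]}.pdf__description:{item_type}"


def gs_command(source_pdf: str, first: int, last: int, output_pdf: str) -> List[str]:
    """Build the Ghostscript argument list used to cut out one chapter."""
    return [
        "gs", "-dNOPAUSE", "-dBATCH",
        f"-dFirstPage={first}", f"-dLastPage={last}",
        "-sDEVICE=pdfwrite", f"-sOutputFile={output_pdf}",
        "-f", source_pdf,
    ]


if __name__ == "__main__":
    first, last = page_range("https://example.org/12605-1.pdf#page=14", "p. 1-14")
    print(" ".join(gs_command("12605-1.pdf", first, last, "beans.pdf")))
```

Returning the command as a list rather than a string means it could be passed to `subprocess.run` without shell quoting problems.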
<h2 id="20170607">2017-06-07</h2>
<ul> <ul>
<li>Testing <a href="https://github.com/ilri/DSpace/pull/319">Atmire&rsquo;s patch for the CUA Workflow Statistics again</a></li> <li>Testing <a href="https://github.com/ilri/DSpace/pull/319">Atmire's patch for the CUA Workflow Statistics again</a></li>
<li>Still doesn&rsquo;t seem to give results I&rsquo;d expect, like there are no results for Maria Garruccio, or for the ILRI community!</li> <li>Still doesn't seem to give results I'd expect, like there are no results for Maria Garruccio, or for the ILRI community!</li>
<li>Then I&rsquo;ll file an update to the issue on Atmire&rsquo;s tracker</li> <li>Then I'll file an update to the issue on Atmire's tracker</li>
<li>Created a new branch with just the relevant changes, so I can send it to them</li> <li>Created a new branch with just the relevant changes, so I can send it to them</li>
<li>One thing I noticed is that there is a failed database migration related to CUA:</li>
<li><p>One thing I noticed is that there is a failed database migration related to CUA:</p> </ul>
<pre><code>+----------------+----------------------------+---------------------+---------+ <pre><code>+----------------+----------------------------+---------------------+---------+
| Version | Description | Installed on | State | | Version | Description | Installed on | State |
+----------------+----------------------------+---------------------+---------+ +----------------+----------------------------+---------------------+---------+
@ -197,85 +183,64 @@
| 5.5.2015.12.03 | Atmire MQM migration | 2016-11-27 06:39:06 | OutOrde | | 5.5.2015.12.03 | Atmire MQM migration | 2016-11-27 06:39:06 | OutOrde |
| 5.6.2016.08.08 | CUA emailreport migration | 2017-01-29 11:18:56 | OutOrde | | 5.6.2016.08.08 | CUA emailreport migration | 2017-01-29 11:18:56 | OutOrde |
+----------------+----------------------------+---------------------+---------+ +----------------+----------------------------+---------------------+---------+
</code></pre></li> </code></pre><ul>
<li>Merge the pull request for <a href="https://github.com/ilri/DSpace/pull/328">WLE Phase II themes</a></li>
<li><p>Merge the pull request for <a href="https://github.com/ilri/DSpace/pull/328">WLE Phase II themes</a></p></li>
</ul> </ul>
<h2 id="20170618">2017-06-18</h2>
<h2 id="2017-06-18">2017-06-18</h2>
<ul> <ul>
<li>Redeploy CGSpace with latest changes from <code>5_x-prod</code>, run system updates, and reboot the server</li> <li>Redeploy CGSpace with latest changes from <code>5_x-prod</code>, run system updates, and reboot the server</li>
<li>Continue working on ansible infrastructure changes for CGIAR Library</li> <li>Continue working on ansible infrastructure changes for CGIAR Library</li>
</ul> </ul>
<h2 id="20170620">2017-06-20</h2>
<h2 id="2017-06-20">2017-06-20</h2>
<ul> <ul>
<li>Import Abenet and Peter&rsquo;s changes to the CGIAR Library CRP community</li> <li>Import Abenet and Peter's changes to the CGIAR Library CRP community</li>
<li>Due to them using Windows and renaming some columns there were formatting, encoding, and duplicate metadata value issues</li> <li>Due to them using Windows and renaming some columns there were formatting, encoding, and duplicate metadata value issues</li>
<li>I had to remove some fields from the CSV and rename some back to, ie, <code>dc.subject[en_US]</code> just so DSpace would detect changes properly</li> <li>I had to remove some fields from the CSV and rename some back to, ie, <code>dc.subject[en_US]</code> just so DSpace would detect changes properly</li>
<li>Now it looks much better: <a href="https://dspacetest.cgiar.org/handle/10947/2517">https://dspacetest.cgiar.org/handle/10947/2517</a></li> <li>Now it looks much better: <a href="https://dspacetest.cgiar.org/handle/10947/2517">https://dspacetest.cgiar.org/handle/10947/2517</a></li>
<li>Removing the HTML tags and HTML/XML entities using the following GREL: <li>Removing the HTML tags and HTML/XML entities using the following GREL:
<ul> <ul>
<li><code>replace(value,/&lt;\/?\w+((\s+\w+(\s*=\s*(?:&quot;.*?&quot;|'.*?'|[^'&quot;&gt;\s]+))?)+\s*|\s*)\/?&gt;/,'')</code></li> <li><code>replace(value,/&lt;\/?\w+((\s+\w+(\s*=\s*(?:&quot;.*?&quot;|'.*?'|[^'&quot;&gt;\s]+))?)+\s*|\s*)\/?&gt;/,'')</code></li>
<li><code>value.unescape(&quot;html&quot;).unescape(&quot;xml&quot;)</code></li> <li><code>value.unescape(&quot;html&quot;).unescape(&quot;xml&quot;)</code></li>
</ul></li> </ul>
</li>
<li><p>Finally import 914 CIAT Book Chapters to CGSpace in two batches:</p> <li>Finally import 914 CIAT Book Chapters to CGSpace in two batches:</li>
</ul>
<pre><code>$ JAVA_OPTS=&quot;-Xmx1024m -Dfile.encoding=UTF-8&quot; [dspace]/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/35701 --source /home/aorth/CIAT-Books/SimpleArchiveFormat/ --mapfile=/tmp/ciat-books.map &amp;&gt; /tmp/ciat-books.log <pre><code>$ JAVA_OPTS=&quot;-Xmx1024m -Dfile.encoding=UTF-8&quot; [dspace]/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/35701 --source /home/aorth/CIAT-Books/SimpleArchiveFormat/ --mapfile=/tmp/ciat-books.map &amp;&gt; /tmp/ciat-books.log
$ JAVA_OPTS=&quot;-Xmx1024m -Dfile.encoding=UTF-8&quot; [dspace]/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/35701 --source /home/aorth/CIAT-Books/SimpleArchiveFormat/ --mapfile=/tmp/ciat-books2.map &amp;&gt; /tmp/ciat-books2.log $ JAVA_OPTS=&quot;-Xmx1024m -Dfile.encoding=UTF-8&quot; [dspace]/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/35701 --source /home/aorth/CIAT-Books/SimpleArchiveFormat/ --mapfile=/tmp/ciat-books2.map &amp;&gt; /tmp/ciat-books2.log
</code></pre></li> </code></pre><h2 id="20170625">2017-06-25</h2>
</ul>
<h2 id="2017-06-25">2017-06-25</h2>
<ul> <ul>
<li>WLE has said that one of their Phase II research themes is being renamed from <code>Regenerating Degraded Landscapes</code> to <code>Restoring Degraded Landscapes</code></li> <li>WLE has said that one of their Phase II research themes is being renamed from <code>Regenerating Degraded Landscapes</code> to <code>Restoring Degraded Landscapes</code></li>
<li>Pull request with the changes to <code>input-forms.xml</code>: <a href="https://github.com/ilri/DSpace/pull/329">#329</a></li> <li>Pull request with the changes to <code>input-forms.xml</code>: <a href="https://github.com/ilri/DSpace/pull/329">#329</a></li>
<li>As of now it doesn't look like there are any items using this research theme so we don't need to do any updates:</li>
<li><p>As of now it doesn&rsquo;t look like there are any items using this research theme so we don&rsquo;t need to do any updates:</p> </ul>
<pre><code>dspace=# select text_value from metadatavalue where resource_type_id=2 and metadata_field_id=237 and text_value like 'Regenerating Degraded Landscapes%'; <pre><code>dspace=# select text_value from metadatavalue where resource_type_id=2 and metadata_field_id=237 and text_value like 'Regenerating Degraded Landscapes%';
text_value text_value
------------ ------------
(0 rows) (0 rows)
</code></pre></li> </code></pre><ul>
<li>Marianne from WLE asked if they can have both Phase I and II research themes together in the item submission form</li>
<li><p>Marianne from WLE asked if they can have both Phase I and II research themes together in the item submission form</p></li> <li>Perhaps we can add them together in the same question for <code>cg.identifier.wletheme</code></li>
<li><p>Perhaps we can add them together in the same question for <code>cg.identifier.wletheme</code></p></li>
</ul> </ul>
<h2 id="20170630">2017-06-30</h2>
<h2 id="2017-06-30">2017-06-30</h2>
<ul> <ul>
<li><p>CGSpace went down briefly, I see lots of these errors in the dspace logs:</p> <li>CGSpace went down briefly, I see lots of these errors in the dspace logs:</li>
</ul>
<pre><code>Java stacktrace: java.util.NoSuchElementException: Timeout waiting for idle object <pre><code>Java stacktrace: java.util.NoSuchElementException: Timeout waiting for idle object
</code></pre></li> </code></pre><ul>
<li>After looking at the Tomcat logs, Munin graphs, and PostgreSQL connection stats, it seems there is just a high load</li>
<li><p>After looking at the Tomcat logs, Munin graphs, and PostgreSQL connection stats, it seems there is just a high load</p></li> <li>Might be a good time to adjust DSpace's database connection settings, like I first mentioned in April, 2017 after reading the <a href="https://wiki.duraspace.org/display/cmtygp/DCAT+Meeting+April+2017">2017-04 DCAT comments</a></li>
<li>I've adjusted the following in CGSpace's config:
<li><p>Might be a good time to adjust DSpace&rsquo;s database connection settings, like I first mentioned in April, 2017 after reading the <a href="https://wiki.duraspace.org/display/cmtygp/DCAT+Meeting+April+2017">2017-04 DCAT comments</a></p></li>
<li><p>I&rsquo;ve adjusted the following in CGSpace&rsquo;s config:</p>
<ul> <ul>
<li><code>db.maxconnections</code> 30→70 (the default PostgreSQL config allows 100 connections, so DSpace&rsquo;s default of 30 is quite low)</li> <li><code>db.maxconnections</code> 30→70 (the default PostgreSQL config allows 100 connections, so DSpace's default of 30 is quite low)</li>
<li><code>db.maxwait</code> 5000→10000</li> <li><code>db.maxwait</code> 5000→10000</li>
<li><code>db.maxidle</code> 8→20 (DSpace default is -1, unlimited, but we had set it to 8 earlier)</li> <li><code>db.maxidle</code> 8→20 (DSpace default is -1, unlimited, but we had set it to 8 earlier)</li>
</ul></li>
<li><p>We will need to adjust this again (as well as the <code>pg_hba.conf</code> settings) when we deploy tsega&rsquo;s REST API</p></li>
<li><p>Whip up a test for Marianne of WLE to be able to show both their Phase I and II research themes in the CGSpace item submission form:</p></li>
</ul> </ul>
</li>
<p><img src="/cgspace-notes/2017/06/wle-theme-test-a.png" alt="Test A for displaying the Phase I and II research themes" /> <li>We will need to adjust this again (as well as the <code>pg_hba.conf</code> settings) when we deploy tsega's REST API</li>
<img src="/cgspace-notes/2017/06/wle-theme-test-b.png" alt="Test B for displaying the Phase I and II research themes" /></p> <li>Whip up a test for Marianne of WLE to be able to show both their Phase I and II research themes in the CGSpace item submission form:</li>
</ul>
<p><img src="/cgspace-notes/2017/06/wle-theme-test-a.png" alt="Test A for displaying the Phase I and II research themes">
<img src="/cgspace-notes/2017/06/wle-theme-test-b.png" alt="Test B for displaying the Phase I and II research themes"></p>


@ -8,16 +8,13 @@
<meta property="og:title" content="July, 2017" /> <meta property="og:title" content="July, 2017" />
<meta property="og:description" content="2017-07-01 <meta property="og:description" content="2017-07-01
Run system updates and reboot DSpace Test Run system updates and reboot DSpace Test
2017-07-04 2017-07-04
Merge changes for WLE Phase II theme rename (#329) Merge changes for WLE Phase II theme rename (#329)
Looking at extracting the metadata registries from ICARDA&rsquo;s MEL DSpace database so we can compare fields with CGSpace Looking at extracting the metadata registries from ICARDA&#39;s MEL DSpace database so we can compare fields with CGSpace
We can use PostgreSQL&rsquo;s extended output format (-x) plus sed to format the output into quasi XML: We can use PostgreSQL&#39;s extended output format (-x) plus sed to format the output into quasi XML:
" /> " />
<meta property="og:type" content="article" /> <meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2017-07/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2017-07/" />
@ -28,18 +25,15 @@ We can use PostgreSQL&rsquo;s extended output format (-x) plus sed to format the
<meta name="twitter:title" content="July, 2017"/> <meta name="twitter:title" content="July, 2017"/>
<meta name="twitter:description" content="2017-07-01 <meta name="twitter:description" content="2017-07-01
Run system updates and reboot DSpace Test Run system updates and reboot DSpace Test
2017-07-04 2017-07-04
Merge changes for WLE Phase II theme rename (#329) Merge changes for WLE Phase II theme rename (#329)
Looking at extracting the metadata registries from ICARDA&rsquo;s MEL DSpace database so we can compare fields with CGSpace Looking at extracting the metadata registries from ICARDA&#39;s MEL DSpace database so we can compare fields with CGSpace
We can use PostgreSQL&rsquo;s extended output format (-x) plus sed to format the output into quasi XML: We can use PostgreSQL&#39;s extended output format (-x) plus sed to format the output into quasi XML:
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -120,182 +114,138 @@ We can use PostgreSQL&rsquo;s extended output format (-x) plus sed to format the
</p> </p>
</header> </header>
<h2 id="2017-07-01">2017-07-01</h2> <h2 id="20170701">2017-07-01</h2>
<ul> <ul>
<li>Run system updates and reboot DSpace Test</li> <li>Run system updates and reboot DSpace Test</li>
</ul> </ul>
<h2 id="20170704">2017-07-04</h2>
<h2 id="2017-07-04">2017-07-04</h2>
<ul> <ul>
<li>Merge changes for WLE Phase II theme rename (<a href="https://github.com/ilri/DSpace/pull/329">#329</a>)</li> <li>Merge changes for WLE Phase II theme rename (<a href="https://github.com/ilri/DSpace/pull/329">#329</a>)</li>
<li>Looking at extracting the metadata registries from ICARDA&rsquo;s MEL DSpace database so we can compare fields with CGSpace</li> <li>Looking at extracting the metadata registries from ICARDA's MEL DSpace database so we can compare fields with CGSpace</li>
<li>We can use PostgreSQL&rsquo;s extended output format (<code>-x</code>) plus <code>sed</code> to format the output into quasi XML:</li> <li>We can use PostgreSQL's extended output format (<code>-x</code>) plus <code>sed</code> to format the output into quasi XML:</li>
</ul> </ul>
<pre><code>$ psql dspacenew -x -c 'select element, qualifier, scope_note from metadatafieldregistry where metadata_schema_id=5 order by element, qualifier;' | sed -r 's:^-\[ RECORD (.*) \]-+$:&lt;/dc-type&gt;\n&lt;dc-type&gt;\n&lt;schema&gt;cg&lt;/schema&gt;:;s:([^ ]*) +\| (.*): &lt;\1&gt;\2&lt;/\1&gt;:;s:^$:&lt;/dc-type&gt;:;1s:&lt;/dc-type&gt;\n::' <pre><code>$ psql dspacenew -x -c 'select element, qualifier, scope_note from metadatafieldregistry where metadata_schema_id=5 order by element, qualifier;' | sed -r 's:^-\[ RECORD (.*) \]-+$:&lt;/dc-type&gt;\n&lt;dc-type&gt;\n&lt;schema&gt;cg&lt;/schema&gt;:;s:([^ ]*) +\| (.*): &lt;\1&gt;\2&lt;/\1&gt;:;s:^$:&lt;/dc-type&gt;:;1s:&lt;/dc-type&gt;\n::'
</code></pre> </code></pre><ul>
<ul>
<li>The <code>sed</code> script is from a post on the <a href="https://www.postgresql.org/message-id/437E44A5.508%40ultimeth.com">PostgreSQL mailing list</a></li> <li>The <code>sed</code> script is from a post on the <a href="https://www.postgresql.org/message-id/437E44A5.508%40ultimeth.com">PostgreSQL mailing list</a></li>
<li>Abenet says the ILRI board wants to be able to have &ldquo;lead author&rdquo; for every item, so I&rsquo;ve whipped up a WIP test in the <code>5_x-lead-author</code> branch</li> <li>Abenet says the ILRI board wants to be able to have &ldquo;lead author&rdquo; for every item, so I've whipped up a WIP test in the <code>5_x-lead-author</code> branch</li>
<li>It works but is still very rough and we haven&rsquo;t thought out the whole lifecycle yet</li> <li>It works but is still very rough and we haven't thought out the whole lifecycle yet</li>
</ul> </ul>
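For comparison, the same conversion of `psql -x` output into quasi XML can be sketched in Python instead of `sed`. This is a rough equivalent under assumptions, not the script from the mailing list post: the function name `expanded_to_xml` is hypothetical, the sample input is invented, and multi-line field values are ignored for brevity.

```python
import re
from xml.sax.saxutils import escape

# Matches psql expanded-output record headers such as "-[ RECORD 1 ]-----".
RECORD = re.compile(r"^-\[ RECORD \d+ \]")
# Matches "column | value" lines; the value may be empty.
FIELD = re.compile(r"^(\S+)\s+\|\s?(.*)$")


def expanded_to_xml(text: str, schema: str = "cg") -> str:
    """Turn psql expanded output into one <dc-type> element per record."""
    out = []
    in_record = False
    for line in text.splitlines():
        if RECORD.match(line):
            if in_record:
                out.append("</dc-type>")
            out.append("<dc-type>")
            out.append(f"  <schema>{schema}</schema>")
            in_record = True
            continue
        field = FIELD.match(line)
        if field and in_record:
            name, value = field.groups()
            # Escape &, < and > so the result stays parseable.
            out.append(f"  <{name}>{escape(value.strip())}</{name}>")
    if in_record:
        out.append("</dc-type>")
    return "\n".join(out)


if __name__ == "__main__":
    sample = (
        "-[ RECORD 1 ]---------\n"
        "element    | contributor\n"
        "qualifier  | author\n"
        "scope_note | \n"
    )
    print(expanded_to_xml(sample))
```

Unlike the `sed` one-liner, this version escapes special characters in scope notes, which may matter if the registry text contains `&` or `<`.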
<p><img src="/cgspace-notes/2017/07/lead-author-test.png" alt="Testing lead author in submission form"></p>
<p><img src="/cgspace-notes/2017/07/lead-author-test.png" alt="Testing lead author in submission form" /></p>
<ul> <ul>
<li>I assume that &ldquo;lead author&rdquo; would actually be the first question on the item submission form</li> <li>I assume that &ldquo;lead author&rdquo; would actually be the first question on the item submission form</li>
<li>We also need to check to see which ORCID authority core this uses, because it seems to be using an entirely new one rather than the one for <code>dc.contributor.author</code> (which makes sense of course, but fuck, all the author problems aren&rsquo;t bad enough?!)</li> <li>We also need to check to see which ORCID authority core this uses, because it seems to be using an entirely new one rather than the one for <code>dc.contributor.author</code> (which makes sense of course, but fuck, all the author problems aren't bad enough?!)</li>
<li>Also would need to edit XMLUI item displays to incorporate this into authors list</li> <li>Also would need to edit XMLUI item displays to incorporate this into authors list</li>
<li>And fuck, then anyone consuming our data via REST / OAI will not notice that we have an author outside of <code>dc.contributor.author</code>&hellip; ugh</li> <li>And fuck, then anyone consuming our data via REST / OAI will not notice that we have an author outside of <code>dc.contributor.author</code>&hellip; ugh</li>
<li>What if we modify the item submission form to use <a href="https://wiki.duraspace.org/display/DSDOC5x/Submission+User+Interface#SubmissionUserInterface-ItemtypeBasedMetadataCollection"><code>type-bind</code> fields to show/hide certain fields depending on the type</a>?</li> <li>What if we modify the item submission form to use <a href="https://wiki.duraspace.org/display/DSDOC5x/Submission+User+Interface#SubmissionUserInterface-ItemtypeBasedMetadataCollection"><code>type-bind</code> fields to show/hide certain fields depending on the type</a>?</li>
</ul> </ul>
<h2 id="20170705">2017-07-05</h2>
<h2 id="2017-07-05">2017-07-05</h2>
<ul> <ul>
<li>Adjust WLE Research Theme to include both Phase I and II on the submission form according to editor feedback (<a href="https://github.com/ilri/DSpace/pull/330">#330</a>)</li> <li>Adjust WLE Research Theme to include both Phase I and II on the submission form according to editor feedback (<a href="https://github.com/ilri/DSpace/pull/330">#330</a>)</li>
<li>Generate list of fields in the current CGSpace <code>cg</code> schema so we can record them properly in the metadata registry:</li>
<li><p>Generate list of fields in the current CGSpace <code>cg</code> schema so we can record them properly in the metadata registry:</p> </ul>
<pre><code>$ psql dspace -x -c 'select element, qualifier, scope_note from metadatafieldregistry where metadata_schema_id=2 order by element, qualifier;' | sed -r 's:^-\[ RECORD (.*) \]-+$:&lt;/dc-type&gt;\n&lt;dc-type&gt;\n&lt;schema&gt;cg&lt;/schema&gt;:;s:([^ ]*) +\| (.*): &lt;\1&gt;\2&lt;/\1&gt;:;s:^$:&lt;/dc-type&gt;:;1s:&lt;/dc-type&gt;\n::' &gt; cg-types.xml <pre><code>$ psql dspace -x -c 'select element, qualifier, scope_note from metadatafieldregistry where metadata_schema_id=2 order by element, qualifier;' | sed -r 's:^-\[ RECORD (.*) \]-+$:&lt;/dc-type&gt;\n&lt;dc-type&gt;\n&lt;schema&gt;cg&lt;/schema&gt;:;s:([^ ]*) +\| (.*): &lt;\1&gt;\2&lt;/\1&gt;:;s:^$:&lt;/dc-type&gt;:;1s:&lt;/dc-type&gt;\n::' &gt; cg-types.xml
</code></pre></li> </code></pre><ul>
<li>CGSpace was unavailable briefly, and I saw this error in the DSpace log file:</li>
<li><p>CGSpace was unavailable briefly, and I saw this error in the DSpace log file:</p> </ul>
<pre><code>2017-07-05 13:05:36,452 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error - <pre><code>2017-07-05 13:05:36,452 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
org.postgresql.util.PSQLException: FATAL: remaining connection slots are reserved for non-replication superuser connections org.postgresql.util.PSQLException: FATAL: remaining connection slots are reserved for non-replication superuser connections
</code></pre></li> </code></pre><ul>
<li>Looking at the <code>pg_stat_activity</code> table I saw there were indeed 98 active connections to PostgreSQL, and at this time the limit is 100, so that makes sense</li>
<li><p>Looking at the <code>pg_stat_activity</code> table I saw there were indeed 98 active connections to PostgreSQL, and at this time the limit is 100, so that makes sense</p></li> <li>Tsega restarted Tomcat and it's working now</li>
<li>Abenet said she was generating a report with Atmire's CUA module, so it could be due to that?</li>
<li><p>Tsega restarted Tomcat and it&rsquo;s working now</p></li> <li>Looking in the logs I see this random error again that I should report to DSpace:</li>
<li><p>Abenet said she was generating a report with Atmire&rsquo;s CUA module, so it could be due to that?</p></li>
<li><p>Looking in the logs I see this random error again that I should report to DSpace:</p>
<pre><code>2017-07-05 13:50:07,196 ERROR org.dspace.statistics.SolrLogger @ COUNTRY ERROR: EU
</code></pre></li>
<li><p>Seems to come from <code>dspace-api/src/main/java/org/dspace/statistics/SolrLogger.java</code></p></li>
</ul> </ul>
<pre><code>2017-07-05 13:50:07,196 ERROR org.dspace.statistics.SolrLogger @ COUNTRY ERROR: EU
<h2 id="2017-07-06">2017-07-06</h2> </code></pre><ul>
<li>Seems to come from <code>dspace-api/src/main/java/org/dspace/statistics/SolrLogger.java</code></li>
</ul>
<h2 id="20170706">2017-07-06</h2>
<ul> <ul>
<li>Sisay tried to help by making a <a href="https://github.com/ilri/DSpace/pull/331">pull request for the RTB flagships</a> but there are formatting errors, unrelated changes, and the flagship names are not in the style I requested</li> <li>Sisay tried to help by making a <a href="https://github.com/ilri/DSpace/pull/331">pull request for the RTB flagships</a> but there are formatting errors, unrelated changes, and the flagship names are not in the style I requested</li>
<li>Abenet talked to CIP and they said they are actually ok with using collection names rather than adding a new metadata field</li> <li>Abenet talked to CIP and they said they are actually ok with using collection names rather than adding a new metadata field</li>
</ul> </ul>
<h2 id="20170713">2017-07-13</h2>
<h2 id="2017-07-13">2017-07-13</h2>
<ul> <ul>
<li>Remove <code>UKaid</code> from the controlled vocabulary for <code>dc.description.sponsorship</code>, as <code>Department for International Development, United Kingdom</code> is the correct form and it is already present (<a href="https://github.com/ilri/DSpace/pull/334">#334</a>)</li> <li>Remove <code>UKaid</code> from the controlled vocabulary for <code>dc.description.sponsorship</code>, as <code>Department for International Development, United Kingdom</code> is the correct form and it is already present (<a href="https://github.com/ilri/DSpace/pull/334">#334</a>)</li>
</ul> </ul>
<h2 id="20170714">2017-07-14</h2>
<h2 id="2017-07-14">2017-07-14</h2>
<ul> <ul>
<li>Sisay sent me a patch to add &ldquo;Photo Report&rdquo; to <code>dc.type</code> so I&rsquo;ve added it to the <code>5_x-prod</code> branch</li> <li>Sisay sent me a patch to add &ldquo;Photo Report&rdquo; to <code>dc.type</code> so I've added it to the <code>5_x-prod</code> branch</li>
</ul> </ul>
<h2 id="20170717">2017-07-17</h2>
<h2 id="2017-07-17">2017-07-17</h2>
<ul> <ul>
<li>Linode shut down our seventeen (17) VMs due to nonpayment of the July 1st invoice</li> <li>Linode shut down our seventeen (17) VMs due to nonpayment of the July 1st invoice</li>
<li>It took me a few hours to find the ICT/Finance contacts to pay the bill and boot all the servers back up</li> <li>It took me a few hours to find the ICT/Finance contacts to pay the bill and boot all the servers back up</li>
<li>Since the server was down anyways, I decided to run all system updates and re-deploy CGSpace so that the latest changes to <code>input-forms.xml</code> and the sponsors controlled vocabulary would be live</li> <li>Since the server was down anyways, I decided to run all system updates and re-deploy CGSpace so that the latest changes to <code>input-forms.xml</code> and the sponsors controlled vocabulary would be live</li>
</ul> </ul>
<h2 id="20170720">2017-07-20</h2>
<h2 id="2017-07-20">2017-07-20</h2>
<ul> <ul>
<li>Skype chat with Addis team about the status of the CGIAR Library migration</li> <li>Skype chat with Addis team about the status of the CGIAR Library migration</li>
<li>Need to add the CGIAR System Organization subjects to Discovery Facets (test first)</li> <li>Need to add the CGIAR System Organization subjects to Discovery Facets (test first)</li>
<li>Tentative list of dates for the migration: <li>Tentative list of dates for the migration:
<ul> <ul>
<li>August 4: aim to finish data cleanup and then give Peter a list of authors</li> <li>August 4: aim to finish data cleanup and then give Peter a list of authors</li>
<li>August 18: ready to show System Office</li> <li>August 18: ready to show System Office</li>
<li>September 4: all feedback and decisions (including workflows) from System Office</li> <li>September 4: all feedback and decisions (including workflows) from System Office</li>
<li>September <sup>10</sup>&frasl;<sub>11</sub>: go live?</li> <li>September 10/11: go live?</li>
</ul></li> </ul>
</li>
<li>Talk to Tsega and Danny about exporting/ingesting the blog posts from Drupal into DSpace?</li> <li>Talk to Tsega and Danny about exporting/ingesting the blog posts from Drupal into DSpace?</li>
<li>Followup meeting on August <sup>8</sup>&frasl;<sub>9</sub>?</li> <li>Followup meeting on August 8/9?</li>
<li>Sent Abenet the 2415 records from CGIAR Library&rsquo;s Historical Archive (<sup>10947</sup>&frasl;<sub>1</sub>) after cleaning up the author authorities and HTML entities in <code>dc.contributor.author</code> and <code>dc.description.abstract</code> using OpenRefine: <li>Sent Abenet the 2415 records from CGIAR Library's Historical Archive (10947/1) after cleaning up the author authorities and HTML entities in <code>dc.contributor.author</code> and <code>dc.description.abstract</code> using OpenRefine:
<ul> <ul>
<li>Authors: <code>value.replace(/::\w{8}-\w{4}-\w{4}-\w{4}-\w{12}::600/,&quot;&quot;)</code></li> <li>Authors: <code>value.replace(/::\w{8}-\w{4}-\w{4}-\w{4}-\w{12}::600/,&quot;&quot;)</code></li>
<li>Abstracts: <code>replace(value,/&lt;\/?\w+((\s+\w+(\s*=\s*(?:&quot;.*?&quot;|'.*?'|[^'&quot;&gt;\s]+))?)+\s*|\s*)\/?&gt;/,'')</code></li> <li>Abstracts: <code>replace(value,/&lt;\/?\w+((\s+\w+(\s*=\s*(?:&quot;.*?&quot;|'.*?'|[^'&quot;&gt;\s]+))?)+\s*|\s*)\/?&gt;/,'')</code></li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2017-07-24">2017-07-24</h2> </ul>
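The two OpenRefine cleanups from 2017-07-20 (dropping author authority suffixes and stripping HTML tags from abstracts) can be approximated in Python for checking values outside OpenRefine. This is a sketch under assumptions: the function names `clean_author` and `clean_abstract` are hypothetical, the sample values are invented, and `html.unescape` stands in for GREL's `unescape("html")`, so edge cases may differ.

```python
import html
import re

# Authority suffix appended to author names: "::<UUID>::600".
AUTHORITY = re.compile(r"::\w{8}-\w{4}-\w{4}-\w{4}-\w{12}::600")

# Same tag pattern as the GREL expression in the notes.
TAG = re.compile(
    r"""</?\w+((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)/?>"""
)


def clean_author(value: str) -> str:
    """Remove the authority key and confidence suffix from an author value."""
    return AUTHORITY.sub("", value)


def clean_abstract(value: str) -> str:
    """Strip HTML tags, then decode HTML entities such as &amp;."""
    return html.unescape(TAG.sub("", value))


if __name__ == "__main__":
    print(clean_author("Orth, Alan::1a2b3c4d-1234-5678-9abc-0123456789ab::600"))
    print(clean_abstract('<p>Beans &amp; <b class="x">maize</b></p>'))
```

Tags are removed before entities are decoded so that an escaped `&lt;b&gt;` in the text is kept as literal characters rather than being treated as markup.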
<h2 id="20170724">2017-07-24</h2>
<ul> <ul>
<li><p>Move two top-level communities to be sub-communities of ILRI Projects</p> <li>Move two top-level communities to be sub-communities of ILRI Projects</li>
</ul>
<pre><code>$ for community in 10568/2347 10568/25209; do /home/cgspace.cgiar.org/bin/dspace community-filiator --set --parent=10568/27629 --child=&quot;$community&quot;; done <pre><code>$ for community in 10568/2347 10568/25209; do /home/cgspace.cgiar.org/bin/dspace community-filiator --set --parent=10568/27629 --child=&quot;$community&quot;; done
</code></pre></li> </code></pre><ul>
<li>Discuss CGIAR Library data cleanup with Sisay and Abenet</li>
<li><p>Discuss CGIAR Library data cleanup with Sisay and Abenet</p></li>
</ul> </ul>
<h2 id="20170727">2017-07-27</h2>
<h2 id="2017-07-27">2017-07-27</h2>
<ul> <ul>
<li>Help Sisay with some transforms to add descriptions to the <code>filename</code> column of some CIAT Presentations he&rsquo;s working on in OpenRefine</li> <li>Help Sisay with some transforms to add descriptions to the <code>filename</code> column of some CIAT Presentations he's working on in OpenRefine</li>
<li>Marianne emailed a few days ago to ask why &ldquo;Integrating Ecosystem Solutions&rdquo; was not in the list of WLE Phase I Research Themes on the input form</li> <li>Marianne emailed a few days ago to ask why &ldquo;Integrating Ecosystem Solutions&rdquo; was not in the list of WLE Phase I Research Themes on the input form</li>
<li>I told her that I only added the themes that I saw in the <a href="https://cgspace.cgiar.org/handle/10568/34508">WLE Phase I Research Themes</a> community</li> <li>I told her that I only added the themes that I saw in the <a href="https://cgspace.cgiar.org/handle/10568/34508">WLE Phase I Research Themes</a> community</li>
<li>Then Mia from WLE also emailed to ask where some WLE focal regions went, and I said I didn&rsquo;t understand what she was talking about, as all we did in our previous work was rename the old &ldquo;Research Themes&rdquo; subcommunity to &ldquo;WLE Phase I Research Themes&rdquo; and add a new subcommunity for &ldquo;WLE Phase II Research Themes&rdquo;.</li> <li>Then Mia from WLE also emailed to ask where some WLE focal regions went, and I said I didn't understand what she was talking about, as all we did in our previous work was rename the old &ldquo;Research Themes&rdquo; subcommunity to &ldquo;WLE Phase I Research Themes&rdquo; and add a new subcommunity for &ldquo;WLE Phase II Research Themes&rdquo;.</li>
<li>Discuss some modifications to the CCAFS project tags in CGSpace submission form and in the database</li> <li>Discuss some modifications to the CCAFS project tags in CGSpace submission form and in the database</li>
</ul> </ul>
<h2 id="20170728">2017-07-28</h2>
<h2 id="2017-07-28">2017-07-28</h2>
<ul> <ul>
<li>Discuss updates to the Phase II CCAFS project tags with Andrea from Macaroni Bros</li> <li>Discuss updates to the Phase II CCAFS project tags with Andrea from Macaroni Bros</li>
<li>I will do the renaming and untagging of items in CGSpace database, and he will update his webservice with the latest project tags and I will get the XML from here for our <code>input-forms.xml</code>: <a href="https://ccafs.cgiar.org/export/ccafsproject">https://ccafs.cgiar.org/export/ccafsproject</a></li> <li>I will do the renaming and untagging of items in CGSpace database, and he will update his webservice with the latest project tags and I will get the XML from here for our <code>input-forms.xml</code>: <a href="https://ccafs.cgiar.org/export/ccafsproject">https://ccafs.cgiar.org/export/ccafsproject</a></li>
</ul> </ul>
<h2 id="20170729">2017-07-29</h2>
<h2 id="2017-07-29">2017-07-29</h2>
<ul> <ul>
<li>Move some WLE items into appropriate Phase I Research Themes communities and delete some empty collections in WLE Regions community</li> <li>Move some WLE items into appropriate Phase I Research Themes communities and delete some empty collections in WLE Regions community</li>
</ul> </ul>
<h2 id="20170730">2017-07-30</h2>
<h2 id="2017-07-30">2017-07-30</h2>
<ul> <ul>
<li>Start working on CCAFS project tag cleanup</li> <li>Start working on CCAFS project tag cleanup</li>
<li>More questions about inconsistencies and spelling mistakes in their tags, so I&rsquo;ve sent some questions for followup</li> <li>More questions about inconsistencies and spelling mistakes in their tags, so I've sent some questions for followup</li>
</ul> </ul>
<h2 id="20170731">2017-07-31</h2>
<h2 id="2017-07-31">2017-07-31</h2>
<ul> <ul>
<li><p>Looks like the final list of metadata corrections for CCAFS project tags will be:</p> <li>Looks like the final list of metadata corrections for CCAFS project tags will be:</li>
</ul>
<pre><code>delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and text_value='PII-FP4_CRMWestAfrica'; <pre><code>delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and text_value='PII-FP4_CRMWestAfrica';
update metadatavalue set text_value='FP3_VietnamLED' where resource_type_id=2 and metadata_field_id=134 and text_value='FP3_VeitnamLED'; update metadatavalue set text_value='FP3_VietnamLED' where resource_type_id=2 and metadata_field_id=134 and text_value='FP3_VeitnamLED';
update metadatavalue set text_value='PII-FP1_PIRCCA' where resource_type_id=2 and metadata_field_id=235 and text_value='PII-SEA_PIRCCA'; update metadatavalue set text_value='PII-FP1_PIRCCA' where resource_type_id=2 and metadata_field_id=235 and text_value='PII-SEA_PIRCCA';
delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and text_value='PII-WA_IntegratedInterventions'; delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and text_value='PII-WA_IntegratedInterventions';
</code></pre></li> </code></pre><ul>
<li>Now just waiting to run them on CGSpace, and then apply the modified input forms after Macaroni Bros give me an updated list</li>
<li><p>Now just waiting to run them on CGSpace, and then apply the modified input forms after Macaroni Bros give me an updated list</p></li> <li>Temporarily increase the nginx upload limit to 200MB for Sisay to upload the CIAT presentations</li>
<li>Looking at the CGSpace activity page, there are 52 Baidu bots concurrently crawling our website (I copied the activity page to a text file and grepped it)!</li>
<li><p>Temporarily increase the nginx upload limit to 200MB for Sisay to upload the CIAT presentations</p></li> </ul>
<li><p>Looking at the CGSpace activity page, there are 52 Baidu bots concurrently crawling our website (I copied the activity page to a text file and grepped it)!</p>
<pre><code>$ grep 180.76. /tmp/status | awk '{print $5}' | sort | uniq | wc -l <pre><code>$ grep 180.76. /tmp/status | awk '{print $5}' | sort | uniq | wc -l
52 52
</code></pre></li> </code></pre><ul>
<li>From looking at the <code>dspace.log</code> I see they are all using the same session, which means our Crawler Session Manager Valve is working</li>
<li><p>From looking at the <code>dspace.log</code> I see they are all using the same session, which means our Crawler Session Manager Valve is working</p></li>
</ul> </ul>
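<p>The crawler count above can be reproduced outside the shell. The following is a minimal sketch, assuming (as the <code>awk '{print $5}'</code> step implies) that the client IP is the fifth whitespace-separated field of each line in the saved activity page. The function name <code>count_unique_clients</code> is hypothetical. Unlike the unescaped dots in the <code>grep</code> pattern, which match any character, the prefix here is matched literally, and lines with fewer than five fields are skipped.</p>

```python
def count_unique_clients(lines, prefix="180.76.", field=5):
    """Count distinct values of one whitespace-separated field.

    Only lines containing `prefix` (as a literal substring) are considered.
    `field` is 1-indexed, like awk's $5. Lines that are too short are skipped.
    """
    seen = set()
    for line in lines:
        if prefix not in line:
            continue
        parts = line.split()
        if len(parts) >= field:
            seen.add(parts[field - 1])
    return len(seen)
```

<p>For the real data this would be called on the lines of the saved file, for example <code>count_unique_clients(open("/tmp/status"))</code>.</p>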

@ -8,20 +8,19 @@
<meta property="og:title" content="August, 2017" /> <meta property="og:title" content="August, 2017" />
<meta property="og:description" content="2017-08-01 <meta property="og:description" content="2017-08-01
Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours
I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google) I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)
The good thing is that, according to dspace.log.2017-08-01, they are all using the same Tomcat session The good thing is that, according to dspace.log.2017-08-01, they are all using the same Tomcat session
This means our Tomcat Crawler Session Valve is working This means our Tomcat Crawler Session Valve is working
But many of the bots are browsing dynamic URLs like: But many of the bots are browsing dynamic URLs like:
/handle/10568/3353/discover /handle/10568/3353/discover
/handle/10568/16510/browse /handle/10568/16510/browse
The robots.txt only blocks the top-level /discover and /browse URLs&hellip; we will need to find a way to forbid them from accessing these! The robots.txt only blocks the top-level /discover and /browse URLs&hellip; we will need to find a way to forbid them from accessing these!
Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): https://jira.duraspace.org/browse/DS-2962 Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): https://jira.duraspace.org/browse/DS-2962
It turns out that we&rsquo;re already adding the X-Robots-Tag &quot;none&quot; HTTP header, but this only forbids the search engine from indexing the page, not crawling it! It turns out that we&#39;re already adding the X-Robots-Tag &quot;none&quot; HTTP header, but this only forbids the search engine from indexing the page, not crawling it!
Also, the bot has to successfully browse the page first so it can receive the HTTP header&hellip; Also, the bot has to successfully browse the page first so it can receive the HTTP header&hellip;
We might actually have to block these requests with HTTP 403 depending on the user agent We might actually have to block these requests with HTTP 403 depending on the user agent
Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415 Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415
@ -38,20 +37,19 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
<meta name="twitter:title" content="August, 2017"/> <meta name="twitter:title" content="August, 2017"/>
<meta name="twitter:description" content="2017-08-01 <meta name="twitter:description" content="2017-08-01
Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours
I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google) I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)
The good thing is that, according to dspace.log.2017-08-01, they are all using the same Tomcat session The good thing is that, according to dspace.log.2017-08-01, they are all using the same Tomcat session
This means our Tomcat Crawler Session Valve is working This means our Tomcat Crawler Session Valve is working
But many of the bots are browsing dynamic URLs like: But many of the bots are browsing dynamic URLs like:
/handle/10568/3353/discover /handle/10568/3353/discover
/handle/10568/16510/browse /handle/10568/16510/browse
The robots.txt only blocks the top-level /discover and /browse URLs&hellip; we will need to find a way to forbid them from accessing these! The robots.txt only blocks the top-level /discover and /browse URLs&hellip; we will need to find a way to forbid them from accessing these!
Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): https://jira.duraspace.org/browse/DS-2962 Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): https://jira.duraspace.org/browse/DS-2962
It turns out that we&rsquo;re already adding the X-Robots-Tag &quot;none&quot; HTTP header, but this only forbids the search engine from indexing the page, not crawling it! It turns out that we&#39;re already adding the X-Robots-Tag &quot;none&quot; HTTP header, but this only forbids the search engine from indexing the page, not crawling it!
Also, the bot has to successfully browse the page first so it can receive the HTTP header&hellip; Also, the bot has to successfully browse the page first so it can receive the HTTP header&hellip;
We might actually have to block these requests with HTTP 403 depending on the user agent We might actually have to block these requests with HTTP 403 depending on the user agent
Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415 Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415
@ -59,7 +57,7 @@ This was due to newline characters in the dc.description.abstract column, which
I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using g/^$/d I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using g/^$/d
Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -140,22 +138,21 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
</p> </p>
</header> </header>
<h2 id="2017-08-01">2017-08-01</h2> <h2 id="20170801">2017-08-01</h2>
<ul> <ul>
<li>Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours</li> <li>Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours</li>
<li>I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)</li> <li>I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)</li>
<li>The good thing is that, according to <code>dspace.log.2017-08-01</code>, they are all using the same Tomcat session</li> <li>The good thing is that, according to <code>dspace.log.2017-08-01</code>, they are all using the same Tomcat session</li>
<li>This means our Tomcat Crawler Session Valve is working</li> <li>This means our Tomcat Crawler Session Valve is working</li>
<li>But many of the bots are browsing dynamic URLs like: <li>But many of the bots are browsing dynamic URLs like:
<ul> <ul>
<li>/handle/10568/3353/discover</li> <li>/handle/10568/3353/discover</li>
<li>/handle/10568/16510/browse</li> <li>/handle/10568/16510/browse</li>
</ul></li> </ul>
</li>
<li>The <code>robots.txt</code> only blocks the top-level <code>/discover</code> and <code>/browse</code> URLs&hellip; we will need to find a way to forbid them from accessing these!</li> <li>The <code>robots.txt</code> only blocks the top-level <code>/discover</code> and <code>/browse</code> URLs&hellip; we will need to find a way to forbid them from accessing these!</li>
<li>Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li> <li>Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
<li>It turns out that we&rsquo;re already adding the <code>X-Robots-Tag &quot;none&quot;</code> HTTP header, but this only forbids the search engine from <em>indexing</em> the page, not crawling it!</li> <li>It turns out that we're already adding the <code>X-Robots-Tag &quot;none&quot;</code> HTTP header, but this only forbids the search engine from <em>indexing</em> the page, not crawling it!</li>
<li>Also, the bot has to successfully browse the page first so it can receive the HTTP header&hellip;</li> <li>Also, the bot has to successfully browse the page first so it can receive the HTTP header&hellip;</li>
<li>We might actually have to <em>block</em> these requests with HTTP 403 depending on the user agent</li> <li>We might actually have to <em>block</em> these requests with HTTP 403 depending on the user agent</li>
<li>Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415</li> <li>Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415</li>
@ -163,87 +160,67 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
<li>I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using <code>g/^$/d</code></li> <li>I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using <code>g/^$/d</code></li>
<li>Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet</li> <li>Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet</li>
</ul> </ul>
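<p>The idea of answering crawlers with HTTP 403 on dynamic pages can be sketched as a small decision function. This is only an illustration of the matching logic, not the configuration actually deployed: the names <code>DYNAMIC_PATH</code>, <code>BOT_TOKENS</code> and <code>status_for</code> are hypothetical, and the list of user agent substrings is an assumption that would need checking against the real access logs.</p>

```python
import re

# Dynamic pages below a handle, e.g. /handle/10568/3353/discover
DYNAMIC_PATH = re.compile(r"^/handle/\d+/\d+/(?:discover|browse)(?:[/?]|$)")

# Assumed user agent substrings for the crawlers named in the notes
BOT_TOKENS = ("googlebot", "baiduspider", "bingbot", "slurp")


def status_for(path, user_agent):
    """Return 403 if a known bot requests a dynamic handle page, else None."""
    is_bot = any(token in user_agent.lower() for token in BOT_TOKENS)
    if is_bot and DYNAMIC_PATH.search(path):
        return 403
    return None
```

<p>Plain item pages such as <code>/handle/10568/3353</code> stay reachable for bots in this sketch. Only the <code>discover</code> and <code>browse</code> pages under a handle are refused.</p>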
<h2 id="20170802">2017-08-02</h2>
<h2 id="2017-08-02">2017-08-02</h2>
<ul> <ul>
<li>Magdalena from CCAFS asked if there was a way to get the top ten items published in 2016 (note: not the top items in 2016!)</li> <li>Magdalena from CCAFS asked if there was a way to get the top ten items published in 2016 (note: not the top items in 2016!)</li>
<li>I think Atmire&rsquo;s Content and Usage Analysis module should be able to do this but I will have to look at the configuration and maybe email Atmire if I can&rsquo;t figure it out</li> <li>I think Atmire's Content and Usage Analysis module should be able to do this but I will have to look at the configuration and maybe email Atmire if I can't figure it out</li>
<li>I had a look at the module configuration and couldn&rsquo;t figure out a way to do this, so I <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-tickets">opened a ticket on the Atmire tracker</a></li> <li>I had a look at the module configuration and couldn't figure out a way to do this, so I <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-tickets">opened a ticket on the Atmire tracker</a></li>
<li>Atmire responded about the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=500">missing workflow statistics issue</a> a few weeks ago but I didn&rsquo;t see it for some reason</li> <li>Atmire responded about the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=500">missing workflow statistics issue</a> a few weeks ago but I didn't see it for some reason</li>
<li>They said they added a publication and saw the workflow stat for the user, so I should try again and let them know</li> <li>They said they added a publication and saw the workflow stat for the user, so I should try again and let them know</li>
</ul> </ul>
<h2 id="20170805">2017-08-05</h2>
<h2 id="2017-08-05">2017-08-05</h2>
<ul> <ul>
<li>Usman from CIFOR emailed to ask about the status of our OAI tests for harvesting their DSpace repository</li> <li>Usman from CIFOR emailed to ask about the status of our OAI tests for harvesting their DSpace repository</li>
<li>I told him that the OAI appears to not be harvesting properly after the first sync, and that the control panel shows an &ldquo;Internal error&rdquo; for that collection:</li> <li>I told him that the OAI appears to not be harvesting properly after the first sync, and that the control panel shows an &ldquo;Internal error&rdquo; for that collection:</li>
</ul> </ul>
<p><img src="/cgspace-notes/2017/08/cifor-oai-harvesting.png" alt="CIFOR OAI harvesting"></p>
<p><img src="/cgspace-notes/2017/08/cifor-oai-harvesting.png" alt="CIFOR OAI harvesting" /></p>
<ul> <ul>
<li>I don&rsquo;t see anything related in our logs, so I asked him to check for our server&rsquo;s IP in their logs</li> <li>I don't see anything related in our logs, so I asked him to check for our server's IP in their logs</li>
<li>Also, in the meantime I stopped the harvesting process, reset the status, and restarted the process via the Admin control panel (note: I didn&rsquo;t reset the collection, just the harvester status!)</li> <li>Also, in the meantime I stopped the harvesting process, reset the status, and restarted the process via the Admin control panel (note: I didn't reset the collection, just the harvester status!)</li>
</ul> </ul>
<h2 id="20170807">2017-08-07</h2>
<h2 id="2017-08-07">2017-08-07</h2>
<ul> <ul>
<li>Apply Abenet&rsquo;s corrections for the CGIAR Library&rsquo;s Consortium subcommunity (697 records)</li> <li>Apply Abenet's corrections for the CGIAR Library's Consortium subcommunity (697 records)</li>
<li>I had to fix a few small things: moving the <code>dc.title</code> column away from the beginning of the row, deleting blank lines in the abstract in vim using <code>:g/^$/d</code>, and adding the <code>dc.subject[en_US]</code> column back, as she had deleted it and DSpace didn&rsquo;t detect the changes made there (we needed to blank the values instead)</li> <li>I had to fix a few small things: moving the <code>dc.title</code> column away from the beginning of the row, deleting blank lines in the abstract in vim using <code>:g/^$/d</code>, and adding the <code>dc.subject[en_US]</code> column back, as she had deleted it and DSpace didn't detect the changes made there (we needed to blank the values instead)</li>
</ul> </ul>
<h2 id="20170808">2017-08-08</h2>
<h2 id="2017-08-08">2017-08-08</h2>
<ul> <ul>
<li>Apply Abenet&rsquo;s corrections for the CGIAR Library&rsquo;s historic archive subcommunity (2415 records)</li> <li>Apply Abenet's corrections for the CGIAR Library's historic archive subcommunity (2415 records)</li>
<li>I had to add the <code>dc.subject[en_US]</code> column back with blank values so that DSpace could detect the changes</li> <li>I had to add the <code>dc.subject[en_US]</code> column back with blank values so that DSpace could detect the changes</li>
<li>I applied the changes in 500 item batches</li> <li>I applied the changes in 500 item batches</li>
</ul> </ul>
<h2 id="20170809">2017-08-09</h2>
<h2 id="2017-08-09">2017-08-09</h2>
<ul> <ul>
<li>Run system updates on DSpace Test and reboot server</li> <li>Run system updates on DSpace Test and reboot server</li>
<li>Help ICARDA upgrade their MELSpace to DSpace 5.7 using the <a href="https://github.com/alanorth/docker-dspace">docker-dspace</a> container <li>Help ICARDA upgrade their MELSpace to DSpace 5.7 using the <a href="https://github.com/alanorth/docker-dspace">docker-dspace</a> container
<ul> <ul>
<li>We had to import the PostgreSQL dump to the PostgreSQL container using: <code>pg_restore -U postgres -d dspace blah.dump</code></li> <li>We had to import the PostgreSQL dump to the PostgreSQL container using: <code>pg_restore -U postgres -d dspace blah.dump</code></li>
<li>Otherwise, when using <code>-O</code> it messes up the permissions on the schema and DSpace can&rsquo;t read it</li> <li>Otherwise, when using <code>-O</code> it messes up the permissions on the schema and DSpace can't read it</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2017-08-10">2017-08-10</h2> </ul>
<h2 id="20170810">2017-08-10</h2>
<ul> <ul>
<li>Apply last updates to the CGIAR Library&rsquo;s Fund community (812 items)</li> <li>Apply last updates to the CGIAR Library's Fund community (812 items)</li>
<li>Had to do some quality checks and column renames before importing, as either Sisay or Abenet renamed a few columns and the metadata importer wanted to remove/add new metadata for title, abstract, etc.</li> <li>Had to do some quality checks and column renames before importing, as either Sisay or Abenet renamed a few columns and the metadata importer wanted to remove/add new metadata for title, abstract, etc.</li>
<li>Also I applied the HTML entities unescape transform on the abstract column in OpenRefine</li> <li>Also I applied the HTML entities unescape transform on the abstract column in OpenRefine</li>
<li>I need to get an author list from the database for only the CGIAR Library community to send to Peter</li> <li>I need to get an author list from the database for only the CGIAR Library community to send to Peter</li>
<li>It turns out that I had already used this SQL query in <a href="/cgspace-notes/2017-05">May, 2017</a> to get the authors from CGIAR Library:</li>
<li><p>It turns out that I had already used this SQL query in <a href="/cgspace-notes/2017-05">May, 2017</a> to get the authors from CGIAR Library:</p> </ul>
<pre><code>dspace#= \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/93761', '10947/1', '10947/10', '10947/11', '10947/12', '10947/13', '10947/14', '10947/15', '10947/16', '10947/17', '10947/18', '10947/19', '10947/2', '10947/20', '10947/21', '10947/22', '10947/23', '10947/24', '10947/25', '10947/2512', '10947/2515', '10947/2516', '10947/2517', '10947/2518', '10947/2519', '10947/2520', '10947/2521', '10947/2522', '10947/2523', '10947/2524', '10947/2525', '10947/2526', '10947/2527', '10947/2528', '10947/2529', '10947/2530', '10947/2531', '10947/2532', '10947/2533', '10947/2534', '10947/2535', '10947/2536', '10947/2537', '10947/2538', '10947/2539', '10947/2540', '10947/2541', '10947/2589', '10947/26', '10947/2631', '10947/27', '10947/2708', '10947/2776', '10947/2782', '10947/2784', '10947/2786', '10947/2790', '10947/28', '10947/2805', '10947/2836', '10947/2871', '10947/2878', '10947/29', '10947/2900', '10947/2919', '10947/3', '10947/30', '10947/31', '10947/32', '10947/33', '10947/34', '10947/3457', '10947/35', '10947/36', '10947/37', '10947/38', '10947/39', '10947/4', '10947/40', '10947/4052', '10947/4054', '10947/4056', '10947/4068', '10947/41', '10947/42', '10947/43', '10947/4368', '10947/44', '10947/4467', '10947/45', '10947/4508', '10947/4509', '10947/4510', '10947/4573', '10947/46', '10947/4635', '10947/4636', '10947/4637', '10947/4638', '10947/4639', '10947/4651', '10947/4657', '10947/47', '10947/48', '10947/49', '10947/5', '10947/50', '10947/51', '10947/5308', '10947/5322', '10947/5324', '10947/5326', '10947/6', '10947/7', '10947/8', '10947/9'))) group by text_value order by count desc) to /tmp/cgiar-library-authors.csv with csv; <pre><code>dspace#= \copy 
(select distinct text_value, count(*) from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/93761', '10947/1', '10947/10', '10947/11', '10947/12', '10947/13', '10947/14', '10947/15', '10947/16', '10947/17', '10947/18', '10947/19', '10947/2', '10947/20', '10947/21', '10947/22', '10947/23', '10947/24', '10947/25', '10947/2512', '10947/2515', '10947/2516', '10947/2517', '10947/2518', '10947/2519', '10947/2520', '10947/2521', '10947/2522', '10947/2523', '10947/2524', '10947/2525', '10947/2526', '10947/2527', '10947/2528', '10947/2529', '10947/2530', '10947/2531', '10947/2532', '10947/2533', '10947/2534', '10947/2535', '10947/2536', '10947/2537', '10947/2538', '10947/2539', '10947/2540', '10947/2541', '10947/2589', '10947/26', '10947/2631', '10947/27', '10947/2708', '10947/2776', '10947/2782', '10947/2784', '10947/2786', '10947/2790', '10947/28', '10947/2805', '10947/2836', '10947/2871', '10947/2878', '10947/29', '10947/2900', '10947/2919', '10947/3', '10947/30', '10947/31', '10947/32', '10947/33', '10947/34', '10947/3457', '10947/35', '10947/36', '10947/37', '10947/38', '10947/39', '10947/4', '10947/40', '10947/4052', '10947/4054', '10947/4056', '10947/4068', '10947/41', '10947/42', '10947/43', '10947/4368', '10947/44', '10947/4467', '10947/45', '10947/4508', '10947/4509', '10947/4510', '10947/4573', '10947/46', '10947/4635', '10947/4636', '10947/4637', '10947/4638', '10947/4639', '10947/4651', '10947/4657', '10947/47', '10947/48', '10947/49', '10947/5', '10947/50', '10947/51', '10947/5308', '10947/5322', '10947/5324', '10947/5326', '10947/6', '10947/7', '10947/8', '10947/9'))) group by text_value order by count desc) to /tmp/cgiar-library-authors.csv with csv;
</code></pre></li> </code></pre><ul>
<li>Meeting with Peter and CGSpace team
<li><p>Meeting with Peter and CGSpace team</p>
<ul> <ul>
<li>Alan to follow up with ICARDA about depositing in CGSpace; we want ICARDA and Drylands legacy content but not duplicates</li> <li>Alan to follow up with ICARDA about depositing in CGSpace; we want ICARDA and Drylands legacy content but not duplicates</li>
<li>Alan to follow up on dc.rights, where are we?</li> <li>Alan to follow up on dc.rights, where are we?</li>
<li>Alan to follow up with Atmire about a dedicated field for ORCIDs, based on the discussion in the <a href="https://wiki.duraspace.org/display/cmtygp/DCAT+Meeting+June+2017">June, 2017 DCAT meeting</a></li> <li>Alan to follow up with Atmire about a dedicated field for ORCIDs, based on the discussion in the <a href="https://wiki.duraspace.org/display/cmtygp/DCAT+Meeting+June+2017">June, 2017 DCAT meeting</a></li>
<li>Alan to ask about how to query external services like AGROVOC in the DSpace submission form</li> <li>Alan to ask about how to query external services like AGROVOC in the DSpace submission form</li>
</ul></li>
<li><p>Follow up with Atmire on the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=510">ticket about ORCID metadata in DSpace</a></p></li>
<li><p>Follow up with Lili and Andrea about the pending CCAFS metadata and flagship updates</p></li>
</ul> </ul>
</li>
<h2 id="2017-08-11">2017-08-11</h2> <li>Follow up with Atmire on the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=510">ticket about ORCID metadata in DSpace</a></li>
<li>Follow up with Lili and Andrea about the pending CCAFS metadata and flagship updates</li>
</ul>
<h2 id="20170811">2017-08-11</h2>
<ul> <ul>
<li>CGSpace had load issues and was throwing errors related to PostgreSQL</li> <li>CGSpace had load issues and was throwing errors related to PostgreSQL</li>
<li>I told Tsega to reduce the max connections from 70 to 40 because each web application gets that limit, so for xmlui, oai, jspui, rest, etc. it could be 70 x 4 = 280 connections depending on the load, and the PostgreSQL config itself only allows 100!</li> <li>I told Tsega to reduce the max connections from 70 to 40 because each web application gets that limit, so for xmlui, oai, jspui, rest, etc. it could be 70 x 4 = 280 connections depending on the load, and the PostgreSQL config itself only allows 100!</li>
@ -252,64 +229,48 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
<li>Also, I need to find out where the load is coming from (rest?) and possibly block bots from accessing dynamic pages like Browse and Discover instead of just sending an X-Robots-Tag HTTP header</li> <li>Also, I need to find out where the load is coming from (rest?) and possibly block bots from accessing dynamic pages like Browse and Discover instead of just sending an X-Robots-Tag HTTP header</li>
<li>I noticed that Google has bitstreams from the <code>rest</code> interface in the search index. I need to ask on the dspace-tech mailing list to see what other people are doing about this, and maybe start issuing an <code>X-Robots-Tag: none</code> there!</li> <li>I noticed that Google has bitstreams from the <code>rest</code> interface in the search index. I need to ask on the dspace-tech mailing list to see what other people are doing about this, and maybe start issuing an <code>X-Robots-Tag: none</code> there!</li>
</ul> </ul>
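<p>The connection arithmetic from the PostgreSQL note above can be written out as a small sketch. It assumes, as the note does, that each of the four web applications may open up to its own pool maximum and that PostgreSQL accepts 100 connections in total. The helper names are hypothetical, and connections reserved for superusers or other clients are ignored.</p>

```python
WEBAPPS = ["xmlui", "oai", "jspui", "rest"]
POSTGRES_MAX = 100  # total connections allowed by the PostgreSQL config


def worst_case_connections(pool_max, webapps):
    """Theoretical ceiling if every web application fills its own pool."""
    return pool_max * len(webapps)


def largest_safe_pool(postgres_max, webapps):
    """Largest per-application pool that can never exceed the server limit."""
    return postgres_max // len(webapps)
```

<p>With these numbers the old setting gives a ceiling of 280 and the reduced setting of 40 still gives 160, so a per-application limit of 25 would be the largest value that stays within 100 even if all four pools filled at once.</p>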
<h2 id="20170812">2017-08-12</h2>
<h2 id="2017-08-12">2017-08-12</h2>
<ul> <ul>
<li>I sent a message to the mailing list about the duplicate content issue with <code>/rest</code> and <code>/bitstream</code> URLs</li> <li>I sent a message to the mailing list about the duplicate content issue with <code>/rest</code> and <code>/bitstream</code> URLs</li>
<li>Looking at the logs for the REST API on <code>/rest</code>, it looks like someone is hammering it, perhaps doing testing or something&hellip;</li>
<li><p>Looking at the logs for the REST API on <code>/rest</code>, it looks like someone is hammering it, perhaps doing testing or something&hellip;</p>
<pre><code># awk '{print $1}' /var/log/nginx/rest.log.1 | sort -n | uniq -c | sort -h | tail -n 5
140 66.249.66.91
404 66.249.66.90
1479 50.116.102.77
9794 45.5.184.196
85736 70.32.83.92
</code></pre></li>
<li><p>The top offender is 70.32.83.92 which is actually the same IP as ccafs.cgiar.org, so I will email the Macaroni Bros to see if they can test on DSpace Test instead</p></li>
<li><p>I&rsquo;ve enabled logging of <code>/oai</code> requests on nginx as well so we can potentially determine bad actors here (also to see if anyone is actually using OAI!)</p>
<pre><code># log oai requests
location /oai {
access_log /var/log/nginx/oai.log;
proxy_pass http://tomcat_http;
}
</code></pre></li>
</ul> </ul>
<pre><code># awk '{print $1}' /var/log/nginx/rest.log.1 | sort -n | uniq -c | sort -h | tail -n 5
<h2 id="2017-08-13">2017-08-13</h2> 140 66.249.66.91
404 66.249.66.90
1479 50.116.102.77
9794 45.5.184.196
85736 70.32.83.92
</code></pre><ul>
<li>The top offender is 70.32.83.92 which is actually the same IP as ccafs.cgiar.org, so I will email the Macaroni Bros to see if they can test on DSpace Test instead</li>
<li>I've enabled logging of <code>/oai</code> requests on nginx as well so we can potentially determine bad actors here (also to see if anyone is actually using OAI!)</li>
</ul>
<pre><code> # log oai requests
location /oai {
access_log /var/log/nginx/oai.log;
proxy_pass http://tomcat_http;
}
</code></pre><h2 id="20170813">2017-08-13</h2>
<ul> <ul>
<li>Macaroni Bros say that CCAFS wants them to check once every hour for changes</li> <li>Macaroni Bros say that CCAFS wants them to check once every hour for changes</li>
<li>I told them to check every four or six hours</li> <li>I told them to check every four or six hours</li>
</ul> </ul>
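The `awk | sort | uniq -c | sort -h | tail` pipeline run against `rest.log.1` in the 2017-08-12 notes can also be sketched in Python. This is a minimal sketch, assuming the client IP is the first whitespace-separated field of each access-log line (as the `awk '{print $1}'` step implies); the function name `top_clients` is hypothetical.

```python
from collections import Counter


def top_clients(log_lines, n=5):
    """Return the n busiest clients as (ip, count) pairs, busiest last.

    Assumes the client IP is the first whitespace-separated field,
    matching the awk '{print $1}' step in the notes.
    """
    counts = Counter(line.split()[0] for line in log_lines if line.strip())
    # Sort ascending by request count and keep the tail, like `sort -h | tail -n 5`
    return sorted(counts.items(), key=lambda item: item[1])[-n:]


if __name__ == "__main__":
    sample = [
        '70.32.83.92 - - "GET /rest/items HTTP/1.1" 200',
        '70.32.83.92 - - "GET /rest/items HTTP/1.1" 200',
        '45.5.184.196 - - "GET /rest/collections HTTP/1.1" 200',
    ]
    for ip, count in top_clients(sample):
        print(count, ip)
```

Passing an open file object (for example `open("/var/log/nginx/rest.log.1")`) instead of a list should work the same way, since the function only iterates over lines.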
<h2 id="20170814">2017-08-14</h2>
<h2 id="2017-08-14">2017-08-14</h2>
<ul> <ul>
<li><p>Run author corrections on CGIAR Library community from Peter</p> <li>Run author corrections on CGIAR Library community from Peter</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/authors-fix-523.csv -f dc.contributor.author -t correct -m 3 -d dspace -u dspace -p fuuuu <pre><code>$ ./fix-metadata-values.py -i /tmp/authors-fix-523.csv -f dc.contributor.author -t correct -m 3 -d dspace -u dspace -p fuuuu
</code></pre></li> </code></pre><ul>
<li>There were only three deletions so I just did them manually:</li>
<li><p>There were only three deletions so I just did them manually:</p> </ul>
<pre><code>dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value='C'; <pre><code>dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value='C';
DELETE 1 DELETE 1
dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value='WSSD'; dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value='WSSD';
</code></pre></li> </code></pre><ul>
<li>Generate a new list of authors from the CGIAR Library community for Peter to look through now that the initial corrections have been done</li>
<li><p>Generate a new list of authors from the CGIAR Library community for Peter to look through now that the initial corrections have been done</p></li> <li>Thinking about resource limits for PostgreSQL again after last week's CGSpace crash and related to a recent discussion I had in the comments of the <a href="https://wiki.duraspace.org/display/cmtygp/DCAT+Meeting+April+2017">April, 2017 DCAT meeting notes</a></li>
<li>In that thread Chris Wilper suggests a new default of 35 max connections for <code>db.maxconnections</code> (from the current default of 30), knowing that <em>each DSpace web application</em> gets to use up to this many on its own</li>
<li><p>Thinking about resource limits for PostgreSQL again after last week&rsquo;s CGSpace crash and related to a recent discussion I had in the comments of the <a href="https://wiki.duraspace.org/display/cmtygp/DCAT+Meeting+April+2017">April, 2017 DCAT meeting notes</a></p></li> <li>It would be good to approximate what the theoretical maximum number of connections on a busy server would be, perhaps by looking to see which apps use SQL:</li>
</ul>
<li><p>In that thread Chris Wilper suggests a new default of 35 max connections for <code>db.maxconnections</code> (from the current default of 30), knowing that <em>each DSpace web application</em> gets to use up to this many on its own</p></li>
<li><p>It would be good to approximate what the theoretical maximum number of connections on a busy server would be, perhaps by looking to see which apps use SQL:</p>
<pre><code>$ grep -rsI SQLException dspace-jspui | wc -l <pre><code>$ grep -rsI SQLException dspace-jspui | wc -l
473 473
$ grep -rsI SQLException dspace-oai | wc -l $ grep -rsI SQLException dspace-oai | wc -l
@ -320,39 +281,26 @@ $ grep -rsI SQLException dspace-solr | wc -l
0 0
$ grep -rsI SQLException dspace-xmlui | wc -l $ grep -rsI SQLException dspace-xmlui | wc -l
866 866
</code></pre></li> </code></pre><ul>
<li>Of those five applications we're running, only <code>solr</code> appears not to use the database directly</li>
<li><p>Of those five applications we&rsquo;re running, only <code>solr</code> appears not to use the database directly</p></li> <li>And JSPUI is only used internally (so it doesn't really count), leaving us with OAI, REST, and XMLUI</li>
<li>Assuming each takes a theoretical maximum of 35 connections during a heavy load (35 * 3 = 105), that would put the connections well above PostgreSQL's default max of 100 connections (remember a handful of connections are reserved for the PostgreSQL super user, see <code>superuser_reserved_connections</code>)</li>
<li><p>And JSPUI is only used internally (so it doesn&rsquo;t really count), leaving us with OAI, REST, and XMLUI</p></li> <li>So we should adjust PostgreSQL's max connections to be DSpace's <code>db.maxconnections</code> * 3 + 3</li>
<li>This would allow each application to use up to <code>db.maxconnections</code> and not to go over the system's PostgreSQL limit</li>
<li><p>Assuming each takes a theoretical maximum of 35 connections during a heavy load (35 * 3 = 105), that would put the connections well above PostgreSQL&rsquo;s default max of 100 connections (remember a handful of connections are reserved for the PostgreSQL super user, see <code>superuser_reserved_connections</code>)</p></li> <li>Perhaps since CGSpace is a busy site with lots of resources we could actually use something like 40 for <code>db.maxconnections</code></li>
<li>Also worth looking into is to set up a database pool using JNDI, as apparently DSpace's <code>db.poolname</code> hasn't been used since around DSpace 1.7 (according to Chris Wilper's comments in the thread)</li>
<li><p>So we should adjust PostgreSQL&rsquo;s max connections to be DSpace&rsquo;s <code>db.maxconnections</code> * 3 + 3</p></li> <li>Need to go check the PostgreSQL connection stats in Munin on CGSpace from the past week to get an idea if 40 is appropriate</li>
<li>Looks like connections hover around 50:</li>
<li><p>This would allow each application to use up to <code>db.maxconnections</code> and not to go over the system&rsquo;s PostgreSQL limit</p></li>
<li><p>Perhaps since CGSpace is a busy site with lots of resources we could actually use something like 40 for <code>db.maxconnections</code></p></li>
<li><p>Also worth looking into is to set up a database pool using JNDI, as apparently DSpace&rsquo;s <code>db.poolname</code> hasn&rsquo;t been used since around DSpace 1.7 (according to Chris Wilper&rsquo;s comments in the thread)</p></li>
<li><p>Need to go check the PostgreSQL connection stats in Munin on CGSpace from the past week to get an idea if 40 is appropriate</p></li>
<li><p>Looks like connections hover around 50:</p></li>
</ul> </ul>
<p><img src="/cgspace-notes/2017/08/postgresql-connections-cgspace.png" alt="PostgreSQL connections 2017-08"></p>
<p><img src="/cgspace-notes/2017/08/postgresql-connections-cgspace.png" alt="PostgreSQL connections 2017-08" /></p>
<ul> <ul>
<li>Unfortunately I don&rsquo;t have the breakdown of which DSpace apps are making those connections (I&rsquo;ll assume XMLUI)</li> <li>Unfortunately I don't have the breakdown of which DSpace apps are making those connections (I'll assume XMLUI)</li>
<li>So I guess a limit of 30 (DSpace default) is too low, but 70 causes problems when the load increases and the system&rsquo;s PostgreSQL <code>max_connections</code> is too low</li> <li>So I guess a limit of 30 (DSpace default) is too low, but 70 causes problems when the load increases and the system's PostgreSQL <code>max_connections</code> is too low</li>
<li>For now I think maybe setting DSpace&rsquo;s <code>db.maxconnections</code> to 40 and adjusting the system&rsquo;s <code>max_connections</code> might be a good starting point: 40 * 3 + 3 = 123</li> <li>For now I think maybe setting DSpace's <code>db.maxconnections</code> to 40 and adjusting the system's <code>max_connections</code> might be a good starting point: 40 * 3 + 3 = 123</li>
<li>Apply 223 more author corrections from Peter on CGIAR Library</li> <li>Apply 223 more author corrections from Peter on CGIAR Library</li>
<li>Help Magdalena from CCAFS with some CUA statistics questions</li> <li>Help Magdalena from CCAFS with some CUA statistics questions</li>
</ul> </ul>
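The sizing rule from the 2017-08-14 notes (DSpace's `db.maxconnections` * 3 + 3) can be written out as a small helper. This is a sketch under the notes' own assumptions: three database-using web applications (OAI, REST, XMLUI) and three connections held back for the PostgreSQL superuser; the function name is hypothetical.

```python
def required_max_connections(db_maxconnections, pooled_apps=3, reserved=3):
    """PostgreSQL max_connections needed so every app can reach its own limit.

    pooled_apps: web applications that each get db_maxconnections
                 (OAI, REST and XMLUI in the notes).
    reserved:    connections kept for the PostgreSQL superuser
                 (see superuser_reserved_connections).
    """
    return db_maxconnections * pooled_apps + reserved


if __name__ == "__main__":
    # 35 per app already exceeds PostgreSQL's default of 100 before the reserve
    print(35 * 3)
    # The value proposed for CGSpace in the notes
    print(required_max_connections(40))
```

With `db.maxconnections` at 40 this gives the 123 that the notes settle on as a starting point.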
<h2 id="20170815">2017-08-15</h2>
<h2 id="2017-08-15">2017-08-15</h2>
<ul> <ul>
<li>Increase the nginx upload limit on CGSpace (linode18) so Sisay can upload 23 CIAT reports</li> <li>Increase the nginx upload limit on CGSpace (linode18) so Sisay can upload 23 CIAT reports</li>
<li>Do some last minute cleanups and de-duplications of the CGIAR Library data, as I need to send it to Peter this week</li> <li>Do some last minute cleanups and de-duplications of the CGIAR Library data, as I need to send it to Peter this week</li>
@ -360,81 +308,60 @@ $ grep -rsI SQLException dspace-xmlui | wc -l
<li>Also, a few dozen <code>dc.description.abstract</code> fields still had various HTML tags and entities in them</li> <li>Also, a few dozen <code>dc.description.abstract</code> fields still had various HTML tags and entities in them</li>
<li>Also, a bunch of <code>dc.subject</code> fields that were not AGROVOC had not been moved properly to <code>cg.system.subject</code></li> <li>Also, a bunch of <code>dc.subject</code> fields that were not AGROVOC had not been moved properly to <code>cg.system.subject</code></li>
</ul> </ul>
<h2 id="20170816">2017-08-16</h2>
<h2 id="2017-08-16">2017-08-16</h2>
<ul> <ul>
<li><p>I wanted to merge the various field variations like <code>cg.subject.system</code> and <code>cg.subject.system[en_US]</code> in OpenRefine but I realized it would be easier in PostgreSQL:</p> <li>I wanted to merge the various field variations like <code>cg.subject.system</code> and <code>cg.subject.system[en_US]</code> in OpenRefine but I realized it would be easier in PostgreSQL:</li>
</ul>
<pre><code>dspace=# select distinct text_value, text_lang from metadatavalue where resource_type_id=2 and metadata_field_id=254; <pre><code>dspace=# select distinct text_value, text_lang from metadatavalue where resource_type_id=2 and metadata_field_id=254;
</code></pre></li> </code></pre><ul>
<li>And actually, we can do it for other generic fields for items in those collections, for example <code>dc.description.abstract</code>:</li>
<li><p>And actually, we can do it for other generic fields for items in those collections, for example <code>dc.description.abstract</code>:</p> </ul>
<pre><code>dspace=# update metadatavalue set text_lang='en_US' where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'description' and qualifier = 'abstract') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/93761', '10947/1', '10947/10', '10947/11', '10947/12', '10947/13', '10947/14', '10947/15', '10947/16', '10947/17', '10947/18', '10947/19', '10947/2', '10947/20', '10947/21', '10947/22', '10947/23', '10947/24', '10947/25', '10947/2512', '10947/2515', '10947/2516', '10947/2517', '10947/2518', '10947/2519', '10947/2520', '10947/2521', '10947/2522', '10947/2523', '10947/2524', '10947/2525', '10947/2526', '10947/2527', '10947/2528', '10947/2529', '10947/2530', '10947/2531', '10947/2532', '10947/2533', '10947/2534', '10947/2535', '10947/2536', '10947/2537', '10947/2538', '10947/2539', '10947/2540', '10947/2541', '10947/2589', '10947/26', '10947/2631', '10947/27', '10947/2708', '10947/2776', '10947/2782', '10947/2784', '10947/2786', '10947/2790', '10947/28', '10947/2805', '10947/2836', '10947/2871', '10947/2878', '10947/29', '10947/2900', '10947/2919', '10947/3', '10947/30', '10947/31', '10947/32', '10947/33', '10947/34', '10947/3457', '10947/35', '10947/36', '10947/37', '10947/38', '10947/39', '10947/4', '10947/40', '10947/4052', '10947/4054', '10947/4056', '10947/4068', '10947/41', '10947/42', '10947/43', '10947/4368', '10947/44', '10947/4467', '10947/45', '10947/4508', '10947/4509', '10947/4510', '10947/4573', '10947/46', '10947/4635', '10947/4636', '10947/4637', '10947/4638', '10947/4639', '10947/4651', '10947/4657', '10947/47', '10947/48', '10947/49', '10947/5', '10947/50', '10947/51', '10947/5308', '10947/5322', '10947/5324', '10947/5326', '10947/6', '10947/7', '10947/8', '10947/9'))) <pre><code>dspace=# update metadatavalue set text_lang='en_US' where metadata_field_id = (select metadata_field_id from 
metadatafieldregistry where element = 'description' and qualifier = 'abstract') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/93761', '10947/1', '10947/10', '10947/11', '10947/12', '10947/13', '10947/14', '10947/15', '10947/16', '10947/17', '10947/18', '10947/19', '10947/2', '10947/20', '10947/21', '10947/22', '10947/23', '10947/24', '10947/25', '10947/2512', '10947/2515', '10947/2516', '10947/2517', '10947/2518', '10947/2519', '10947/2520', '10947/2521', '10947/2522', '10947/2523', '10947/2524', '10947/2525', '10947/2526', '10947/2527', '10947/2528', '10947/2529', '10947/2530', '10947/2531', '10947/2532', '10947/2533', '10947/2534', '10947/2535', '10947/2536', '10947/2537', '10947/2538', '10947/2539', '10947/2540', '10947/2541', '10947/2589', '10947/26', '10947/2631', '10947/27', '10947/2708', '10947/2776', '10947/2782', '10947/2784', '10947/2786', '10947/2790', '10947/28', '10947/2805', '10947/2836', '10947/2871', '10947/2878', '10947/29', '10947/2900', '10947/2919', '10947/3', '10947/30', '10947/31', '10947/32', '10947/33', '10947/34', '10947/3457', '10947/35', '10947/36', '10947/37', '10947/38', '10947/39', '10947/4', '10947/40', '10947/4052', '10947/4054', '10947/4056', '10947/4068', '10947/41', '10947/42', '10947/43', '10947/4368', '10947/44', '10947/4467', '10947/45', '10947/4508', '10947/4509', '10947/4510', '10947/4573', '10947/46', '10947/4635', '10947/4636', '10947/4637', '10947/4638', '10947/4639', '10947/4651', '10947/4657', '10947/47', '10947/48', '10947/49', '10947/5', '10947/50', '10947/51', '10947/5308', '10947/5322', '10947/5324', '10947/5326', '10947/6', '10947/7', '10947/8', '10947/9')))
</code></pre></li> </code></pre><ul>
<li>And on others like <code>dc.language.iso</code>, <code>dc.relation.ispartofseries</code>, <code>dc.type</code>, <code>dc.title</code>, etc&hellip;</li>
<li><p>And on others like <code>dc.language.iso</code>, <code>dc.relation.ispartofseries</code>, <code>dc.type</code>, <code>dc.title</code>, etc&hellip;</p></li> <li>Also, to move fields from <code>dc.identifier.url</code> to <code>cg.identifier.url[en_US]</code> (because we don't use the Dublin Core one for some reason):</li>
</ul>
<li><p>Also, to move fields from <code>dc.identifier.url</code> to <code>cg.identifier.url[en_US]</code> (because we don&rsquo;t use the Dublin Core one for some reason):</p>
<pre><code>dspace=# update metadatavalue set metadata_field_id = 219, text_lang = 'en_US' where resource_type_id = 2 AND metadata_field_id = 237; <pre><code>dspace=# update metadatavalue set metadata_field_id = 219, text_lang = 'en_US' where resource_type_id = 2 AND metadata_field_id = 237;
UPDATE 15 UPDATE 15
</code></pre></li> </code></pre><ul>
<li>Set the text_lang of all <code>dc.identifier.uri</code> (Handle) fields to be NULL, just like default DSpace does:</li>
<li><p>Set the text_lang of all <code>dc.identifier.uri</code> (Handle) fields to be NULL, just like default DSpace does:</p> </ul>
<pre><code>dspace=# update metadatavalue set text_lang=NULL where resource_type_id = 2 and metadata_field_id = 25 and text_value like 'http://hdl.handle.net/10947/%'; <pre><code>dspace=# update metadatavalue set text_lang=NULL where resource_type_id = 2 and metadata_field_id = 25 and text_value like 'http://hdl.handle.net/10947/%';
UPDATE 4248 UPDATE 4248
</code></pre></li> </code></pre><ul>
<li>Also update the text_lang of <code>dc.contributor.author</code> fields for metadata in these collections:</li>
<li><p>Also update the text_lang of <code>dc.contributor.author</code> fields for metadata in these collections:</p> </ul>
<pre><code>dspace=# update metadatavalue set text_lang=NULL where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/93761', '10947/1', '10947/10', '10947/11', '10947/12', '10947/13', '10947/14', '10947/15', '10947/16', '10947/17', '10947/18', '10947/19', '10947/2', '10947/20', '10947/21', '10947/22', '10947/23', '10947/24', '10947/25', '10947/2512', '10947/2515', '10947/2516', '10947/2517', '10947/2518', '10947/2519', '10947/2520', '10947/2521', '10947/2522', '10947/2523', '10947/2524', '10947/2525', '10947/2526', '10947/2527', '10947/2528', '10947/2529', '10947/2530', '10947/2531', '10947/2532', '10947/2533', '10947/2534', '10947/2535', '10947/2536', '10947/2537', '10947/2538', '10947/2539', '10947/2540', '10947/2541', '10947/2589', '10947/26', '10947/2631', '10947/27', '10947/2708', '10947/2776', '10947/2782', '10947/2784', '10947/2786', '10947/2790', '10947/28', '10947/2805', '10947/2836', '10947/2871', '10947/2878', '10947/29', '10947/2900', '10947/2919', '10947/3', '10947/30', '10947/31', '10947/32', '10947/33', '10947/34', '10947/3457', '10947/35', '10947/36', '10947/37', '10947/38', '10947/39', '10947/4', '10947/40', '10947/4052', '10947/4054', '10947/4056', '10947/4068', '10947/41', '10947/42', '10947/43', '10947/4368', '10947/44', '10947/4467', '10947/45', '10947/4508', '10947/4509', '10947/4510', '10947/4573', '10947/46', '10947/4635', '10947/4636', '10947/4637', '10947/4638', '10947/4639', '10947/4651', '10947/4657', '10947/47', '10947/48', '10947/49', '10947/5', '10947/50', '10947/51', '10947/5308', '10947/5322', '10947/5324', '10947/5326', '10947/6', '10947/7', '10947/8', '10947/9'))); <pre><code>dspace=# update metadatavalue set text_lang=NULL where metadata_field_id = (select metadata_field_id from 
metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/93761', '10947/1', '10947/10', '10947/11', '10947/12', '10947/13', '10947/14', '10947/15', '10947/16', '10947/17', '10947/18', '10947/19', '10947/2', '10947/20', '10947/21', '10947/22', '10947/23', '10947/24', '10947/25', '10947/2512', '10947/2515', '10947/2516', '10947/2517', '10947/2518', '10947/2519', '10947/2520', '10947/2521', '10947/2522', '10947/2523', '10947/2524', '10947/2525', '10947/2526', '10947/2527', '10947/2528', '10947/2529', '10947/2530', '10947/2531', '10947/2532', '10947/2533', '10947/2534', '10947/2535', '10947/2536', '10947/2537', '10947/2538', '10947/2539', '10947/2540', '10947/2541', '10947/2589', '10947/26', '10947/2631', '10947/27', '10947/2708', '10947/2776', '10947/2782', '10947/2784', '10947/2786', '10947/2790', '10947/28', '10947/2805', '10947/2836', '10947/2871', '10947/2878', '10947/29', '10947/2900', '10947/2919', '10947/3', '10947/30', '10947/31', '10947/32', '10947/33', '10947/34', '10947/3457', '10947/35', '10947/36', '10947/37', '10947/38', '10947/39', '10947/4', '10947/40', '10947/4052', '10947/4054', '10947/4056', '10947/4068', '10947/41', '10947/42', '10947/43', '10947/4368', '10947/44', '10947/4467', '10947/45', '10947/4508', '10947/4509', '10947/4510', '10947/4573', '10947/46', '10947/4635', '10947/4636', '10947/4637', '10947/4638', '10947/4639', '10947/4651', '10947/4657', '10947/47', '10947/48', '10947/49', '10947/5', '10947/50', '10947/51', '10947/5308', '10947/5322', '10947/5324', '10947/5326', '10947/6', '10947/7', '10947/8', '10947/9')));
UPDATE 4899 UPDATE 4899
</code></pre></li> </code></pre><ul>
<li>Wow, I just wrote this baller regex facet to find duplicate authors:</li>
<li><p>Wow, I just wrote this baller regex facet to find duplicate authors:</p>
<pre><code>isNotNull(value.match(/(CGIAR .+?)\|\|\1/))
</code></pre></li>
<li><p>This would be true if the authors were like <code>CGIAR System Management Office||CGIAR System Management Office</code>, which some of the CGIAR Library&rsquo;s were</p></li>
<li><p>Unfortunately when you fix these in OpenRefine and then submit the metadata to DSpace it doesn&rsquo;t detect any changes, so you have to edit them all manually via DSpace&rsquo;s &ldquo;Edit Item&rdquo;</p></li>
<li><p>Ooh! And an even more interesting regex would match <em>any</em> duplicated author:</p>
<pre><code>isNotNull(value.match(/(.+?)\|\|\1/))
</code></pre></li>
<li><p>Which means it can also be used to find items with duplicate <code>dc.subject</code> fields&hellip;</p></li>
<li><p>Finally sent Peter the final dump of the CGIAR System Organization community so he can have a last look at it</p></li>
<li><p>Post a message to the dspace-tech mailing list to ask about querying the AGROVOC API from the submission form</p></li>
<li><p>Abenet was asking if there was some way to hide certain internal items from the &ldquo;ILRI Research Outputs&rdquo; RSS feed (which is the top-level ILRI community feed), because Shirley was complaining</p></li>
<li><p>I think we could use <code>harvest.includerestricted.rss = false</code> but the items might need to be 100% restricted, not just the metadata</p></li>
<li><p>Adjust Ansible postgres role to use <code>max_connections</code> from a template variable and deploy a new limit of 123 on CGSpace</p></li>
</ul> </ul>
<pre><code>isNotNull(value.match(/(CGIAR .+?)\|\|\1/))
<h2 id="2017-08-17">2017-08-17</h2> </code></pre><ul>
<li>This would be true if the authors were like <code>CGIAR System Management Office||CGIAR System Management Office</code>, which some of the CGIAR Library's were</li>
<li>Unfortunately when you fix these in OpenRefine and then submit the metadata to DSpace it doesn't detect any changes, so you have to edit them all manually via DSpace's &ldquo;Edit Item&rdquo;</li>
<li>Ooh! And an even more interesting regex would match <em>any</em> duplicated author:</li>
</ul>
<pre><code>isNotNull(value.match(/(.+?)\|\|\1/))
</code></pre><ul>
<li>Which means it can also be used to find items with duplicate <code>dc.subject</code> fields&hellip;</li>
<li>Finally sent Peter the final dump of the CGIAR System Organization community so he can have a last look at it</li>
<li>Post a message to the dspace-tech mailing list to ask about querying the AGROVOC API from the submission form</li>
<li>Abenet was asking if there was some way to hide certain internal items from the &ldquo;ILRI Research Outputs&rdquo; RSS feed (which is the top-level ILRI community feed), because Shirley was complaining</li>
<li>I think we could use <code>harvest.includerestricted.rss = false</code> but the items might need to be 100% restricted, not just the metadata</li>
<li>Adjust Ansible postgres role to use <code>max_connections</code> from a template variable and deploy a new limit of 123 on CGSpace</li>
</ul>
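The duplicate-author facet from the 2017-08-16 notes can be tried outside OpenRefine as well. The sketch below ports the backreference regex to Python's `re` module and, as a separate technique that is not in the notes, adds a split-and-compare check. One assumption to flag: `re.search` matches anywhere in the string, so the regex version also flags a value where one name is only a prefix of the next, whereas the split-based check compares whole entries. Both function names are hypothetical.

```python
import re

# Same pattern as the facet: some text, the "||" separator, then the same text again
DUPLICATE_RE = re.compile(r"(.+?)\|\|\1")


def regex_flags_duplicate(value):
    """True if the backreference pattern matches anywhere in the value."""
    return DUPLICATE_RE.search(value) is not None


def has_repeated_entry(value, separator="||"):
    """True if any whole entry occurs more than once (alternative to the regex)."""
    entries = [entry.strip() for entry in value.split(separator)]
    return len(entries) != len(set(entries))


if __name__ == "__main__":
    value = "CGIAR System Management Office||CGIAR System Management Office"
    print(regex_flags_duplicate(value), has_repeated_entry(value))
```

The split-based check also catches duplicates that are not adjacent, such as `A||B||A`, which the regex does not.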
<h2 id="20170817">2017-08-17</h2>
<ul> <ul>
<li>Run Peter&rsquo;s edits to the CGIAR System Organization community on DSpace Test</li> <li>Run Peter's edits to the CGIAR System Organization community on DSpace Test</li>
<li>Uptime Robot said CGSpace went down for 1 minute, not sure why</li> <li>Uptime Robot said CGSpace went down for 1 minute, not sure why</li>
<li>Looking in <code>dspace.log.2017-08-17</code> I see some weird errors that might be related?</li>
<li><p>Looking in <code>dspace.log.2017-08-17</code> I see some weird errors that might be related?</p> </ul>
<pre><code>2017-08-17 07:55:31,396 ERROR net.sf.ehcache.store.DiskStore @ cocoon-ehcacheCache: Could not read disk store element for key PK_G-aspect-cocoon://DRI/12/handle/10568/65885?pipelinehash=823411183535858997_T-Navigation-3368194896954203241. Error was invalid stream header: 00000000 <pre><code>2017-08-17 07:55:31,396 ERROR net.sf.ehcache.store.DiskStore @ cocoon-ehcacheCache: Could not read disk store element for key PK_G-aspect-cocoon://DRI/12/handle/10568/65885?pipelinehash=823411183535858997_T-Navigation-3368194896954203241. Error was invalid stream header: 00000000
java.io.StreamCorruptedException: invalid stream header: 00000000 java.io.StreamCorruptedException: invalid stream header: 00000000
</code></pre></li> </code></pre><ul>
<li>Weird that these errors seem to have started on August 11th, the same day we had capacity issues with PostgreSQL:</li>
<li><p>Weird that these errors seem to have started on August 11th, the same day we had capacity issues with PostgreSQL:</p> </ul>
<pre><code># grep -c &quot;ERROR net.sf.ehcache.store.DiskStore&quot; dspace.log.2017-08-* <pre><code># grep -c &quot;ERROR net.sf.ehcache.store.DiskStore&quot; dspace.log.2017-08-*
dspace.log.2017-08-01:0 dspace.log.2017-08-01:0
dspace.log.2017-08-02:0 dspace.log.2017-08-02:0
@ -453,46 +380,37 @@ dspace.log.2017-08-14:2135
dspace.log.2017-08-15:1506 dspace.log.2017-08-15:1506
dspace.log.2017-08-16:1935 dspace.log.2017-08-16:1935
dspace.log.2017-08-17:584 dspace.log.2017-08-17:584
</code></pre></li> </code></pre><ul>
<li>There are none in 2017-07 either&hellip;</li>
<li><p>There are none in 2017-07 either&hellip;</p></li> <li>A few posts on the dspace-tech mailing list say this is related to the Cocoon cache somehow</li>
<li>I will clear the XMLUI cache for now and see if the errors continue (though perhaps shutting down Tomcat and removing the cache is more effective somehow?)</li>
<li><p>A few posts on the dspace-tech mailing list say this is related to the Cocoon cache somehow</p></li> <li>We tested the option for limiting restricted items from the RSS feeds on DSpace Test</li>
<li>I created four items, and only the two with public metadata showed up in the community's RSS feed:
<li><p>I will clear the XMLUI cache for now and see if the errors continue (though perhaps shutting down Tomcat and removing the cache is more effective somehow?)</p></li>
<li><p>We tested the option for limiting restricted items from the RSS feeds on DSpace Test</p></li>
<li><p>I created four items, and only the two with public metadata showed up in the community&rsquo;s RSS feed:</p>
<ul> <ul>
<li>Public metadata, public bitstream ✓</li> <li>Public metadata, public bitstream ✓</li>
<li>Public metadata, restricted bitstream ✓</li> <li>Public metadata, restricted bitstream ✓</li>
<li>Restricted metadata, restricted bitstream ✗</li> <li>Restricted metadata, restricted bitstream ✗</li>
<li>Private item ✗</li> <li>Private item ✗</li>
</ul></li>
<li><p>Peter responded and said that he doesn&rsquo;t want to limit items to be restricted just so we can change the RSS feeds</p></li>
</ul> </ul>
</li>
<h2 id="2017-08-18">2017-08-18</h2> <li>Peter responded and said that he doesn't want to limit items to be restricted just so we can change the RSS feeds</li>
</ul>
<h2 id="20170818">2017-08-18</h2>
<ul> <ul>
<li>Someone on the dspace-tech mailing list responded with some tips about using the authority framework to do external queries from the submission form</li> <li>Someone on the dspace-tech mailing list responded with some tips about using the authority framework to do external queries from the submission form</li>
<li>He linked to some examples from DSpace-CRIS that use this functionality: <a href="https://github.com/4Science/DSpace/blob/dspace-5_x_x-cris/dspace-api/src/main/java/org/dspace/content/authority/VIAFAuthority.java">VIAFAuthority</a></li> <li>He linked to some examples from DSpace-CRIS that use this functionality: <a href="https://github.com/4Science/DSpace/blob/dspace-5_x_x-cris/dspace-api/src/main/java/org/dspace/content/authority/VIAFAuthority.java">VIAFAuthority</a></li>
<li>I wired it up to the <code>dc.subject</code> field of the submission interface using the &ldquo;lookup&rdquo; type and it works!</li> <li>I wired it up to the <code>dc.subject</code> field of the submission interface using the &ldquo;lookup&rdquo; type and it works!</li>
<li>I think we can use this example to get a working AGROVOC query</li> <li>I think we can use this example to get a working AGROVOC query</li>
<li>More information about authority framework: <a href="https://wiki.duraspace.org/display/DSPACE/Authority+Control+of+Metadata+Values">https://wiki.duraspace.org/display/DSPACE/Authority+Control+of+Metadata+Values</a></li> <li>More information about authority framework: <a href="https://wiki.duraspace.org/display/DSPACE/Authority+Control+of+Metadata+Values">https://wiki.duraspace.org/display/DSPACE/Authority+Control+of+Metadata+Values</a></li>
<li>Wow, I'm playing with the AGROVOC SPARQL endpoint using the <a href="https://github.com/tialaramex/sparql-query">sparql-query tool</a>:</li>
<li><p>Wow, I&rsquo;m playing with the AGROVOC SPARQL endpoint using the <a href="https://github.com/tialaramex/sparql-query">sparql-query tool</a>:</p> </ul>
<pre><code>$ ./sparql-query http://202.45.139.84:10035/catalogs/fao/repositories/agrovoc <pre><code>$ ./sparql-query http://202.45.139.84:10035/catalogs/fao/repositories/agrovoc
sparql$ PREFIX skos: &lt;http://www.w3.org/2004/02/skos/core#&gt; sparql$ PREFIX skos: &lt;http://www.w3.org/2004/02/skos/core#&gt;
SELECT SELECT
?label ?label
WHERE { WHERE {
{ ?concept skos:altLabel ?label . } UNION { ?concept skos:prefLabel ?label . } { ?concept skos:altLabel ?label . } UNION { ?concept skos:prefLabel ?label . }
FILTER regex(str(?label), &quot;^fish&quot;, &quot;i&quot;) . FILTER regex(str(?label), &quot;^fish&quot;, &quot;i&quot;) .
} LIMIT 10; } LIMIT 10;
┌───────────────────────┐ ┌───────────────────────┐
@ -509,89 +427,67 @@ FILTER regex(str(?label), &quot;^fish&quot;, &quot;i&quot;) .
│ fishing times │ │ fishing times │
│ fish passes │ │ fish passes │
└───────────────────────┘ └───────────────────────┘
</code></pre></li> </code></pre><ul>
<li>More examples about SPARQL syntax: <a href="https://github.com/rsinger/openlcsh/wiki/Sparql-Examples">https://github.com/rsinger/openlcsh/wiki/Sparql-Examples</a></li>
<li><p>More examples about SPARQL syntax: <a href="https://github.com/rsinger/openlcsh/wiki/Sparql-Examples">https://github.com/rsinger/openlcsh/wiki/Sparql-Examples</a></p></li> <li>I found this blog post about speeding up the Tomcat startup time: <a href="http://skybert.net/java/improve-tomcat-startup-time/">http://skybert.net/java/improve-tomcat-startup-time/</a></li>
<li>The startup time went from ~80s to 40s!</li>
<li><p>I found this blog post about speeding up the Tomcat startup time: <a href="http://skybert.net/java/improve-tomcat-startup-time/">http://skybert.net/java/improve-tomcat-startup-time/</a></p></li>
<li><p>The startup time went from ~80s to 40s!</p></li>
</ul> </ul>
<h2 id="20170819">2017-08-19</h2>
<h2 id="2017-08-19">2017-08-19</h2>
<ul> <ul>
<li>More examples of SPARQL queries: <a href="https://github.com/rsinger/openlcsh/wiki/Sparql-Examples">https://github.com/rsinger/openlcsh/wiki/Sparql-Examples</a></li> <li>More examples of SPARQL queries: <a href="https://github.com/rsinger/openlcsh/wiki/Sparql-Examples">https://github.com/rsinger/openlcsh/wiki/Sparql-Examples</a></li>
<li>Specifically the explanation of the <code>FILTER</code> regex</li> <li>Specifically the explanation of the <code>FILTER</code> regex</li>
<li>Might want to <code>SELECT DISTINCT</code> or increase the <code>LIMIT</code> to get terms like &ldquo;wheat&rdquo; and &ldquo;fish&rdquo; to be visible</li> <li>Might want to <code>SELECT DISTINCT</code> or increase the <code>LIMIT</code> to get terms like &ldquo;wheat&rdquo; and &ldquo;fish&rdquo; to be visible</li>
<li>Test queries online on the AGROVOC SPARQL portal: <a href="http://202.45.139.84:10035/catalogs/fao/repositories/agrovoc">http://202.45.139.84:10035/catalogs/fao/repositories/agrovoc</a></li> <li>Test queries online on the AGROVOC SPARQL portal: http://202.45.139.84:10035/catalogs/fao/repositories/agrovoc</li>
</ul> </ul>
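A minimal sketch of the `SELECT DISTINCT` / larger `LIMIT` idea noted above, in Python. The helper name `build_label_query` and its parameters are hypothetical; the query text mirrors the one run with sparql-query earlier, changing only `DISTINCT` and the limit. It only builds the query string and does not contact the AGROVOC endpoint.

```python
def build_label_query(prefix: str, limit: int = 100) -> str:
    """Build a SPARQL query for labels starting with `prefix`.

    Hypothetical helper for illustration: it only constructs the query text.
    """
    # Escape backslashes and double quotes so the prefix stays inside the
    # SPARQL string literal. Regex metacharacters are NOT escaped here.
    safe = prefix.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        # DISTINCT collapses labels that match via both altLabel and prefLabel
        "SELECT DISTINCT ?label\n"
        "WHERE {\n"
        "  { ?concept skos:altLabel ?label . } UNION { ?concept skos:prefLabel ?label . }\n"
        f'  FILTER regex(str(?label), "^{safe}", "i") .\n'
        "}\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    print(build_label_query("wheat", limit=50))
```

The printed query can be pasted at the `sparql$` prompt (with a trailing `;`, as in the session above) or into the online portal.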
<h2 id="20170820">2017-08-20</h2>
<h2 id="2017-08-20">2017-08-20</h2>
<ul> <ul>
<li>Since I cleared the XMLUI cache on 2017-08-17 there haven&rsquo;t been any more <code>ERROR net.sf.ehcache.store.DiskStore</code> errors</li> <li>Since I cleared the XMLUI cache on 2017-08-17 there haven't been any more <code>ERROR net.sf.ehcache.store.DiskStore</code> errors</li>
<li>Look at the CGIAR Library to see if I can find the items that have been submitted since May:</li>
<li><p>Look at the CGIAR Library to see if I can find the items that have been submitted since May:</p>
<pre><code>dspace=# select * from metadatavalue where metadata_field_id=11 and date(text_value) &gt; '2017-05-01T00:00:00Z';
metadata_value_id | item_id | metadata_field_id | text_value | text_lang | place | authority | confidence
-------------------+---------+-------------------+----------------------+-----------+-------+-----------+------------
123117 | 5872 | 11 | 2017-06-28T13:05:18Z | | 1 | | -1
123042 | 5869 | 11 | 2017-05-15T03:29:23Z | | 1 | | -1
123056 | 5870 | 11 | 2017-05-22T11:27:15Z | | 1 | | -1
123072 | 5871 | 11 | 2017-06-06T07:46:01Z | | 1 | | -1
123171 | 5874 | 11 | 2017-08-04T07:51:20Z | | 1 | | -1
(5 rows)
</code></pre></li>
<li><p>According to <code>dc.date.accessioned</code> (metadata field id 11) there have only been five items submitted since May</p></li>
<li><p>These are their handles:</p>
<pre><code>dspace=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id in (select item_id from metadatavalue where metadata_field_id=11 and date(text_value) &gt; '2017-05-01T00:00:00Z');
handle
------------
10947/4658
10947/4659
10947/4660
10947/4661
10947/4664
(5 rows)
</code></pre></li>
</ul> </ul>
<pre><code>dspace=# select * from metadatavalue where metadata_field_id=11 and date(text_value) &gt; '2017-05-01T00:00:00Z';
<h2 id="2017-08-23">2017-08-23</h2> metadata_value_id | item_id | metadata_field_id | text_value | text_lang | place | authority | confidence
-------------------+---------+-------------------+----------------------+-----------+-------+-----------+------------
123117 | 5872 | 11 | 2017-06-28T13:05:18Z | | 1 | | -1
123042 | 5869 | 11 | 2017-05-15T03:29:23Z | | 1 | | -1
123056 | 5870 | 11 | 2017-05-22T11:27:15Z | | 1 | | -1
123072 | 5871 | 11 | 2017-06-06T07:46:01Z | | 1 | | -1
123171 | 5874 | 11 | 2017-08-04T07:51:20Z | | 1 | | -1
(5 rows)
</code></pre><ul>
<li>According to <code>dc.date.accessioned</code> (metadata field id 11) there have only been five items submitted since May</li>
<li>These are their handles:</li>
</ul>
<pre><code>dspace=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id in (select item_id from metadatavalue where metadata_field_id=11 and date(text_value) &gt; '2017-05-01T00:00:00Z');
handle
------------
10947/4658
10947/4659
10947/4660
10947/4661
10947/4664
(5 rows)
</code></pre><h2 id="20170823">2017-08-23</h2>
<ul> <ul>
<li>Start testing the nginx configs for the CGIAR Library migration as well as start making a checklist</li> <li>Start testing the nginx configs for the CGIAR Library migration as well as start making a checklist</li>
</ul> </ul>
<h2 id="20170828">2017-08-28</h2>
<h2 id="2017-08-28">2017-08-28</h2>
<ul> <ul>
<li>Bram had written to me two weeks ago to set up a chat about ORCID stuff but the email apparently bounced and I only found out when he emailed me on another account</li> <li>Bram had written to me two weeks ago to set up a chat about ORCID stuff but the email apparently bounced and I only found out when he emailed me on another account</li>
<li>I told him I can chat in a few weeks when I&rsquo;m back</li> <li>I told him I can chat in a few weeks when I'm back</li>
</ul> </ul>
<h2 id="20170831">2017-08-31</h2>
<h2 id="2017-08-31">2017-08-31</h2>
<ul> <ul>
<li>I notice that in many WLE collections Marianne Gadeberg is in the edit or approval steps, but she is also in the groups for those steps.</li> <li>I notice that in many WLE collections Marianne Gadeberg is in the edit or approval steps, but she is also in the groups for those steps.</li>
<li>I think we need to have a process to go back and check / fix some of these scenarios—to remove her user from the step and instead add her to the group—because we have way too many authorizations and in late 2016 we had <a href="https://github.com/ilri/rmg-ansible-public/commit/358b5ea43f9e5820986f897c9d560937c702ac6e">performance issues with Solr</a> because of this</li> <li>I think we need to have a process to go back and check / fix some of these scenarios—to remove her user from the step and instead add her to the group—because we have way too many authorizations and in late 2016 we had <a href="https://github.com/ilri/rmg-ansible-public/commit/358b5ea43f9e5820986f897c9d560937c702ac6e">performance issues with Solr</a> because of this</li>
<li>I asked Sisay about this and hinted that he should go back and fix these things, but let&rsquo;s see what he says</li> <li>I asked Sisay about this and hinted that he should go back and fix these things, but let's see what he says</li>
<li>Saw CGSpace go down briefly today and noticed SQL connection pool errors in the dspace log file:</li>
<li><p>Saw CGSpace go down briefly today and noticed SQL connection pool errors in the dspace log file:</p> </ul>
<pre><code>ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error <pre><code>ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error
org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
</code></pre></li> </code></pre><ul>
<li>Looking at the logs I see we have been having hundreds or thousands of these errors a few times per week in 2017-07 and almost every day in 2017-08</li>
<li><p>Looking at the logs I see we have been having hundreds or thousands of these errors a few times per week in 2017-07 and almost every day in 2017-08</p></li> <li>It seems that I changed the <code>db.maxconnections</code> setting from 70 to 40 around 2017-08-14, but Macaroni Bros also reduced their hourly hammering of the REST API then</li>
<li>Nevertheless, it seems like a connection limit of 40 is not enough and that I should increase it (as well as the system's PostgreSQL <code>max_connections</code>)</li>
<li><p>It seems that I changed the <code>db.maxconnections</code> setting from 70 to 40 around 2017-08-14, but Macaroni Bros also reduced their hourly hammering of the REST API then</p></li>
<li><p>Nevertheless, it seems like a connection limit of 40 is not enough and that I should increase it (as well as the system&rsquo;s PostgreSQL <code>max_connections</code>)</p></li>
</ul> </ul>
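A rough Python sketch of tallying these pool timeout errors per log file, similar in spirit to running `grep -c` over the DSpace logs. The function names and the example glob are assumptions for illustration; only the error string comes from the log excerpt above.

```python
from pathlib import Path
from typing import Dict, Iterable

# Error text taken from the dspace.log excerpt above
POOL_ERROR = "Cannot get a connection, pool error Timeout waiting for idle object"


def count_pool_errors(lines: Iterable[str]) -> int:
    """Count the lines that contain the pool timeout message."""
    return sum(1 for line in lines if POOL_ERROR in line)


def count_pool_errors_in_files(paths: Iterable[Path]) -> Dict[str, int]:
    """Return a mapping of file name to error count (hypothetical helper)."""
    counts = {}
    for path in paths:
        # errors="replace" keeps going if the log contains odd bytes
        with open(path, encoding="utf-8", errors="replace") as handle:
            counts[path.name] = count_pool_errors(handle)
    return counts


if __name__ == "__main__":
    # Example glob only; adjust to wherever the logs actually live
    for name, total in sorted(count_pool_errors_in_files(Path(".").glob("dspace.log.2017-08-*")).items()):
        print(f"{name}:{total}")
```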

View File

@ -8,14 +8,11 @@
<meta property="og:title" content="September, 2017" /> <meta property="og:title" content="September, 2017" />
<meta property="og:description" content="2017-09-06 <meta property="og:description" content="2017-09-06
Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours
2017-09-07 2017-09-07
Ask Sisay to clean up the WLE approvers a bit, as Marianne&#39;s user account is both in the approvers step as well as the group
Ask Sisay to clean up the WLE approvers a bit, as Marianne&rsquo;s user account is both in the approvers step as well as the group
" /> " />
<meta property="og:type" content="article" /> <meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2017-09/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2017-09/" />
@ -26,16 +23,13 @@ Ask Sisay to clean up the WLE approvers a bit, as Marianne&rsquo;s user account
<meta name="twitter:title" content="September, 2017"/> <meta name="twitter:title" content="September, 2017"/>
<meta name="twitter:description" content="2017-09-06 <meta name="twitter:description" content="2017-09-06
Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours
2017-09-07 2017-09-07
Ask Sisay to clean up the WLE approvers a bit, as Marianne&#39;s user account is both in the approvers step as well as the group
Ask Sisay to clean up the WLE approvers a bit, as Marianne&rsquo;s user account is both in the approvers step as well as the group
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -116,49 +110,33 @@ Ask Sisay to clean up the WLE approvers a bit, as Marianne&rsquo;s user account
</p> </p>
</header> </header>
<h2 id="2017-09-06">2017-09-06</h2> <h2 id="20170906">2017-09-06</h2>
<ul> <ul>
<li>Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours</li> <li>Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours</li>
</ul> </ul>
<h2 id="20170907">2017-09-07</h2>
<h2 id="2017-09-07">2017-09-07</h2>
<ul> <ul>
<li>Ask Sisay to clean up the WLE approvers a bit, as Marianne&rsquo;s user account is both in the approvers step as well as the group</li> <li>Ask Sisay to clean up the WLE approvers a bit, as Marianne's user account is both in the approvers step as well as the group</li>
</ul> </ul>
<h2 id="20170910">2017-09-10</h2>
<h2 id="2017-09-10">2017-09-10</h2>
<ul> <ul>
<li><p>Delete 58 blank metadata values from the CGSpace database:</p> <li>Delete 58 blank metadata values from the CGSpace database:</li>
</ul>
<pre><code>dspace=# delete from metadatavalue where resource_type_id=2 and text_value=''; <pre><code>dspace=# delete from metadatavalue where resource_type_id=2 and text_value='';
DELETE 58 DELETE 58
</code></pre></li> </code></pre><ul>
<li>I also ran it on DSpace Test because we'll be migrating the CGIAR Library soon and it would be good to catch these before we migrate</li>
<li><p>I also ran it on DSpace Test because we&rsquo;ll be migrating the CGIAR Library soon and it would be good to catch these before we migrate</p></li> <li>Run system updates and restart DSpace Test</li>
<li>We only have 7.7GB of free space on DSpace Test so I need to copy some data off of it before doing the CGIAR Library migration (requires lots of exporting and creating temp files)</li>
<li><p>Run system updates and restart DSpace Test</p></li> <li>I still have the original data from the CGIAR Library so I've zipped it up and sent it off to linode18 for now</li>
<li>sha256sum of <code>original-cgiar-library-6.6GB.tar.gz</code> is: bcfabb52f51cbdf164b61b7e9b3a0e498479e4c1ed1d547d32d11f44c0d5eb8a</li>
<li><p>We only have 7.7GB of free space on DSpace Test so I need to copy some data off of it before doing the CGIAR Library migration (requires lots of exporting and creating temp files)</p></li> <li>Start doing a test run of the CGIAR Library migration locally</li>
<li>Notes and todo checklist here for now: <a href="https://gist.github.com/alanorth/3579b74e116ab13418d187ed379abd9c">https://gist.github.com/alanorth/3579b74e116ab13418d187ed379abd9c</a></li>
<li><p>I still have the original data from the CGIAR Library so I&rsquo;ve zipped it up and sent it off to linode18 for now</p></li> <li>Create pull request for Phase I and II changes to CCAFS Project Tags: <a href="https://github.com/ilri/DSpace/pull/336">#336</a></li>
<li>We've been discussing with Macaroni Bros and CCAFS for the past month or so and the list of tags was recently finalized</li>
<li><p>sha256sum of <code>original-cgiar-library-6.6GB.tar.gz</code> is: bcfabb52f51cbdf164b61b7e9b3a0e498479e4c1ed1d547d32d11f44c0d5eb8a</p></li> <li>There will need to be some metadata updatesthough if I recall correctly it is only about seven recordsfor that as well, I had made some notes about it in <a href="/cgspace-notes/2017-07">2017-07</a>, but I've asked for more clarification from Lili just in case</li>
<li>Looking at the DSpace logs to see if we've had a change in the &ldquo;Cannot get a connection&rdquo; errors since last month when we adjusted the <code>db.maxconnections</code> parameter on CGSpace:</li>
<li><p>Start doing a test run of the CGIAR Library migration locally</p></li> </ul>
<li><p>Notes and todo checklist here for now: <a href="https://gist.github.com/alanorth/3579b74e116ab13418d187ed379abd9c">https://gist.github.com/alanorth/3579b74e116ab13418d187ed379abd9c</a></p></li>
<li><p>Create pull request for Phase I and II changes to CCAFS Project Tags: <a href="https://github.com/ilri/DSpace/pull/336">#336</a></p></li>
<li><p>We&rsquo;ve been discussing with Macaroni Bros and CCAFS for the past month or so and the list of tags was recently finalized</p></li>
<li><p>There will need to be some metadata updates for that as well (though if I recall correctly it is only about seven records). I had made some notes about it in <a href="/cgspace-notes/2017-07">2017-07</a>, but I&rsquo;ve asked for more clarification from Lili just in case</p></li>
<li><p>Looking at the DSpace logs to see if we&rsquo;ve had a change in the &ldquo;Cannot get a connection&rdquo; errors since last month when we adjusted the <code>db.maxconnections</code> parameter on CGSpace:</p>
<pre><code># grep -c &quot;Cannot get a connection, pool error Timeout waiting for idle object&quot; dspace.log.2017-09-* <pre><code># grep -c &quot;Cannot get a connection, pool error Timeout waiting for idle object&quot; dspace.log.2017-09-*
dspace.log.2017-09-01:0 dspace.log.2017-09-01:0
dspace.log.2017-09-02:0 dspace.log.2017-09-02:0
@ -170,108 +148,84 @@ dspace.log.2017-09-07:0
dspace.log.2017-09-08:10 dspace.log.2017-09-08:10
dspace.log.2017-09-09:0 dspace.log.2017-09-09:0
dspace.log.2017-09-10:0 dspace.log.2017-09-10:0
</code></pre></li> </code></pre><ul>
<li>Also, since last month (2017-08) Macaroni Bros no longer runs their REST API scraper every hour, so I'm sure that helped</li>
<li><p>Also, since last month (2017-08) Macaroni Bros no longer runs their REST API scraper every hour, so I&rsquo;m sure that helped</p></li> <li>There are still some errors, though, so maybe I should bump the connection limit up a bit</li>
<li>I remember seeing that Munin shows that the average number of connections is 50 (which is probably mostly from the XMLUI) and we're currently allowing 40 connections per app, so maybe it would be good to bump that value up to 50 or 60 along with the system's PostgreSQL <code>max_connections</code> (formula should be: webapps * 60 + 3, or 3 * 60 + 3 = 183 in our case)</li>
<li><p>There are still some errors, though, so maybe I should bump the connection limit up a bit</p></li> <li>I updated both CGSpace and DSpace Test to use these new settings (60 connections per web app and 183 for system PostgreSQL limit)</li>
<li>I'm expecting to see 0 connection errors for the next few months</li>
<li><p>I remember seeing that Munin shows that the average number of connections is 50 (which is probably mostly from the XMLUI) and we&rsquo;re currently allowing 40 connections per app, so maybe it would be good to bump that value up to 50 or 60 along with the system&rsquo;s PostgreSQL <code>max_connections</code> (formula should be: webapps * 60 + 3, or 3 * 60 + 3 = 183 in our case)</p></li>
<li><p>I updated both CGSpace and DSpace Test to use these new settings (60 connections per web app and 183 for system PostgreSQL limit)</p></li>
<li><p>I&rsquo;m expecting to see 0 connection errors for the next few months</p></li>
</ul> </ul>
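The sizing rule above (web apps × connections per app + 3) can be written out as a small Python sketch. The function and parameter names are hypothetical, and the notes do not say what the extra 3 is for; treating it as headroom (for example for administrative connections) is an assumption.

```python
def postgres_max_connections(webapps: int, per_app: int, extra: int = 3) -> int:
    """Apply the formula from the notes: webapps * per_app + extra.

    `extra` defaults to 3 as in the notes; its purpose is assumed to be
    headroom beyond what the web application pools can use.
    """
    if webapps < 0 or per_app < 0 or extra < 0:
        raise ValueError("all inputs must be non-negative")
    return webapps * per_app + extra


if __name__ == "__main__":
    # Three web apps at 60 connections each, as in the notes
    print(postgres_max_connections(3, 60))
```

With three web apps at 60 connections each this gives 183, matching the value applied to CGSpace and DSpace Test.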
<h2 id="20170911">2017-09-11</h2>
<h2 id="2017-09-11">2017-09-11</h2>
<ul> <ul>
<li>Lots of work testing the CGIAR Library migration</li> <li>Lots of work testing the CGIAR Library migration</li>
<li>Many technical notes and TODOs here: <a href="https://gist.github.com/alanorth/3579b74e116ab13418d187ed379abd9c">https://gist.github.com/alanorth/3579b74e116ab13418d187ed379abd9c</a></li> <li>Many technical notes and TODOs here: <a href="https://gist.github.com/alanorth/3579b74e116ab13418d187ed379abd9c">https://gist.github.com/alanorth/3579b74e116ab13418d187ed379abd9c</a></li>
</ul> </ul>
<h2 id="20170912">2017-09-12</h2>
<h2 id="2017-09-12">2017-09-12</h2>
<ul> <ul>
<li>I was testing the <a href="https://wiki.duraspace.org/display/DSDOC5x/AIP+Backup+and+Restore#AIPBackupandRestore-AIPConfigurationsToImproveIngestionSpeedwhileValidating">METS XSD caching during AIP ingest</a> but it doesn&rsquo;t seem to help actually</li> <li>I was testing the <a href="https://wiki.duraspace.org/display/DSDOC5x/AIP+Backup+and+Restore#AIPBackupandRestore-AIPConfigurationsToImproveIngestionSpeedwhileValidating">METS XSD caching during AIP ingest</a> but it doesn't seem to help actually</li>
<li>The import process takes the same amount of time with and without the caching</li> <li>The import process takes the same amount of time with and without the caching</li>
<li>Also, I captured TCP packets destined for port 80 and both imports only captured ONE packet (an update check from some component in Java):</li>
<li><p>Also, I captured TCP packets destined for port 80 and both imports only captured ONE packet (an update check from some component in Java):</p> </ul>
<pre><code>$ sudo tcpdump -i en0 -w without-cached-xsd.dump dst port 80 and 'tcp[32:4] = 0x47455420' <pre><code>$ sudo tcpdump -i en0 -w without-cached-xsd.dump dst port 80 and 'tcp[32:4] = 0x47455420'
</code></pre></li> </code></pre><ul>
<li>Great TCP dump guide here: <a href="https://danielmiessler.com/study/tcpdump">https://danielmiessler.com/study/tcpdump</a></li>
<li><p>Great TCP dump guide here: <a href="https://danielmiessler.com/study/tcpdump">https://danielmiessler.com/study/tcpdump</a></p></li> <li>The last part of that command filters for HTTP GET requests, of which there should have been many to fetch all the XSD files for validation</li>
<li>I sent a message to the mailing list to see if anyone knows more about this</li>
<li><p>The last part of that command filters for HTTP GET requests, of which there should have been many to fetch all the XSD files for validation</p></li> <li>In looking at the tcpdump results I notice that there is an update check to the ehcache server on <em>every</em> iteration of the ingest loop, for example:</li>
</ul>
<li><p>I sent a message to the mailing list to see if anyone knows more about this</p></li>
<li><p>In looking at the tcpdump results I notice that there is an update check to the ehcache server on <em>every</em> iteration of the ingest loop, for example:</p>
<pre><code>09:39:36.008956 IP 192.168.8.124.50515 &gt; 157.189.192.67.http: Flags [P.], seq 1736833672:1736834103, ack 147469926, win 4120, options [nop,nop,TS val 1175113331 ecr 550028064], length 431: HTTP: GET /kit/reflector?kitID=ehcache.default&amp;pageID=update.properties&amp;id=2130706433&amp;os-name=Mac+OS+X&amp;jvm-name=Java+HotSpot%28TM%29+64-Bit+Server+VM&amp;jvm-version=1.8.0_144&amp;platform=x86_64&amp;tc-version=UNKNOWN&amp;tc-product=Ehcache+Core+1.7.2&amp;source=Ehcache+Core&amp;uptime-secs=0&amp;patch=UNKNOWN HTTP/1.1 <pre><code>09:39:36.008956 IP 192.168.8.124.50515 &gt; 157.189.192.67.http: Flags [P.], seq 1736833672:1736834103, ack 147469926, win 4120, options [nop,nop,TS val 1175113331 ecr 550028064], length 431: HTTP: GET /kit/reflector?kitID=ehcache.default&amp;pageID=update.properties&amp;id=2130706433&amp;os-name=Mac+OS+X&amp;jvm-name=Java+HotSpot%28TM%29+64-Bit+Server+VM&amp;jvm-version=1.8.0_144&amp;platform=x86_64&amp;tc-version=UNKNOWN&amp;tc-product=Ehcache+Core+1.7.2&amp;source=Ehcache+Core&amp;uptime-secs=0&amp;patch=UNKNOWN HTTP/1.1
</code></pre></li> </code></pre><ul>
<li>Turns out this is a known issue and Ehcache has refused to make it opt-in: <a href="https://jira.terracotta.org/jira/browse/EHC-461">https://jira.terracotta.org/jira/browse/EHC-461</a></li>
<li><p>Turns out this is a known issue and Ehcache has refused to make it opt-in: <a href="https://jira.terracotta.org/jira/browse/EHC-461">https://jira.terracotta.org/jira/browse/EHC-461</a></p></li> <li>But we can disable it by adding an <code>updateCheck=&quot;false&quot;</code> attribute to the main <code>&lt;ehcache &gt;</code> tag in <code>dspace-services/src/main/resources/caching/ehcache-config.xml</code></li>
<li>After re-compiling and re-deploying DSpace I no longer see those update checks during item submission</li>
<li><p>But we can disable it by adding an <code>updateCheck=&quot;false&quot;</code> attribute to the main <code>&lt;ehcache &gt;</code> tag in <code>dspace-services/src/main/resources/caching/ehcache-config.xml</code></p></li> <li>I had a Skype call with Bram Luyten from Atmire to discuss various issues related to ORCID in DSpace
<li><p>After re-compiling and re-deploying DSpace I no longer see those update checks during item submission</p></li>
<li><p>I had a Skype call with Bram Luyten from Atmire to discuss various issues related to ORCID in DSpace</p>
<ul> <ul>
<li>First, ORCID is deprecating their version 1 API (which DSpace uses) and in version 2 API they have removed the ability to search for users by name</li> <li>First, ORCID is deprecating their version 1 API (which DSpace uses) and in version 2 API they have removed the ability to search for users by name</li>
<li>The logic is that searching by name actually isn&rsquo;t very useful because ORCID is essentially a global phonebook and there are tons of legitimately duplicate and ambiguous names</li> <li>The logic is that searching by name actually isn't very useful because ORCID is essentially a global phonebook and there are tons of legitimately duplicate and ambiguous names</li>
<li>Atmire&rsquo;s proposed integration would work by having users look up and add authors to the authority core directly using their ORCID ID itself (this would happen during the item submission process or perhaps as a standalone / batch process, for example to populate the authority core with a list of known ORCIDs)</li> <li>Atmire's proposed integration would work by having users look up and add authors to the authority core directly using their ORCID ID itself (this would happen during the item submission process or perhaps as a standalone / batch process, for example to populate the authority core with a list of known ORCIDs)</li>
<li>Once the association between name and ORCID is made in the authority then it can be autocompleted in the lookup field</li> <li>Once the association between name and ORCID is made in the authority then it can be autocompleted in the lookup field</li>
<li>Ideally there could also be a user interface for cleanup and merging of authorities</li> <li>Ideally there could also be a user interface for cleanup and merging of authorities</li>
<li>He will prepare a quote for us, keeping in mind that this could be useful to contribute back to the community for a 5.x release</li> <li>He will prepare a quote for us, keeping in mind that this could be useful to contribute back to the community for a 5.x release</li>
<li>As far as exposing ORCIDs as flat metadata alongside all other metadata, he says this should be possible and will work on a quote for us</li> <li>As far as exposing ORCIDs as flat metadata alongside all other metadata, he says this should be possible and will work on a quote for us</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2017-09-13">2017-09-13</h2> </ul>
<h2 id="20170913">2017-09-13</h2>
<ul> <ul>
<li>Last night Linode sent an alert about CGSpace (linode18) that it has exceeded the outbound traffic rate threshold of 10Mb/s for the last two hours</li> <li>Last night Linode sent an alert about CGSpace (linode18) that it has exceeded the outbound traffic rate threshold of 10Mb/s for the last two hours</li>
<li>I wonder what was going on, and looking into the nginx logs I think maybe it&rsquo;s OAI&hellip;</li> <li>I wonder what was going on, and looking into the nginx logs I think maybe it's OAI&hellip;</li>
<li>Here is yesterday's top ten IP addresses making requests to <code>/oai</code>:</li>
<li><p>Here is yesterday&rsquo;s top ten IP addresses making requests to <code>/oai</code>:</p> </ul>
<pre><code># awk '{print $1}' /var/log/nginx/oai.log | sort -n | uniq -c | sort -h | tail -n 10 <pre><code># awk '{print $1}' /var/log/nginx/oai.log | sort -n | uniq -c | sort -h | tail -n 10
1 213.136.89.78 1 213.136.89.78
1 66.249.66.90 1 66.249.66.90
1 66.249.66.92 1 66.249.66.92
3 68.180.229.31 3 68.180.229.31
4 35.187.22.255 4 35.187.22.255
13745 54.70.175.86 13745 54.70.175.86
15814 34.211.17.113 15814 34.211.17.113
15825 35.161.215.53 15825 35.161.215.53
16704 54.70.51.7 16704 54.70.51.7
</code></pre></li> </code></pre><ul>
<li>Compared to the previous day's logs it looks VERY high:</li>
<li><p>Compared to the previous day&rsquo;s logs it looks VERY high:</p> </ul>
<pre><code># awk '{print $1}' /var/log/nginx/oai.log.1 | sort -n | uniq -c | sort -h | tail -n 10 <pre><code># awk '{print $1}' /var/log/nginx/oai.log.1 | sort -n | uniq -c | sort -h | tail -n 10
1 207.46.13.39 1 207.46.13.39
1 66.249.66.93 1 66.249.66.93
2 66.249.66.91 2 66.249.66.91
4 216.244.66.194 4 216.244.66.194
14 66.249.66.90 14 66.249.66.90
</code></pre></li> </code></pre><ul>
<li>The user agents for those top IPs are:
<li><p>The user agents for those top IPs are:</p>
<ul> <ul>
<li>54.70.175.86: API scraper</li> <li>54.70.175.86: API scraper</li>
<li>34.211.17.113: API scraper</li> <li>34.211.17.113: API scraper</li>
<li>35.161.215.53: API scraper</li> <li>35.161.215.53: API scraper</li>
<li>54.70.51.7: API scraper</li> <li>54.70.51.7: API scraper</li>
</ul></li> </ul>
</li>
<li><p>And this user agent has never been seen before today (or at least recently!):</p> <li>And this user agent has never been seen before today (or at least recently!):</li>
</ul>
<pre><code># grep -c &quot;API scraper&quot; /var/log/nginx/oai.log <pre><code># grep -c &quot;API scraper&quot; /var/log/nginx/oai.log
62088 62088
# zgrep -c &quot;API scraper&quot; /var/log/nginx/oai.log.*.gz # zgrep -c &quot;API scraper&quot; /var/log/nginx/oai.log.*.gz
@ -304,214 +258,179 @@ dspace.log.2017-09-10:0
/var/log/nginx/oai.log.7.gz:0 /var/log/nginx/oai.log.7.gz:0
/var/log/nginx/oai.log.8.gz:0 /var/log/nginx/oai.log.8.gz:0
/var/log/nginx/oai.log.9.gz:0 /var/log/nginx/oai.log.9.gz:0
</code></pre></li> </code></pre><ul>
<li>Some of these heavy users are also using XMLUI, and their user agent isn't matched by the <a href="https://github.com/ilri/rmg-ansible-public/blob/master/roles/dspace/templates/tomcat/server-tomcat7.xml.j2#L158">Tomcat Session Crawler valve</a>, so each request uses a different session</li>
<li><p>Some of these heavy users are also using XMLUI, and their user agent isn&rsquo;t matched by the <a href="https://github.com/ilri/rmg-ansible-public/blob/master/roles/dspace/templates/tomcat/server-tomcat7.xml.j2#L158">Tomcat Session Crawler valve</a>, so each request uses a different session</p></li> <li>Yesterday alone the IP addresses using the <code>API scraper</code> user agent were responsible for 16,000 sessions in XMLUI:</li>
</ul>
<li><p>Yesterday alone the IP addresses using the <code>API scraper</code> user agent were responsible for 16,000 sessions in XMLUI:</p>
<pre><code># grep -a -E &quot;(54.70.51.7|35.161.215.53|34.211.17.113|54.70.175.86)&quot; /home/cgspace.cgiar.org/log/dspace.log.2017-09-12 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l <pre><code># grep -a -E &quot;(54.70.51.7|35.161.215.53|34.211.17.113|54.70.175.86)&quot; /home/cgspace.cgiar.org/log/dspace.log.2017-09-12 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
15924 15924
</code></pre></li> </code></pre><ul>
<li>If this continues I will definitely need to figure out who is responsible for this scraper and add their user agent to the session crawler valve regex</li>
<li><p>If this continues I will definitely need to figure out who is responsible for this scraper and add their user agent to the session crawler valve regex</p></li> <li>A search for &ldquo;API scraper&rdquo; user agent on Google returns a <code>robots.txt</code> with a comment that this is the Yewno bot: <a href="http://www.escholarship.org/robots.txt">http://www.escholarship.org/robots.txt</a></li>
<li>Also, in looking at the DSpace logs I noticed a warning from OAI that I should look into:</li>
<li><p>A search for &ldquo;API scraper&rdquo; user agent on Google returns a <code>robots.txt</code> with a comment that this is the Yewno bot: <a href="http://www.escholarship.org/robots.txt">http://www.escholarship.org/robots.txt</a></p></li> </ul>
<li><p>Also, in looking at the DSpace logs I noticed a warning from OAI that I should look into:</p>
<pre><code>WARN org.dspace.xoai.services.impl.xoai.DSpaceRepositoryConfiguration @ { OAI 2.0 :: DSpace } Not able to retrieve the dspace.oai.url property from oai.cfg. Falling back to request address <pre><code>WARN org.dspace.xoai.services.impl.xoai.DSpaceRepositoryConfiguration @ { OAI 2.0 :: DSpace } Not able to retrieve the dspace.oai.url property from oai.cfg. Falling back to request address
</code></pre></li> </code></pre><ul>
<li>Looking at the spreadsheet with deletions and corrections that CCAFS sent last week</li>
<li><p>Looking at the spreadsheet with deletions and corrections that CCAFS sent last week</p></li> <li>It appears they want to delete a lot of metadata, which I'm not sure they realize the implications of:</li>
</ul>
<li><p>It appears they want to delete a lot of metadata, which I&rsquo;m not sure they realize the implications of:</p>
<pre><code>dspace=# select text_value, count(text_value) from metadatavalue where resource_type_id=2 and metadata_field_id in (134, 235) and text_value in ('EA_PAR','FP1_CSAEvidence','FP2_CRMWestAfrica','FP3_Gender','FP4_Baseline','FP4_CCPAG','FP4_CCPG','FP4_CIATLAM IMPACT','FP4_ClimateData','FP4_ClimateModels','FP4_GenderPolicy','FP4_GenderToolbox','FP4_Livestock','FP4_PolicyEngagement','FP_GII','SA_Biodiversity','SA_CSV','SA_GHGMeasurement','SEA_mitigationSAMPLES','SEA_UpscalingInnovation','WA_Partnership','WA_SciencePolicyExchange') group by text_value; <pre><code>dspace=# select text_value, count(text_value) from metadatavalue where resource_type_id=2 and metadata_field_id in (134, 235) and text_value in ('EA_PAR','FP1_CSAEvidence','FP2_CRMWestAfrica','FP3_Gender','FP4_Baseline','FP4_CCPAG','FP4_CCPG','FP4_CIATLAM IMPACT','FP4_ClimateData','FP4_ClimateModels','FP4_GenderPolicy','FP4_GenderToolbox','FP4_Livestock','FP4_PolicyEngagement','FP_GII','SA_Biodiversity','SA_CSV','SA_GHGMeasurement','SEA_mitigationSAMPLES','SEA_UpscalingInnovation','WA_Partnership','WA_SciencePolicyExchange') group by text_value;
text_value | count text_value | count
--------------------------+------- --------------------------+-------
FP4_ClimateModels | 6 FP4_ClimateModels | 6
FP1_CSAEvidence | 7 FP1_CSAEvidence | 7
SEA_UpscalingInnovation | 7 SEA_UpscalingInnovation | 7
FP4_Baseline | 69 FP4_Baseline | 69
WA_Partnership | 1 WA_Partnership | 1
WA_SciencePolicyExchange | 6 WA_SciencePolicyExchange | 6
SA_GHGMeasurement | 2 SA_GHGMeasurement | 2
SA_CSV | 7 SA_CSV | 7
EA_PAR | 18 EA_PAR | 18
FP4_Livestock | 7 FP4_Livestock | 7
FP4_GenderPolicy | 4 FP4_GenderPolicy | 4
FP2_CRMWestAfrica | 12 FP2_CRMWestAfrica | 12
FP4_ClimateData | 24 FP4_ClimateData | 24
FP4_CCPAG | 2 FP4_CCPAG | 2
SEA_mitigationSAMPLES | 2 SEA_mitigationSAMPLES | 2
SA_Biodiversity | 1 SA_Biodiversity | 1
FP4_PolicyEngagement | 20 FP4_PolicyEngagement | 20
FP3_Gender | 9 FP3_Gender | 9
FP4_GenderToolbox | 3 FP4_GenderToolbox | 3
(19 rows) (19 rows)
</code></pre></li> </code></pre><ul>
<li>I sent CCAFS people an email to ask if they really want to remove these 200+ tags</li>
<li><p>I sent CCAFS people an email to ask if they really want to remove these 200+ tags</p></li> <li>She responded yes, so I'll at least need to do these deletes in PostgreSQL:</li>
</ul>
<li><p>She responded yes, so I&rsquo;ll at least need to do these deletes in PostgreSQL:</p>
<pre><code>dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id in (134, 235) and text_value in ('EA_PAR','FP1_CSAEvidence','FP2_CRMWestAfrica','FP3_Gender','FP4_Baseline','FP4_CCPAG','FP4_CCPG','FP4_CIATLAM IMPACT','FP4_ClimateData','FP4_ClimateModels','FP4_GenderPolicy','FP4_GenderToolbox','FP4_Livestock','FP4_PolicyEngagement','FP_GII','SA_Biodiversity','SA_CSV','SA_GHGMeasurement','SEA_mitigationSAMPLES','SEA_UpscalingInnovation','WA_Partnership','WA_SciencePolicyExchange','FP_GII'); <pre><code>dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id in (134, 235) and text_value in ('EA_PAR','FP1_CSAEvidence','FP2_CRMWestAfrica','FP3_Gender','FP4_Baseline','FP4_CCPAG','FP4_CCPG','FP4_CIATLAM IMPACT','FP4_ClimateData','FP4_ClimateModels','FP4_GenderPolicy','FP4_GenderToolbox','FP4_Livestock','FP4_PolicyEngagement','FP_GII','SA_Biodiversity','SA_CSV','SA_GHGMeasurement','SEA_mitigationSAMPLES','SEA_UpscalingInnovation','WA_Partnership','WA_SciencePolicyExchange','FP_GII');
DELETE 207 DELETE 207
</code></pre></li> </code></pre><ul>
<li>When we discussed this in late July there were some other renames they had requested, but I don't see them in the current spreadsheet so I will have to follow that up</li>
<li><p>When we discussed this in late July there were some other renames they had requested, but I don&rsquo;t see them in the current spreadsheet so I will have to follow that up</p></li> <li>I talked to Macaroni Bros and they said to just go ahead with the other corrections as well, as their spreadsheet had evolved organically rather than systematically!</li>
<li>The final list of corrections and deletes should therefore be:</li>
<li><p>I talked to Macaroni Bros and they said to just go ahead with the other corrections as well, as their spreadsheet had evolved organically rather than systematically!</p></li> </ul>
<li><p>The final list of corrections and deletes should therefore be:</p>
<pre><code>delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and text_value='PII-FP4_CRMWestAfrica'; <pre><code>delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and text_value='PII-FP4_CRMWestAfrica';
update metadatavalue set text_value='FP3_VietnamLED' where resource_type_id=2 and metadata_field_id=134 and text_value='FP3_VeitnamLED'; update metadatavalue set text_value='FP3_VietnamLED' where resource_type_id=2 and metadata_field_id=134 and text_value='FP3_VeitnamLED';
update metadatavalue set text_value='PII-FP1_PIRCCA' where resource_type_id=2 and metadata_field_id=235 and text_value='PII-SEA_PIRCCA'; update metadatavalue set text_value='PII-FP1_PIRCCA' where resource_type_id=2 and metadata_field_id=235 and text_value='PII-SEA_PIRCCA';
delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and text_value='PII-WA_IntegratedInterventions'; delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and text_value='PII-WA_IntegratedInterventions';
delete from metadatavalue where resource_type_id=2 and metadata_field_id in (134, 235) and text_value in ('EA_PAR','FP1_CSAEvidence','FP2_CRMWestAfrica','FP3_Gender','FP4_Baseline','FP4_CCPAG','FP4_CCPG','FP4_CIATLAM IMPACT','FP4_ClimateData','FP4_ClimateModels','FP4_GenderPolicy','FP4_GenderToolbox','FP4_Livestock','FP4_PolicyEngagement','FP_GII','SA_Biodiversity','SA_CSV','SA_GHGMeasurement','SEA_mitigationSAMPLES','SEA_UpscalingInnovation','WA_Partnership','WA_SciencePolicyExchange','FP_GII'); delete from metadatavalue where resource_type_id=2 and metadata_field_id in (134, 235) and text_value in ('EA_PAR','FP1_CSAEvidence','FP2_CRMWestAfrica','FP3_Gender','FP4_Baseline','FP4_CCPAG','FP4_CCPG','FP4_CIATLAM IMPACT','FP4_ClimateData','FP4_ClimateModels','FP4_GenderPolicy','FP4_GenderToolbox','FP4_Livestock','FP4_PolicyEngagement','FP_GII','SA_Biodiversity','SA_CSV','SA_GHGMeasurement','SEA_mitigationSAMPLES','SEA_UpscalingInnovation','WA_Partnership','WA_SciencePolicyExchange','FP_GII');
</code></pre></li> </code></pre><ul>
<li>Create and merge pull request to shut up the Ehcache update check (<a href="https://github.com/ilri/DSpace/pull/337">#337</a>)</li>
<li><p>Create and merge pull request to shut up the Ehcache update check (<a href="https://github.com/ilri/DSpace/pull/337">#337</a>)</p></li> <li>Although it looks like there was a previous attempt to disable these update checks that was merged in DSpace 4.0 (although it only affects XMLUI): <a href="https://jira.duraspace.org/browse/DS-1492">https://jira.duraspace.org/browse/DS-1492</a></li>
<li>I commented there suggesting that we disable it globally</li>
<li><p>Although it looks like there was a previous attempt to disable these update checks that was merged in DSpace 4.0 (although it only affects XMLUI): <a href="https://jira.duraspace.org/browse/DS-1492">https://jira.duraspace.org/browse/DS-1492</a></p></li> <li>I merged the changes to the CCAFS project tags (<a href="https://github.com/ilri/DSpace/pull/336">#336</a>) but still need to finalize the metadata deletions/renames</li>
<li>I merged the CGIAR Library theme changes (<a href="https://github.com/ilri/DSpace/pull/338">#338</a>) to the <code>5_x-prod</code> branch in preparation for next week's migration</li>
<li><p>I commented there suggesting that we disable it globally</p></li> <li>I emailed the Handle administrators (<a href="mailto:hdladmin@cnri.reston.va.us">hdladmin@cnri.reston.va.us</a>) to ask them what the process is for changing their prefix to be resolved by our resolver</li>
<li>They responded and said that they need email confirmation from the contact of record of the other prefix, so I should have the CGIAR System Organization people email them before I send the new <code>sitebndl.zip</code></li>
<li><p>I merged the changes to the CCAFS project tags (<a href="https://github.com/ilri/DSpace/pull/336">#336</a>) but still need to finalize the metadata deletions/renames</p></li> <li>Testing to see how we end up with all these new authorities after we keep cleaning and merging them in the database</li>
<li>Here are all my distinct authority combinations in the database before:</li>
<li><p>I merged the CGIAR Library theme changes (<a href="https://github.com/ilri/DSpace/pull/338">#338</a>) to the <code>5_x-prod</code> branch in preparation for next week&rsquo;s migration</p></li>
<li><p>I emailed the Handle administrators (hdladmin@cnri.reston.va.us) to ask them what the process is for changing their prefix to be resolved by our resolver</p></li>
<li><p>They responded and said that they need email confirmation from the contact of record of the other prefix, so I should have the CGIAR System Organization people email them before I send the new <code>sitebndl.zip</code></p></li>
<li><p>Testing to see how we end up with all these new authorities after we keep cleaning and merging them in the database</p></li>
<li><p>Here are all my distinct authority combinations in the database before:</p>
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';
text_value | authority | confidence
------------+--------------------------------------+------------
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | -1
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 600
Orth, A. | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | -1
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 0
Orth, Alan | 0d575fa3-8ac4-4763-a90a-1248d4791793 | -1
Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 | 600
(8 rows)
</code></pre></li>
<li><p>And then after adding a new item and selecting an existing &ldquo;Orth, Alan&rdquo; with an ORCID in the author lookup:</p>
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';
text_value | authority | confidence
------------+--------------------------------------+------------
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | -1
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 600
Orth, A. | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | -1
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 0
Orth, Alan | cb3aa5ae-906f-4902-97b1-2667cf148dde | 600
Orth, Alan | 0d575fa3-8ac4-4763-a90a-1248d4791793 | -1
Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 | 600
(9 rows)
</code></pre></li>
<li><p>It created a new authority&hellip; let&rsquo;s try to add another item and select the same existing author and see what happens in the database:</p>
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';
text_value | authority | confidence
------------+--------------------------------------+------------
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | -1
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 600
Orth, A. | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | -1
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 0
Orth, Alan | cb3aa5ae-906f-4902-97b1-2667cf148dde | 600
Orth, Alan | 0d575fa3-8ac4-4763-a90a-1248d4791793 | -1
Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 | 600
(9 rows)
</code></pre></li>
<li><p>No new one&hellip; so now let me try to add another item and select the italicized result from the ORCID lookup and see what happens in the database:</p>
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';
text_value | authority | confidence
------------+--------------------------------------+------------
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | -1
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
Orth, Alan | d85a8a5b-9b82-4aaf-8033-d7e0c7d9cb8f | 600
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 600
Orth, A. | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | -1
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 0
Orth, Alan | cb3aa5ae-906f-4902-97b1-2667cf148dde | 600
Orth, Alan | 0d575fa3-8ac4-4763-a90a-1248d4791793 | -1
Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 | 600
(10 rows)
</code></pre></li>
<li><p>Shit, it created another authority! Let&rsquo;s try it again!</p>
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';
text_value | authority | confidence
------------+--------------------------------------+------------
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | -1
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
Orth, Alan | d85a8a5b-9b82-4aaf-8033-d7e0c7d9cb8f | 600
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 600
Orth, Alan | 9aed566a-a248-4878-9577-0caedada43db | 600
Orth, A. | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | -1
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 0
Orth, Alan | cb3aa5ae-906f-4902-97b1-2667cf148dde | 600
Orth, Alan | 0d575fa3-8ac4-4763-a90a-1248d4791793 | -1
Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 | 600
(11 rows)
</code></pre></li>
<li><p>It added <em>another</em> authority&hellip; surely this is not the desired behavior, or maybe we are not using this as intended?</p></li>
</ul> </ul>
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';
<h2 id="2017-09-14">2017-09-14</h2> text_value | authority | confidence
------------+--------------------------------------+------------
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | -1
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 600
Orth, A. | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | -1
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 0
Orth, Alan | 0d575fa3-8ac4-4763-a90a-1248d4791793 | -1
Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 | 600
(8 rows)
</code></pre><ul>
<li>And then after adding a new item and selecting an existing &ldquo;Orth, Alan&rdquo; with an ORCID in the author lookup:</li>
</ul>
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';
text_value | authority | confidence
------------+--------------------------------------+------------
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | -1
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 600
Orth, A. | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | -1
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 0
Orth, Alan | cb3aa5ae-906f-4902-97b1-2667cf148dde | 600
Orth, Alan | 0d575fa3-8ac4-4763-a90a-1248d4791793 | -1
Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 | 600
(9 rows)
</code></pre><ul>
<li>It created a new authority&hellip; let's try to add another item and select the same existing author and see what happens in the database:</li>
</ul>
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';
text_value | authority | confidence
------------+--------------------------------------+------------
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | -1
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 600
Orth, A. | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | -1
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 0
Orth, Alan | cb3aa5ae-906f-4902-97b1-2667cf148dde | 600
Orth, Alan | 0d575fa3-8ac4-4763-a90a-1248d4791793 | -1
Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 | 600
(9 rows)
</code></pre><ul>
<li>No new one&hellip; so now let me try to add another item and select the italicized result from the ORCID lookup and see what happens in the database:</li>
</ul>
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';
text_value | authority | confidence
------------+--------------------------------------+------------
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | -1
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
Orth, Alan | d85a8a5b-9b82-4aaf-8033-d7e0c7d9cb8f | 600
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 600
Orth, A. | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | -1
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 0
Orth, Alan | cb3aa5ae-906f-4902-97b1-2667cf148dde | 600
Orth, Alan | 0d575fa3-8ac4-4763-a90a-1248d4791793 | -1
Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 | 600
(10 rows)
</code></pre><ul>
<li>Shit, it created another authority! Let's try it again!</li>
</ul>
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';
text_value | authority | confidence
------------+--------------------------------------+------------
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | -1
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
Orth, Alan | d85a8a5b-9b82-4aaf-8033-d7e0c7d9cb8f | 600
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 600
Orth, Alan | 9aed566a-a248-4878-9577-0caedada43db | 600
Orth, A. | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | -1
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 0
Orth, Alan | cb3aa5ae-906f-4902-97b1-2667cf148dde | 600
Orth, Alan | 0d575fa3-8ac4-4763-a90a-1248d4791793 | -1
Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 | 600
(11 rows)
</code></pre><ul>
<li>It added <em>another</em> authority&hellip; surely this is not the desired behavior, or maybe we are not using this as intended?</li>
</ul>
<h2 id="20170914">2017-09-14</h2>
<ul> <ul>
<li>Communicate with Handle.net admins to try to get some guidance about the 10947 prefix</li> <li>Communicate with Handle.net admins to try to get some guidance about the 10947 prefix</li>
<li>Michael Marus is the contact for their prefix, but he has left CGIAR; as I actually have access to the CGIAR Library server, I think I can just generate a new <code>sitebndl.zip</code> file from their server and send it to Handle.net</li> <li>Michael Marus is the contact for their prefix, but he has left CGIAR; as I actually have access to the CGIAR Library server, I think I can just generate a new <code>sitebndl.zip</code> file from their server and send it to Handle.net</li>
<li>Also, Handle.net says their prefix is up for annual renewal next month so we might want to just pay for it and take it over</li> <li>Also, Handle.net says their prefix is up for annual renewal next month so we might want to just pay for it and take it over</li>
<li>CGSpace was very slow and Uptime Robot even said it was down at one time</li> <li>CGSpace was very slow and Uptime Robot even said it was down at one time</li>
<li>I didn&rsquo;t see any abnormally high usage in the REST or OAI logs, but looking at Munin I see the average JVM usage was at 4.9GB and the heap is only 5GB (5120M), so I think it&rsquo;s just normal growing pains</li> <li>I didn't see any abnormally high usage in the REST or OAI logs, but looking at Munin I see the average JVM usage was at 4.9GB and the heap is only 5GB (5120M), so I think it's just normal growing pains</li>
<li>Every few months I generally try to increase the JVM heap to be 512M higher than the average usage reported by Munin, so now I adjusted it to 5632M</li> <li>Every few months I generally try to increase the JVM heap to be 512M higher than the average usage reported by Munin, so now I adjusted it to 5632M</li>
</ul> </ul>
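The heap-sizing rule of thumb from 2017-09-14 can be sketched as a small shell calculation. This is only an illustration under stated assumptions: the function name `next_heap_mb` is hypothetical, 5018M stands in for the roughly 4.9GB average reported by Munin, and rounding up to the next multiple of 512M is my guess at how 5632M was reached.

```shell
# Hypothetical helper: add 512M of headroom to the average JVM usage (in MB),
# then round up to the next multiple of 512M (the rounding is an assumption).
next_heap_mb() {
    avg_mb=$1
    step=512
    target=$(( avg_mb + step ))
    echo $(( (target + step - 1) / step * step ))
}

# 4.9GB is roughly 5018M
heap=$(next_heap_mb 5018)
echo "${heap}M"
```

With the numbers from the notes this prints `5632M`, which matches the value the heap was adjusted to.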
<h2 id="20170915">2017-09-15</h2>
<h2 id="2017-09-15">2017-09-15</h2>
<ul> <ul>
<li><p>Apply CCAFS project tag corrections on CGSpace:</p> <li>Apply CCAFS project tag corrections on CGSpace:</li>
</ul>
<pre><code>dspace=# \i /tmp/ccafs-projects.sql <pre><code>dspace=# \i /tmp/ccafs-projects.sql
DELETE 5 DELETE 5
UPDATE 4 UPDATE 4
UPDATE 1 UPDATE 1
DELETE 1 DELETE 1
DELETE 207 DELETE 207
</code></pre></li> </code></pre><h2 id="20170917">2017-09-17</h2>
</ul>
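As a sanity check on the `DELETE 207` reported for the CCAFS project tags, the per-tag counts from the 2017-09-13 `select` can be summed and compared against it. This is a minimal sketch: the function name `sum_counts` is hypothetical, and the rows are copied by hand from the query output rather than read from the database.

```shell
# Hypothetical helper: sum the second column of "tag | count" rows
sum_counts() {
    awk -F'|' '{ total += $2 } END { print total }'
}

total=$(sum_counts <<'EOF'
FP4_ClimateModels | 6
FP1_CSAEvidence | 7
SEA_UpscalingInnovation | 7
FP4_Baseline | 69
WA_Partnership | 1
WA_SciencePolicyExchange | 6
SA_GHGMeasurement | 2
SA_CSV | 7
EA_PAR | 18
FP4_Livestock | 7
FP4_GenderPolicy | 4
FP2_CRMWestAfrica | 12
FP4_ClimateData | 24
FP4_CCPAG | 2
SEA_mitigationSAMPLES | 2
SA_Biodiversity | 1
FP4_PolicyEngagement | 20
FP3_Gender | 9
FP4_GenderToolbox | 3
EOF
)
echo "$total"
```

The 19 rows add up to 207, the same number of rows PostgreSQL reported deleting.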
<h2 id="2017-09-17">2017-09-17</h2>
<ul> <ul>
<li>Create pull request for CGSpace to be able to resolve multiple handles (<a href="https://github.com/ilri/DSpace/pull/339">#339</a>)</li> <li>Create pull request for CGSpace to be able to resolve multiple handles (<a href="https://github.com/ilri/DSpace/pull/339">#339</a>)</li>
<li>We still need to do the changes to <code>config.dct</code> and regenerate the <code>sitebndl.zip</code> to send to the Handle.net admins</li> <li>We still need to do the changes to <code>config.dct</code> and regenerate the <code>sitebndl.zip</code> to send to the Handle.net admins</li>
<li>According to this <a href="http://dspace.2283337.n4.nabble.com/Multiple-handle-prefixes-merged-DSpace-instances-td3427192.html">dspace-tech mailing list entry from 2011</a>, we need to add the extra handle prefixes to <code>config.dct</code> like this:</li>
<li><p>According to this <a href="http://dspace.2283337.n4.nabble.com/Multiple-handle-prefixes-merged-DSpace-instances-td3427192.html">dspace-tech mailing list entry from 2011</a>, we need to add the extra handle prefixes to <code>config.dct</code> like this:</p> </ul>
<pre><code>&quot;server_admins&quot; = ( <pre><code>&quot;server_admins&quot; = (
&quot;300:0.NA/10568&quot; &quot;300:0.NA/10568&quot;
&quot;300:0.NA/10947&quot; &quot;300:0.NA/10947&quot;
@ -526,162 +445,121 @@ DELETE 207
&quot;300:0.NA/10568&quot; &quot;300:0.NA/10568&quot;
&quot;300:0.NA/10947&quot; &quot;300:0.NA/10947&quot;
) )
</code></pre></li> </code></pre><ul>
<li>More work on the CGIAR Library migration test run locally, as I was having a problem importing the last fourteen items from the CGIAR System Management Office community</li>
<li><p>More work on the CGIAR Library migration test run locally, as I was having a problem importing the last fourteen items from the CGIAR System Management Office community</p></li> <li>The problem was that we remapped the items to new collections after the initial import, so the items were using the 10947 prefix but the community and collection were using 10568</li>
<li>I ended up having to read the <a href="https://wiki.duraspace.org/display/DSDOC5x/AIP+Backup+and+Restore#AIPBackupandRestore-ForceReplaceMode">AIP Backup and Restore</a> closely a few times and then explicitly preserve handles and ignore parents:</li>
<li><p>The problem was that we remapped the items to new collections after the initial import, so the items were using the 10947 prefix but the community and collection were using 10568</p></li>
<li><p>I ended up having to read the <a href="https://wiki.duraspace.org/display/DSDOC5x/AIP+Backup+and+Restore#AIPBackupandRestore-ForceReplaceMode">AIP Backup and Restore</a> closely a few times and then explicitly preserve handles and ignore parents:</p>
<pre><code>$ for item in 10568-93759/ITEM@10947-46*; do ~/dspace/bin/dspace packager -r -t AIP -o ignoreHandle=false -o ignoreParent=true -e aorth@mjanja.ch -p 10568/87738 $item; done
</code></pre></li>
<li><p>Also, this was in replace mode (-r) rather than submit mode (-s), because submit mode always generated a new handle even if I told it not to!</p></li>
<li><p>I decided to start the import process in the evening rather than waiting for the morning, and right as the first community was finished importing I started seeing <code>Timeout waiting for idle object</code> errors</p></li>
<li><p>I had to cancel the import, clean up a bunch of database entries, increase the PostgreSQL <code>max_connections</code> as a precaution, restart PostgreSQL and Tomcat, and then finally completed the import</p></li>
</ul> </ul>
<pre><code>$ for item in 10568-93759/ITEM@10947-46*; do ~/dspace/bin/dspace packager -r -t AIP -o ignoreHandle=false -o ignoreParent=true -e aorth@mjanja.ch -p 10568/87738 $item; done
<h2 id="2017-09-18">2017-09-18</h2> </code></pre><ul>
<li>Also, this was in replace mode (-r) rather than submit mode (-s), because submit mode always generated a new handle even if I told it not to!</li>
<li>I decided to start the import process in the evening rather than waiting for the morning, and right as the first community was finished importing I started seeing <code>Timeout waiting for idle object</code> errors</li>
<li>I had to cancel the import, clean up a bunch of database entries, increase the PostgreSQL <code>max_connections</code> as a precaution, restart PostgreSQL and Tomcat, and then finally completed the import</li>
</ul>
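Before running a replace-mode import like the one above on a production repository, it may help to print the commands first and review them. The following is a hedged sketch, not part of the original procedure: the function name `dry_run_restore` is hypothetical, and it only echoes the same `dspace packager` invocation used in these notes instead of executing it.

```shell
# Hypothetical dry run: print one packager command per AIP file for review.
# The first argument is the parent handle; the rest are AIP files.
dry_run_restore() {
    parent=$1
    shift
    for item in "$@"; do
        echo dspace packager -r -t AIP -o ignoreHandle=false -o ignoreParent=true -e aorth@mjanja.ch -p "$parent" "$item"
    done
}

# Same parent handle and glob as in the notes; if nothing matches,
# the shell passes the pattern through literally.
dry_run_restore 10568/87738 10568-93759/ITEM@10947-46*
```

Once the printed commands look right, removing the `echo` would run them for real.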
<h2 id="20170918">2017-09-18</h2>
<ul> <ul>
<li>I think we should force regeneration of all thumbnails in the CGIAR Library community, as their DSpace is version 1.7 and CGSpace is running DSpace 5.5 so they should look much better</li> <li>I think we should force regeneration of all thumbnails in the CGIAR Library community, as their DSpace is version 1.7 and CGSpace is running DSpace 5.5 so they should look much better</li>
<li>One item for comparison:</li> <li>One item for comparison:</li>
</ul> </ul>
<p><img src="/cgspace-notes/2017/09/10947-2919-before.jpg" alt="With original DSpace 1.7 thumbnail"></p>
<p><img src="/cgspace-notes/2017/09/10947-2919-before.jpg" alt="With original DSpace 1.7 thumbnail" /></p> <p><img src="/cgspace-notes/2017/09/10947-2919-after.jpg" alt="After DSpace 5.5"></p>
<p><img src="/cgspace-notes/2017/09/10947-2919-after.jpg" alt="After DSpace 5.5" /></p>
<ul> <ul>
<li>Moved the CGIAR Library Migration notes to a page <a href="/cgspace-notes/cgiar-library-migration/">cgiar-library-migration</a> as there seems to be a bug with post slugs defined in frontmatter when you have a permalink scheme defined in <code>config.toml</code> (happens currently in Hugo 0.27.1 at least)</li> <li>Moved the CGIAR Library Migration notes to a page <a href="/cgspace-notes/cgiar-library-migration/">cgiar-library-migration</a> as there seems to be a bug with post slugs defined in frontmatter when you have a permalink scheme defined in <code>config.toml</code> (happens currently in Hugo 0.27.1 at least)</li>
</ul> </ul>
<h2 id="20170919">2017-09-19</h2>
<h2 id="2017-09-19">2017-09-19</h2>
<ul> <ul>
<li><p>Nightly Solr indexing is working again, and it appears to be pretty quick actually:</p> <li>Nightly Solr indexing is working again, and it appears to be pretty quick actually:</li>
</ul>
<pre><code>2017-09-19 00:00:14,953 INFO com.atmire.dspace.discovery.AtmireSolrService @ Processing (0 of 65808): 17607 <pre><code>2017-09-19 00:00:14,953 INFO com.atmire.dspace.discovery.AtmireSolrService @ Processing (0 of 65808): 17607
... ...
2017-09-19 00:04:18,017 INFO com.atmire.dspace.discovery.AtmireSolrService @ Processing (65807 of 65808): 83753 2017-09-19 00:04:18,017 INFO com.atmire.dspace.discovery.AtmireSolrService @ Processing (65807 of 65808): 83753
</code></pre></li> </code></pre><ul>
<li>Sisay asked if he could import 50 items for IITA that have already been checked by Bosede and Bizuwork</li>
<li><p>Sisay asked if he could import 50 items for IITA that have already been checked by Bosede and Bizuwork</p></li> <li>I had a look at the collection and noticed a bunch of issues with item types and donors, so I asked him to fix those and import it to DSpace Test again first</li>
<li>Abenet wants to be able to filter by ISI Journal in advanced search on queries like this: <a href="https://cgspace.cgiar.org/discover?filtertype_0=dateIssued&amp;filtertype_1=dateIssued&amp;filter_relational_operator_1=equals&amp;filter_relational_operator_0=equals&amp;filter_1=%5B2010+TO+2017%5D&amp;filter_0=2017&amp;filtertype=type&amp;filter_relational_operator=equals&amp;filter=Journal+Article">https://cgspace.cgiar.org/discover?filtertype_0=dateIssued&amp;filtertype_1=dateIssued&amp;filter_relational_operator_1=equals&amp;filter_relational_operator_0=equals&amp;filter_1=%5B2010+TO+2017%5D&amp;filter_0=2017&amp;filtertype=type&amp;filter_relational_operator=equals&amp;filter=Journal+Article</a></li>
<li><p>I had a look at the collection and noticed a bunch of issues with item types and donors, so I asked him to fix those and import it to DSpace Test again first</p></li> <li>I opened an issue to track this (<a href="https://github.com/ilri/DSpace/issues/340">#340</a>) and will test it on DSpace Test soon</li>
<li>Marianne Gadeberg from WLE asked if I would add an account for Adam Hunt on CGSpace and give him permissions to approve all WLE publications</li>
<li><p>Abenet wants to be able to filter by ISI Journal in advanced search on queries like this: <a href="https://cgspace.cgiar.org/discover?filtertype_0=dateIssued&amp;filtertype_1=dateIssued&amp;filter_relational_operator_1=equals&amp;filter_relational_operator_0=equals&amp;filter_1=%5B2010+TO+2017%5D&amp;filter_0=2017&amp;filtertype=type&amp;filter_relational_operator=equals&amp;filter=Journal+Article">https://cgspace.cgiar.org/discover?filtertype_0=dateIssued&amp;filtertype_1=dateIssued&amp;filter_relational_operator_1=equals&amp;filter_relational_operator_0=equals&amp;filter_1=%5B2010+TO+2017%5D&amp;filter_0=2017&amp;filtertype=type&amp;filter_relational_operator=equals&amp;filter=Journal+Article</a></p></li> <li>I told him to register first, as he's a CGIAR user and needs an account to be created before I can add him to the groups</li>
<li><p>I opened an issue to track this (<a href="https://github.com/ilri/DSpace/issues/340">#340</a>) and will test it on DSpace Test soon</p></li>
<li><p>Marianne Gadeberg from WLE asked if I would add an account for Adam Hunt on CGSpace and give him permissions to approve all WLE publications</p></li>
<li><p>I told him to register first, as he&rsquo;s a CGIAR user and needs an account to be created before I can add him to the groups</p></li>
</ul> </ul>
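To put a number on "pretty quick" for the nightly Solr indexing on 2017-09-19, the two log timestamps can be turned into an elapsed time and a rough throughput. This is a minimal sketch: the function name `to_seconds` is hypothetical, and the timestamps are truncated to whole seconds, so the result is approximate.

```shell
# Hypothetical helper: convert HH:MM:SS to seconds since midnight
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

start=$(to_seconds "00:00:14")   # first "Processing (0 of 65808)" line
end=$(to_seconds "00:04:18")     # last "Processing (65807 of 65808)" line
elapsed=$(( end - start ))
items=65808

echo "elapsed: ${elapsed}s, rate: $(( items / elapsed )) items/s"
```

That works out to about four minutes for the full index, or roughly 269 items per second using integer division.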
<h2 id="20170920">2017-09-20</h2>
<h2 id="2017-09-20">2017-09-20</h2>
<ul> <ul>
<li>Abenet and I noticed that hdl.handle.net is blocked by ETC at ILRI Addis so I asked Biruk Debebe to route it over the satellite</li> <li>Abenet and I noticed that hdl.handle.net is blocked by ETC at ILRI Addis so I asked Biruk Debebe to route it over the satellite</li>
<li>Force thumbnail regeneration for the CGIAR System Organization's Historic Archive community (2000 items):</li>
<li><p>Force thumbnail regeneration for the CGIAR System Organization&rsquo;s Historic Archive community (2000 items):</p>
<pre><code>$ schedtool -D -e ionice -c2 -n7 nice -n19 dspace filter-media -f -i 10947/1 -p &quot;ImageMagick PDF Thumbnail&quot;
</code></pre></li>
<li><p>I&rsquo;m still waiting (over 1 day later) to hear back from the CGIAR System Organization about updating the DNS for library.cgiar.org</p></li>
</ul> </ul>
<pre><code>$ schedtool -D -e ionice -c2 -n7 nice -n19 dspace filter-media -f -i 10947/1 -p &quot;ImageMagick PDF Thumbnail&quot;
<h2 id="2017-09-21">2017-09-21</h2> </code></pre><ul>
<li>I'm still waiting (over 1 day later) to hear back from the CGIAR System Organization about updating the DNS for library.cgiar.org</li>
</ul>
<h2 id="20170921">2017-09-21</h2>
<ul> <ul>
<li>Switch to OpenJDK 8 from Oracle JDK on DSpace Test</li> <li>Switch to OpenJDK 8 from Oracle JDK on DSpace Test</li>
<li>I want to test this for a while to see if we can start using it instead</li> <li>I want to test this for a while to see if we can start using it instead</li>
<li>I need to look at the JVM graphs in Munin, test the Atmire modules, build the source, etc to get some impressions</li> <li>I need to look at the JVM graphs in Munin, test the Atmire modules, build the source, etc to get some impressions</li>
</ul> </ul>
<h2 id="20170922">2017-09-22</h2>
<h2 id="2017-09-22">2017-09-22</h2>
<ul> <ul>
<li>Experimenting with setting up a global JNDI database resource that can be pooled among all the DSpace webapps (reference the <a href="https://wiki.duraspace.org/display/cmtygp/DCAT+Meeting+April+2017">April, 2017 DCAT meeting</a> comments)</li> <li>Experimenting with setting up a global JNDI database resource that can be pooled among all the DSpace webapps (reference the <a href="https://wiki.duraspace.org/display/cmtygp/DCAT+Meeting+April+2017">April, 2017 DCAT meeting</a> comments)</li>
<li>See: <a href="https://www.journaldev.com/2513/tomcat-datasource-jndi-example-java">https://www.journaldev.com/2513/tomcat-datasource-jndi-example-java</a></li> <li>See: <a href="https://www.journaldev.com/2513/tomcat-datasource-jndi-example-java">https://www.journaldev.com/2513/tomcat-datasource-jndi-example-java</a></li>
<li>See: <a href="http://memorynotfound.com/configure-jndi-datasource-tomcat/">http://memorynotfound.com/configure-jndi-datasource-tomcat/</a></li> <li>See: <a href="http://memorynotfound.com/configure-jndi-datasource-tomcat/">http://memorynotfound.com/configure-jndi-datasource-tomcat/</a></li>
</ul> </ul>
<h2 id="20170924">2017-09-24</h2>
<h2 id="2017-09-24">2017-09-24</h2>
<ul> <ul>
<li>Start investigating other platforms for CGSpace due to linear instance pricing on Linode</li> <li>Start investigating other platforms for CGSpace due to linear instance pricing on Linode</li>
<li>We need to figure out how much memory is used by applications, caches, etc, and how much disk space the asset store needs</li> <li>We need to figure out how much memory is used by applications, caches, etc, and how much disk space the asset store needs</li>
<li>First, here&rsquo;s the last week of memory usage on CGSpace and DSpace Test:</li> <li>First, here's the last week of memory usage on CGSpace and DSpace Test:</li>
</ul> </ul>
<p><img src="/cgspace-notes/2017/09/cgspace-memory-week.png" alt="CGSpace memory week">
<p><img src="/cgspace-notes/2017/09/cgspace-memory-week.png" alt="CGSpace memory week" /> <img src="/cgspace-notes/2017/09/dspace-test-memory-week.png" alt="DSpace Test memory week"></p>
<img src="/cgspace-notes/2017/09/dspace-test-memory-week.png" alt="DSpace Test memory week" /></p>
<ul> <ul>
<li>8GB of RAM seems to be good for DSpace Test for now, with Tomcat&rsquo;s JVM heap taking 3GB, caches and buffers taking 3-4GB, and then ~1GB unused</li> <li>8GB of RAM seems to be good for DSpace Test for now, with Tomcat's JVM heap taking 3GB, caches and buffers taking 3-4GB, and then ~1GB unused</li>
<li>24GB of RAM is <em>way</em> too much for CGSpace, with Tomcat&rsquo;s JVM heap taking 5.5GB and caches and buffers happily using 14GB or so</li> <li>24GB of RAM is <em>way</em> too much for CGSpace, with Tomcat's JVM heap taking 5.5GB and caches and buffers happily using 14GB or so</li>
<li>As far as disk space, the CGSpace assetstore currently uses 51GB and Solr cores use 86GB (mostly in the statistics core)</li> <li>As far as disk space, the CGSpace assetstore currently uses 51GB and Solr cores use 86GB (mostly in the statistics core)</li>
<li>DSpace Test currently doesn&rsquo;t even have enough space to store a full copy of CGSpace, as its Linode instance only has 96GB of disk space</li> <li>DSpace Test currently doesn't even have enough space to store a full copy of CGSpace, as its Linode instance only has 96GB of disk space</li>
<li>I&rsquo;ve heard Google Cloud is nice (cheap and performant) but it&rsquo;s definitely more complicated than Linode and instances aren&rsquo;t <em>that</em> much cheaper to make it worth it</li> <li>I've heard Google Cloud is nice (cheap and performant) but it's definitely more complicated than Linode and instances aren't <em>that</em> much cheaper to make it worth it</li>
<li>Here are some theoretical instances on Google Cloud: <li>Here are some theoretical instances on Google Cloud:
<ul> <ul>
<li>DSpace Test, <code>n1-standard-2</code> with 2 vCPUs, 7.5GB RAM, 300GB persistent SSD: $99/month</li> <li>DSpace Test, <code>n1-standard-2 </code> with 2 vCPUs, 7.5GB RAM, 300GB persistent SSD: $99/month</li>
<li>CGSpace, <code>n1-standard-4</code> with 4 vCPUs, 15GB RAM, 300GB persistent SSD: $148/month</li> <li>CGSpace, <code>n1-standard-4 </code> with 4 vCPUs, 15GB RAM, 300GB persistent SSD: $148/month</li>
</ul></li> </ul>
<li>Looking at <a href="https://www.linode.com/pricing#all">Linode&rsquo;s instance pricing</a>, for DSpace Test it seems we could use the same 8GB instance for $40/month, and then add <a href="https://www.linode.com/docs/platform/how-to-use-block-storage-with-your-linode">block storage</a> of ~300GB for $30 (block storage is currently in beta and priced at $0.10/GiB)</li> </li>
<li>Looking at <a href="https://www.linode.com/pricing#all">Linode's instance pricing</a>, for DSpace Test it seems we could use the same 8GB instance for $40/month, and then add <a href="https://www.linode.com/docs/platform/how-to-use-block-storage-with-your-linode">block storage</a> of ~300GB for $30 (block storage is currently in beta and priced at $0.10/GiB)</li>
<li>For CGSpace we could use the cheaper 12GB instance for $80 and then add block storage of 500GB for $50</li> <li>For CGSpace we could use the cheaper 12GB instance for $80 and then add block storage of 500GB for $50</li>
<li>I&rsquo;ve sent Peter a message about moving DSpace Test to the New Jersey data center so we can test the block storage beta</li> <li>I've sent Peter a message about moving DSpace Test to the New Jersey data center so we can test the block storage beta</li>
<li>Create pull request for adding ISI Journal to search filters (<a href="https://github.com/ilri/DSpace/pull/341">#341</a>)</li> <li>Create pull request for adding ISI Journal to search filters (<a href="https://github.com/ilri/DSpace/pull/341">#341</a>)</li>
<li>Peter asked if we could map all the items of type <code>Journal Article</code> in <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI Archive</a> to <a href="https://cgspace.cgiar.org/handle/10568/3">ILRI articles in journals and newsletters</a></li> <li>Peter asked if we could map all the items of type <code>Journal Article</code> in <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI Archive</a> to <a href="https://cgspace.cgiar.org/handle/10568/3">ILRI articles in journals and newsletters</a></li>
<li>It is easy to do via CSV using OpenRefine but I noticed that on CGSpace ~1,000 of the expected 2,500 are already mapped, while on DSpace Test they were not</li> <li>It is easy to do via CSV using OpenRefine but I noticed that on CGSpace ~1,000 of the expected 2,500 are already mapped, while on DSpace Test they were not</li>
<li>I&rsquo;ve asked Peter if he knows what&rsquo;s going on (or who mapped them)</li> <li>I've asked Peter if he knows what's going on (or who mapped them)</li>
<li>Turns out he had already mapped some, but requested that I finish the rest</li> <li>Turns out he had already mapped some, but requested that I finish the rest</li>
<li>With this GREL in OpenRefine I can find items that are mapped, ie they have <code>10568/3||</code> or <code>10568/3$</code> in their <code>collection</code> field:</li>
<li><p>With this GREL in OpenRefine I can find items that are mapped, ie they have <code>10568/3||</code> or <code>10568/3$</code> in their <code>collection</code> field:</p>
<pre><code>isNotNull(value.match(/.+?10568\/3(\|\|.+|$)/))
</code></pre></li>
<li><p>Peter also made a lot of changes to the data in the Archives collections while I was attempting to import the changes, so we were essentially competing for PostgreSQL and Solr connections</p></li>
<li><p>I ended up having to kill the import and wait until he was done</p></li>
<li><p>I exported a clean CSV and applied the changes from that one, which was a hundred or two less than I thought there should be (at least compared to the current state of DSpace Test, which is a few months old)</p></li>
</ul> </ul>
<pre><code>isNotNull(value.match(/.+?10568\/3(\|\|.+|$)/))
<h2 id="2017-09-25">2017-09-25</h2> </code></pre><ul>
<li>Peter also made a lot of changes to the data in the Archives collections while I was attempting to import the changes, so we were essentially competing for PostgreSQL and Solr connections</li>
<li>I ended up having to kill the import and wait until he was done</li>
<li>I exported a clean CSV and applied the changes from that one, which was a hundred or two less than I thought there should be (at least compared to the current state of DSpace Test, which is a few months old)</li>
</ul>
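<p>A minimal Python sketch of the mapping check from the GREL expression above, for trying the pattern outside OpenRefine. It assumes GREL's <code>match()</code> only succeeds when the whole value matches, which <code>re.fullmatch</code> mirrors; the function name <code>is_mapped</code> is hypothetical and the sample values are made up for illustration.</p>

```python
import re

# Same pattern as the GREL expression. GREL's match() is assumed to require
# the entire string to match, so re.fullmatch is used rather than re.search.
MAPPED_PATTERN = re.compile(r".+?10568/3(\|\|.+|$)")


def is_mapped(collection: str) -> bool:
    """Return True if 10568/3 appears after at least one other character,
    followed either by '||' and more text or by the end of the value."""
    return MAPPED_PATTERN.fullmatch(collection) is not None
```

<p>Because the pattern begins with <code>.+?</code>, a value that <em>starts</em> with <code>10568/3</code> is not matched by this sketch, and <code>10568/35</code> is not mistaken for <code>10568/3</code>.</p>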
<h2 id="20170925">2017-09-25</h2>
<ul> <ul>
<li>Email Rosemary Kande from ICT to ask about the administrative / finance procedure for moving DSpace Test from EU to US region on Linode</li> <li>Email Rosemary Kande from ICT to ask about the administrative / finance procedure for moving DSpace Test from EU to US region on Linode</li>
<li>Communicate (finally) with Tania and Tunji from the CGIAR System Organization office to tell them to request CGNET make the DNS updates for library.cgiar.org</li> <li>Communicate (finally) with Tania and Tunji from the CGIAR System Organization office to tell them to request CGNET make the DNS updates for library.cgiar.org</li>
<li>Peter wants me to clean up the text values for Delia Grace's metadata, as the authorities are all messed up again since we cleaned them up in <a href="/cgspace-notes/2016-12">2016-12</a>:</li>
<li><p>Peter wants me to clean up the text values for Delia Grace&rsquo;s metadata, as the authorities are all messed up again since we cleaned them up in <a href="/cgspace-notes/2016-12">2016-12</a>:</p> </ul>
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%'; <pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
text_value | authority | confidence text_value | authority | confidence
--------------+--------------------------------------+------------ --------------+--------------------------------------+------------
Grace, Delia | | 600 Grace, Delia | | 600
Grace, Delia | bfa61d7c-7583-4175-991c-2e7315000f0c | 600 Grace, Delia | bfa61d7c-7583-4175-991c-2e7315000f0c | 600
Grace, Delia | bfa61d7c-7583-4175-991c-2e7315000f0c | -1 Grace, Delia | bfa61d7c-7583-4175-991c-2e7315000f0c | -1
Grace, D. | 6a8ddca3-33c1-45f9-aa00-6fa9fc91e3fc | -1 Grace, D. | 6a8ddca3-33c1-45f9-aa00-6fa9fc91e3fc | -1
</code></pre></li> </code></pre><ul>
<li>Strangely, none of her authority entries have ORCIDs anymore&hellip;</li>
<li><p>Strangely, none of her authority entries have ORCIDs anymore&hellip;</p></li> <li>I'll just fix the text values and forget about it for now:</li>
</ul>
<li><p>I&rsquo;ll just fix the text values and forget about it for now:</p>
<pre><code>dspace=# update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%'; <pre><code>dspace=# update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
UPDATE 610 UPDATE 610
</code></pre></li> </code></pre><ul>
<li>After this we have to reindex the Discovery and Authority cores (as <code>tomcat7</code> user):</li>
<li><p>After this we have to reindex the Discovery and Authority cores (as <code>tomcat7</code> user):</p> </ul>
<pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx1024m -XX:+TieredCompilation -XX:TieredStopAtLevel=1&quot; <pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx1024m -XX:+TieredCompilation -XX:TieredStopAtLevel=1&quot;
$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b $ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
@ -693,76 +571,66 @@ Retrieving all data
Initialize org.dspace.authority.indexer.DSpaceAuthorityIndexer Initialize org.dspace.authority.indexer.DSpaceAuthorityIndexer
Exception: null Exception: null
java.lang.NullPointerException java.lang.NullPointerException
at org.dspace.authority.AuthorityValueGenerator.generateRaw(AuthorityValueGenerator.java:82) at org.dspace.authority.AuthorityValueGenerator.generateRaw(AuthorityValueGenerator.java:82)
at org.dspace.authority.AuthorityValueGenerator.generate(AuthorityValueGenerator.java:39) at org.dspace.authority.AuthorityValueGenerator.generate(AuthorityValueGenerator.java:39)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.prepareNextValue(DSpaceAuthorityIndexer.java:201) at org.dspace.authority.indexer.DSpaceAuthorityIndexer.prepareNextValue(DSpaceAuthorityIndexer.java:201)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:132) at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:132)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144) at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144) at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:159) at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:159)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144) at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144) at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
at org.dspace.authority.indexer.AuthorityIndexClient.main(AuthorityIndexClient.java:61) at org.dspace.authority.indexer.AuthorityIndexClient.main(AuthorityIndexClient.java:61)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at java.lang.reflect.Method.invoke(Method.java:498)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226) at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78) at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
real 6m6.447s real 6m6.447s
user 1m34.010s user 1m34.010s
sys 0m12.113s sys 0m12.113s
</code></pre></li> </code></pre><ul>
<li>The <code>index-authority</code> script always seems to fail, I think it's the same old bug</li>
<li><p>The <code>index-authority</code> script always seems to fail, I think it&rsquo;s the same old bug</p></li> <li>Something interesting for my notes about JNDI database pool—since I couldn't determine if it was working or not when I tried it locally the other day—is this error message that I just saw in the DSpace logs today:</li>
</ul>
<li><p>Something interesting for my notes about JNDI database pool—since I couldn&rsquo;t determine if it was working or not when I tried it locally the other day—is this error message that I just saw in the DSpace logs today:</p>
<pre><code>ERROR org.dspace.storage.rdbms.DatabaseManager @ Error retrieving JNDI context: jdbc/dspaceLocal <pre><code>ERROR org.dspace.storage.rdbms.DatabaseManager @ Error retrieving JNDI context: jdbc/dspaceLocal
... ...
INFO org.dspace.storage.rdbms.DatabaseManager @ Unable to locate JNDI dataSource: jdbc/dspaceLocal INFO org.dspace.storage.rdbms.DatabaseManager @ Unable to locate JNDI dataSource: jdbc/dspaceLocal
INFO org.dspace.storage.rdbms.DatabaseManager @ Falling back to creating own Database pool INFO org.dspace.storage.rdbms.DatabaseManager @ Falling back to creating own Database pool
</code></pre></li> </code></pre><ul>
<li>So it's good to know that <em>something</em> gets printed when it fails because I didn't see <em>any</em> mention of JNDI before when I was testing!</li>
<li><p>So it&rsquo;s good to know that <em>something</em> gets printed when it fails because I didn&rsquo;t see <em>any</em> mention of JNDI before when I was testing!</p></li>
</ul> </ul>
<h2 id="20170926">2017-09-26</h2>
<h2 id="2017-09-26">2017-09-26</h2>
<ul> <ul>
<li>Adam Hunt from WLE finally registered so I added him to the editor and approver groups</li> <li>Adam Hunt from WLE finally registered so I added him to the editor and approver groups</li>
<li>Then I noticed that Sisay never removed Marianne&rsquo;s user accounts from the approver steps in the workflow because she is already in the WLE groups, which are in those steps</li> <li>Then I noticed that Sisay never removed Marianne's user accounts from the approver steps in the workflow because she is already in the WLE groups, which are in those steps</li>
<li>For what it&rsquo;s worth, I had asked him to remove them on 2017-09-14</li> <li>For what it's worth, I had asked him to remove them on 2017-09-14</li>
<li>I also went and added the WLE approvers and editors groups to the appropriate steps of all the Phase I and Phase II research theme collections</li> <li>I also went and added the WLE approvers and editors groups to the appropriate steps of all the Phase I and Phase II research theme collections</li>
<li>A lot of CIAT&rsquo;s items have manually generated thumbnails which have an incorrect aspect ratio and an ugly black border</li> <li>A lot of CIAT's items have manually generated thumbnails which have an incorrect aspect ratio and an ugly black border</li>
<li>I communicated with Elizabeth from CIAT to tell her she should use DSpace&rsquo;s automatically generated thumbnails</li> <li>I communicated with Elizabeth from CIAT to tell her she should use DSpace's automatically generated thumbnails</li>
<li>Start discussions with ICT about Linode server update for DSpace Test</li> <li>Start discussions with ICT about Linode server update for DSpace Test</li>
<li>Rosemary said I need to work with Robert Okal to destroy/create the server, and then let her and Lilian Masigah from finance know the updated Linode asset names for their records</li> <li>Rosemary said I need to work with Robert Okal to destroy/create the server, and then let her and Lilian Masigah from finance know the updated Linode asset names for their records</li>
</ul> </ul>
<h2 id="20170928">2017-09-28</h2>
<h2 id="2017-09-28">2017-09-28</h2>
<ul> <ul>
<li>Tunji from the System Organization finally sent the DNS request for library.cgiar.org to CGNET</li> <li>Tunji from the System Organization finally sent the DNS request for library.cgiar.org to CGNET</li>
<li>Now the redirects work</li> <li>Now the redirects work</li>
<li>I quickly registered a Let's Encrypt certificate for the domain:</li>
<li><p>I quickly registered a Let&rsquo;s Encrypt certificate for the domain:</p> </ul>
<pre><code># systemctl stop nginx <pre><code># systemctl stop nginx
# /opt/certbot-auto certonly --standalone --email aorth@mjanja.ch -d library.cgiar.org # /opt/certbot-auto certonly --standalone --email aorth@mjanja.ch -d library.cgiar.org
# systemctl start nginx # systemctl start nginx
</code></pre></li> </code></pre><ul>
<li>I modified the nginx configuration of the ansible playbooks to use this new certificate and now the certificate is enabled and OCSP stapling is working:</li>
<li><p>I modified the nginx configuration of the ansible playbooks to use this new certificate and now the certificate is enabled and OCSP stapling is working:</p> </ul>
<pre><code>$ openssl s_client -connect cgspace.cgiar.org:443 -servername library.cgiar.org -tls1_2 -tlsextdebug -status <pre><code>$ openssl s_client -connect cgspace.cgiar.org:443 -servername library.cgiar.org -tls1_2 -tlsextdebug -status
... ...
OCSP Response Data: OCSP Response Data:
... ...
Cert Status: good Cert Status: good
</code></pre></li> </code></pre>
</ul>

View File

@ -8,14 +8,11 @@
<meta property="og:title" content="October, 2017" /> <meta property="og:title" content="October, 2017" />
<meta property="og:description" content="2017-10-01 <meta property="og:description" content="2017-10-01
Peter emailed to point out that many items in the ILRI archive collection have multiple handles: Peter emailed to point out that many items in the ILRI archive collection have multiple handles:
http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336 http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
There appears to be a pattern but I&#39;ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine
There appears to be a pattern but I&rsquo;ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine
Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections
" /> " />
<meta property="og:type" content="article" /> <meta property="og:type" content="article" />
@ -27,17 +24,14 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
<meta name="twitter:title" content="October, 2017"/> <meta name="twitter:title" content="October, 2017"/>
<meta name="twitter:description" content="2017-10-01 <meta name="twitter:description" content="2017-10-01
Peter emailed to point out that many items in the ILRI archive collection have multiple handles: Peter emailed to point out that many items in the ILRI archive collection have multiple handles:
http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336 http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
There appears to be a pattern but I&#39;ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine
There appears to be a pattern but I&rsquo;ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine
Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -118,416 +112,308 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
</p> </p>
</header> </header>
<h2 id="2017-10-01">2017-10-01</h2> <h2 id="20171001">2017-10-01</h2>
<ul> <ul>
<li><p>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</p> <li>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</li>
<pre><code>http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
</code></pre></li>
<li><p>There appears to be a pattern but I&rsquo;ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</p></li>
<li><p>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</p></li>
</ul> </ul>
<pre><code>http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
<h2 id="2017-10-02">2017-10-02</h2> </code></pre><ul>
<li>There appears to be a pattern but I'll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</li>
<li>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</li>
</ul>
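<p>As a rough sketch of how such values could be detected automatically, the following Python splits a multi-value field on the <code>||</code> separator shown above and flags values that carry more than one handle. The helper names are hypothetical, and this only identifies affected values; it does not decide which handle is the correct one.</p>

```python
def split_handles(value: str, sep: str = "||") -> list[str]:
    """Split a multi-value field into its individual handle URLs."""
    return [part for part in value.split(sep) if part]


def has_multiple_handles(value: str) -> bool:
    """Return True if the field holds more than one handle."""
    return len(split_handles(value)) > 1
```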
<h2 id="20171002">2017-10-02</h2>
<ul> <ul>
<li>Peter Ballantyne said he was having problems logging into CGSpace with &ldquo;both&rdquo; of his accounts (CGIAR LDAP and personal, apparently)</li> <li>Peter Ballantyne said he was having problems logging into CGSpace with &ldquo;both&rdquo; of his accounts (CGIAR LDAP and personal, apparently)</li>
<li>I looked in the logs and saw some LDAP lookup failures due to timeout but also strangely a &ldquo;no DN found&rdquo; error:</li>
<li><p>I looked in the logs and saw some LDAP lookup failures due to timeout but also strangely a &ldquo;no DN found&rdquo; error:</p> </ul>
<pre><code>2017-10-01 20:24:57,928 WARN org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=CA0AA5FEAEA8805645489404CDCE9594:ip_addr=41.204.190.40:ldap_attribute_lookup:type=failed_search javax.naming.CommunicationException\colon; svcgroot2.cgiarad.org\colon;3269 [Root exception is java.net.ConnectException\colon; Connection timed out (Connection timed out)] <pre><code>2017-10-01 20:24:57,928 WARN org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=CA0AA5FEAEA8805645489404CDCE9594:ip_addr=41.204.190.40:ldap_attribute_lookup:type=failed_search javax.naming.CommunicationException\colon; svcgroot2.cgiarad.org\colon;3269 [Root exception is java.net.ConnectException\colon; Connection timed out (Connection timed out)]
2017-10-01 20:22:37,982 INFO org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=CA0AA5FEAEA8805645489404CDCE9594:ip_addr=41.204.190.40:failed_login:no DN found for user pballantyne 2017-10-01 20:22:37,982 INFO org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=CA0AA5FEAEA8805645489404CDCE9594:ip_addr=41.204.190.40:failed_login:no DN found for user pballantyne
</code></pre></li> </code></pre><ul>
<li>I thought maybe his account had expired (seeing as it was the first of the month) but he says he was finally able to log in today</li>
<li><p>I thought maybe his account had expired (seeing as it was the first of the month) but he says he was finally able to log in today</p></li> <li>The logs for yesterday show fourteen errors related to LDAP auth failures:</li>
</ul>
<li><p>The logs for yesterday show fourteen errors related to LDAP auth failures:</p>
<pre><code>$ grep -c &quot;ldap_authentication:type=failed_auth&quot; dspace.log.2017-10-01 <pre><code>$ grep -c &quot;ldap_authentication:type=failed_auth&quot; dspace.log.2017-10-01
14 14
</code></pre></li> </code></pre><ul>
<li>For what it's worth, there are no errors on any other recent days, so it must have been some network issue on Linode or CGNET's LDAP server</li>
<li><p>For what it&rsquo;s worth, there are no errors on any other recent days, so it must have been some network issue on Linode or CGNET&rsquo;s LDAP server</p></li> <li>Linode emailed to say that linode578611 (DSpace Test) needs to migrate to a new host for a security update so I initiated the migration immediately rather than waiting for the scheduled time in two weeks</li>
<li><p>Linode emailed to say that linode578611 (DSpace Test) needs to migrate to a new host for a security update so I initiated the migration immediately rather than waiting for the scheduled time in two weeks</p></li>
</ul> </ul>
<h2 id="20171004">2017-10-04</h2>
<h2 id="2017-10-04">2017-10-04</h2>
<ul> <ul>
<li>Twice in the last twenty-four hours Linode has alerted about high CPU usage on CGSpace (linode2533629)</li> <li>Twice in the last twenty-four hours Linode has alerted about high CPU usage on CGSpace (linode2533629)</li>
<li>Communicate with Sam from the CGIAR System Organization about some broken links coming from their CGIAR Library domain to CGSpace</li> <li>Communicate with Sam from the CGIAR System Organization about some broken links coming from their CGIAR Library domain to CGSpace</li>
<li>The first is a link to a browse page that should be handled better in nginx:</li>
<li><p>The first is a link to a browse page that should be handled better in nginx:</p> </ul>
<pre><code>http://library.cgiar.org/browse?value=Intellectual%20Assets%20Reports&amp;type=subject → https://cgspace.cgiar.org/browse?value=Intellectual%20Assets%20Reports&amp;type=subject <pre><code>http://library.cgiar.org/browse?value=Intellectual%20Assets%20Reports&amp;type=subject → https://cgspace.cgiar.org/browse?value=Intellectual%20Assets%20Reports&amp;type=subject
</code></pre></li> </code></pre><ul>
<li>We'll need to check for browse links and handle them properly, including swapping the <code>subject</code> parameter for <code>systemsubject</code> (which doesn't exist in Discovery yet, but we'll need to add it) as we have moved their poorly curated subjects from <code>dc.subject</code> to <code>cg.subject.system</code></li>
<li><p>We&rsquo;ll need to check for browse links and handle them properly, including swapping the <code>subject</code> parameter for <code>systemsubject</code> (which doesn&rsquo;t exist in Discovery yet, but we&rsquo;ll need to add it) as we have moved their poorly curated subjects from <code>dc.subject</code> to <code>cg.subject.system</code></p></li> <li>The second link was a direct link to a bitstream which has broken due to the sequence being updated, so I told him he should link to the handle of the item instead</li>
<li>Help Sisay proof sixty-two IITA records on DSpace Test</li>
<li><p>The second link was a direct link to a bitstream which has broken due to the sequence being updated, so I told him he should link to the handle of the item instead</p></li> <li>Lots of inconsistencies and errors in subjects, dc.format.extent, regions, countries</li>
<li>Merge the Discovery search changes for ISI Journal (<a href="https://github.com/ilri/DSpace/pull/341">#341</a>)</li>
<li><p>Help Sisay proof sixty-two IITA records on DSpace Test</p></li>
<li><p>Lots of inconsistencies and errors in subjects, dc.format.extent, regions, countries</p></li>
<li><p>Merge the Discovery search changes for ISI Journal (<a href="https://github.com/ilri/DSpace/pull/341">#341</a>)</p></li>
</ul> </ul>
<h2 id="20171005">2017-10-05</h2>
<h2 id="2017-10-05">2017-10-05</h2>
<ul> <ul>
<li>Twice in the past twenty-four hours Linode has warned that CGSpace&rsquo;s outbound traffic rate was exceeding the notification threshold</li> <li>Twice in the past twenty-four hours Linode has warned that CGSpace's outbound traffic rate was exceeding the notification threshold</li>
<li>I had a look at yesterday's OAI and REST logs in <code>/var/log/nginx</code> but didn't see anything unusual:</li>
<li><p>I had a look at yesterday&rsquo;s OAI and REST logs in <code>/var/log/nginx</code> but didn&rsquo;t see anything unusual:</p>
<pre><code># awk '{print $1}' /var/log/nginx/rest.log.1 | sort -n | uniq -c | sort -h | tail -n 10
141 157.55.39.240
145 40.77.167.85
162 66.249.66.92
181 66.249.66.95
211 66.249.66.91
312 66.249.66.94
384 66.249.66.90
1495 50.116.102.77
3904 70.32.83.92
9904 45.5.184.196
# awk '{print $1}' /var/log/nginx/oai.log.1 | sort -n | uniq -c | sort -h | tail -n 10
5 66.249.66.71
6 66.249.66.67
6 68.180.229.31
8 41.84.227.85
8 66.249.66.92
17 66.249.66.65
24 66.249.66.91
38 66.249.66.95
69 66.249.66.90
148 66.249.66.94
</code></pre></li>
<li><p>Working on the nginx redirects for CGIAR Library</p></li>
<li><p>We should start using 301 redirects and also allow for <code>/sitemap</code> to work on the library.cgiar.org domain so the CGIAR System Organization people can update their Google Search Console and allow Google to find their content in a structured way</p></li>
<li><p>Remove eleven occurrences of <code>ACP</code> in IITA&rsquo;s <code>cg.coverage.region</code> using the Atmire batch edit module from Discovery</p></li>
<li><p>Need to investigate how we can verify the library.cgiar.org domain using the HTML or DNS methods</p></li>
<li><p>Run corrections on 143 ILRI Archive items that had two <code>dc.identifier.uri</code> values (Handle) that Peter had pointed out earlier this week</p></li>
<li><p>I used OpenRefine to isolate them and then fixed and re-imported them into CGSpace</p></li>
<li><p>I manually checked a dozen of them and it appeared that the correct handle was always the second one, so I just deleted the first one</p></li>
</ul> </ul>
<pre><code># awk '{print $1}' /var/log/nginx/rest.log.1 | sort -n | uniq -c | sort -h | tail -n 10
<h2 id="2017-10-06">2017-10-06</h2> 141 157.55.39.240
145 40.77.167.85
162 66.249.66.92
181 66.249.66.95
211 66.249.66.91
312 66.249.66.94
384 66.249.66.90
1495 50.116.102.77
3904 70.32.83.92
9904 45.5.184.196
# awk '{print $1}' /var/log/nginx/oai.log.1 | sort -n | uniq -c | sort -h | tail -n 10
5 66.249.66.71
6 66.249.66.67
6 68.180.229.31
8 41.84.227.85
8 66.249.66.92
17 66.249.66.65
24 66.249.66.91
38 66.249.66.95
69 66.249.66.90
148 66.249.66.94
</code></pre><ul>
<li>Working on the nginx redirects for CGIAR Library</li>
<li>We should start using 301 redirects and also allow for <code>/sitemap</code> to work on the library.cgiar.org domain so the CGIAR System Organization people can update their Google Search Console and allow Google to find their content in a structured way</li>
<li>Remove eleven occurrences of <code>ACP</code> in IITA's <code>cg.coverage.region</code> using the Atmire batch edit module from Discovery</li>
<li>Need to investigate how we can verify the library.cgiar.org using the HTML or DNS methods</li>
<li>Run corrections on 143 ILRI Archive items that had two <code>dc.identifier.uri</code> values (Handle) that Peter had pointed out earlier this week</li>
<li>I used OpenRefine to isolate them and then fixed and re-imported them into CGSpace</li>
<li>I manually checked a dozen of them and it appeared that the correct handle was always the second one, so I just deleted the first one</li>
</ul>
<h2 id="20171006">2017-10-06</h2>
<ul> <ul>
<li>I saw a nice tweak to thumbnail presentation on the Cardiff Metropolitan University DSpace: <a href="https://repository.cardiffmet.ac.uk/handle/10369/8780">https://repository.cardiffmet.ac.uk/handle/10369/8780</a></li> <li>I saw a nice tweak to thumbnail presentation on the Cardiff Metropolitan University DSpace: <a href="https://repository.cardiffmet.ac.uk/handle/10369/8780">https://repository.cardiffmet.ac.uk/handle/10369/8780</a></li>
<li>It adds a subtle border and box shadow, before and after:</li> <li>It adds a subtle border and box shadow, before and after:</li>
</ul> </ul>
<p><img src="/cgspace-notes/2017/10/dspace-thumbnail-original.png" alt="Original flat thumbnails">
<p><img src="/cgspace-notes/2017/10/dspace-thumbnail-original.png" alt="Original flat thumbnails" /> <img src="/cgspace-notes/2017/10/dspace-thumbnail-box-shadow.png" alt="Tweaked with border and box shadow"></p>
<img src="/cgspace-notes/2017/10/dspace-thumbnail-box-shadow.png" alt="Tweaked with border and box shadow" /></p>
<ul> <ul>
<li>I&rsquo;ll post it to the Yammer group to see what people think</li> <li>I'll post it to the Yammer group to see what people think</li>
<li>I figured out a way to do the HTML verification for Google Search Console for library.cgiar.org</li> <li>I figured out a way to do the HTML verification for Google Search Console for library.cgiar.org</li>
<li>We can drop the HTML file in their XMLUI theme folder and it will get copied to the webapps directory during build/install</li> <li>We can drop the HTML file in their XMLUI theme folder and it will get copied to the webapps directory during build/install</li>
<li>Then we add an nginx alias for that URL in the library.cgiar.org vhost</li> <li>Then we add an nginx alias for that URL in the library.cgiar.org vhost</li>
<li>This method is kinda a hack but at least we can put all the pieces into git to be reproducible</li> <li>This method is kinda a hack but at least we can put all the pieces into git to be reproducible</li>
<li>I will tell Tunji to send me the verification file</li> <li>I will tell Tunji to send me the verification file</li>
</ul> </ul>
<h2 id="20171010">2017-10-10</h2>
<h2 id="2017-10-10">2017-10-10</h2>
<ul> <ul>
<li>Deploy logic to allow verification of the library.cgiar.org domain in the Google Search Console (<a href="https://github.com/ilri/DSpace/pull/343">#343</a>)</li> <li>Deploy logic to allow verification of the library.cgiar.org domain in the Google Search Console (<a href="https://github.com/ilri/DSpace/pull/343">#343</a>)</li>
<li>After verifying both the HTTP and HTTPS domains and submitting a sitemap it will be interesting to see how the stats in the console as well as the search results change (currently 28,500 results):</li> <li>After verifying both the HTTP and HTTPS domains and submitting a sitemap it will be interesting to see how the stats in the console as well as the search results change (currently 28,500 results):</li>
</ul> </ul>
<p><img src="/cgspace-notes/2017/10/google-search-console.png" alt="Google Search Console">
<p><img src="/cgspace-notes/2017/10/google-search-console.png" alt="Google Search Console" /> <img src="/cgspace-notes/2017/10/google-search-console-2.png" alt="Google Search Console 2">
<img src="/cgspace-notes/2017/10/google-search-console-2.png" alt="Google Search Console 2" /> <img src="/cgspace-notes/2017/10/google-search-results.png" alt="Google Search results"></p>
<img src="/cgspace-notes/2017/10/google-search-results.png" alt="Google Search results" /></p>
<ul> <ul>
<li>I tried to submit a &ldquo;Change of Address&rdquo; request in the Google Search Console but I need to be an owner on CGSpace&rsquo;s console (currently I&rsquo;m just a user) in order to do that</li> <li>I tried to submit a &ldquo;Change of Address&rdquo; request in the Google Search Console but I need to be an owner on CGSpace's console (currently I'm just a user) in order to do that</li>
<li>Manually clean up some communities and collections that Peter had requested a few weeks ago</li> <li>Manually clean up some communities and collections that Peter had requested a few weeks ago</li>
<li>Delete Community <sup>10568</sup>&frasl;<sub>102</sub> (ILRI Research and Development Issues)</li> <li>Delete Community 10568/102 (ILRI Research and Development Issues)</li>
<li>Move five collections to 10568/27629 (ILRI Projects) using <code>move-collections.sh</code> with the following configuration:</li>
<li><p>Move five collections to <sup>10568</sup>&frasl;<sub>27629</sub> (ILRI Projects) using <code>move-collections.sh</code> with the following configuration:</p> </ul>
<pre><code>10568/1637 10568/174 10568/27629 <pre><code>10568/1637 10568/174 10568/27629
10568/1642 10568/174 10568/27629 10568/1642 10568/174 10568/27629
10568/1614 10568/174 10568/27629 10568/1614 10568/174 10568/27629
10568/75561 10568/150 10568/27629 10568/75561 10568/150 10568/27629
10568/183 10568/230 10568/27629 10568/183 10568/230 10568/27629
</code></pre></li> </code></pre><ul>
<li>Delete community 10568/174 (Sustainable livestock futures)</li>
<li><p>Delete community <sup>10568</sup>&frasl;<sub>174</sub> (Sustainable livestock futures)</p></li> <li>Delete collections in 10568/27629 that have zero items (33 of them!)</li>
<li><p>Delete collections in <sup>10568</sup>&frasl;<sub>27629</sub> that have zero items (33 of them!)</p></li>
</ul> </ul>
<h2 id="20171011">2017-10-11</h2>
<h2 id="2017-10-11">2017-10-11</h2>
<ul> <ul>
<li>Peter added me as an owner on the CGSpace property on Google Search Console and I tried to submit a &ldquo;Change of Address&rdquo; request for the CGIAR Library but got an error:</li> <li>Peter added me as an owner on the CGSpace property on Google Search Console and I tried to submit a &ldquo;Change of Address&rdquo; request for the CGIAR Library but got an error:</li>
</ul> </ul>
<p><img src="/cgspace-notes/2017/10/search-console-change-address-error.png" alt="Change of Address error"></p>
<p><img src="/cgspace-notes/2017/10/search-console-change-address-error.png" alt="Change of Address error" /></p>
<ul> <ul>
<li>We are sending top-level CGIAR Library traffic to their specific community hierarchy in CGSpace so this type of change of address won&rsquo;t work—we&rsquo;ll just need to wait for Google to slowly index everything and take note of the HTTP 301 redirects</li> <li>We are sending top-level CGIAR Library traffic to their specific community hierarchy in CGSpace so this type of change of address won't work—we'll just need to wait for Google to slowly index everything and take note of the HTTP 301 redirects</li>
<li>Also the Google Search Console doesn&rsquo;t work very well with Google Analytics being blocked, so I had to turn off my ad blocker to get the &ldquo;Change of Address&rdquo; tool to work!</li> <li>Also the Google Search Console doesn't work very well with Google Analytics being blocked, so I had to turn off my ad blocker to get the &ldquo;Change of Address&rdquo; tool to work!</li>
</ul> </ul>
<h2 id="20171012">2017-10-12</h2>
<h2 id="2017-10-12">2017-10-12</h2>
<ul> <ul>
<li>Finally finish (I think) working on the myriad nginx redirects for all the CGIAR Library browse stuff—it ended up getting pretty complicated!</li> <li>Finally finish (I think) working on the myriad nginx redirects for all the CGIAR Library browse stuff—it ended up getting pretty complicated!</li>
<li>I still need to commit the DSpace changes (add browse index, XMLUI strings, Discovery index, etc), but I should be able to deploy that on CGSpace soon</li> <li>I still need to commit the DSpace changes (add browse index, XMLUI strings, Discovery index, etc), but I should be able to deploy that on CGSpace soon</li>
</ul> </ul>
<h2 id="20171014">2017-10-14</h2>
<h2 id="2017-10-14">2017-10-14</h2>
<ul> <ul>
<li>Run system updates on DSpace Test and reboot server</li> <li>Run system updates on DSpace Test and reboot server</li>
<li>Merge changes adding a search/browse index for CGIAR System subject to <code>5_x-prod</code> (<a href="https://github.com/ilri/DSpace/pull/344">#344</a>)</li> <li>Merge changes adding a search/browse index for CGIAR System subject to <code>5_x-prod</code> (<a href="https://github.com/ilri/DSpace/pull/344">#344</a>)</li>
<li>I checked the top browse links in Google&rsquo;s search results for <code>site:library.cgiar.org inurl:browse</code> and they are all redirected appropriately by the nginx rewrites I worked on last week</li> <li>I checked the top browse links in Google's search results for <code>site:library.cgiar.org inurl:browse</code> and they are all redirected appropriately by the nginx rewrites I worked on last week</li>
</ul> </ul>
<h2 id="20171022">2017-10-22</h2>
<h2 id="2017-10-22">2017-10-22</h2>
<ul> <ul>
<li>Run system updates on DSpace Test and reboot server</li> <li>Run system updates on DSpace Test and reboot server</li>
<li>Re-deploy CGSpace from latest <code>5_x-prod</code> (adds ISI Journal to search filters and adds Discovery index for CGIAR Library <code>systemsubject</code>)</li> <li>Re-deploy CGSpace from latest <code>5_x-prod</code> (adds ISI Journal to search filters and adds Discovery index for CGIAR Library <code>systemsubject</code>)</li>
<li>Deploy nginx redirect fixes to catch CGIAR Library browse links (redirect to their community and translate subject→systemsubject)</li> <li>Deploy nginx redirect fixes to catch CGIAR Library browse links (redirect to their community and translate subject→systemsubject)</li>
<li>Run migration of CGSpace server (linode18) for Linode security alert, which took 42 minutes of downtime</li> <li>Run migration of CGSpace server (linode18) for Linode security alert, which took 42 minutes of downtime</li>
</ul> </ul>
<h2 id="20171026">2017-10-26</h2>
<h2 id="2017-10-26">2017-10-26</h2>
<ul> <ul>
<li>In the last 24 hours we&rsquo;ve gotten a few alerts from Linode that there was high CPU and outgoing traffic on CGSpace</li> <li>In the last 24 hours we've gotten a few alerts from Linode that there was high CPU and outgoing traffic on CGSpace</li>
<li>Uptime Robot even noticed CGSpace go &ldquo;down&rdquo; for a few minutes</li> <li>Uptime Robot even noticed CGSpace go &ldquo;down&rdquo; for a few minutes</li>
<li>In other news, I was trying to look at a question about stats raised by Magdalena and then CGSpace went down due to a problem with the SQL connection pool</li> <li>In other news, I was trying to look at a question about stats raised by Magdalena and then CGSpace went down due to a problem with the SQL connection pool</li>
<li>Looking at the PostgreSQL activity I see there are 93 connections, but after a minute or two they went down and CGSpace came back up</li> <li>Looking at the PostgreSQL activity I see there are 93 connections, but after a minute or two they went down and CGSpace came back up</li>
<li>Annnd I reloaded the Atmire Usage Stats module and the connections shot back up and CGSpace went down again</li> <li>Annnd I reloaded the Atmire Usage Stats module and the connections shot back up and CGSpace went down again</li>
<li>Still not sure where the load is coming from right now, but it's clear why there were so many alerts yesterday on the 25th!</li>
<li><p>Still not sure where the load is coming from right now, but it&rsquo;s clear why there were so many alerts yesterday on the 25th!</p> </ul>
<pre><code># grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2017-10-25 | sort -n | uniq | wc -l <pre><code># grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2017-10-25 | sort -n | uniq | wc -l
18022 18022
</code></pre></li> </code></pre><ul>
<li>Compared to other days there were two or three times the number of requests yesterday!</li>
<li><p>Compared to other days there were two or three times the number of requests yesterday!</p> </ul>
<pre><code># grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2017-10-23 | sort -n | uniq | wc -l <pre><code># grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2017-10-23 | sort -n | uniq | wc -l
3141 3141
# grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2017-10-26 | sort -n | uniq | wc -l # grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2017-10-26 | sort -n | uniq | wc -l
7851 7851
</code></pre></li> </code></pre><ul>
<li>I still have no idea what was causing the load to go up today</li>
<li><p>I still have no idea what was causing the load to go up today</p></li> <li>I finally investigated Magdalena's issue with the item download stats and now I can't reproduce it: I get the same number of downloads reported in the stats widget on the item page, the &ldquo;Most Popular Items&rdquo; page, and in Usage Stats</li>
<li>I think it might have been an issue with the statistics not being fresh</li>
<li><p>I finally investigated Magdalena&rsquo;s issue with the item download stats and now I can&rsquo;t reproduce it: I get the same number of downloads reported in the stats widget on the item page, the &ldquo;Most Popular Items&rdquo; page, and in Usage Stats</p></li> <li>I added the admin group for the systems organization to the admin role of the top-level community of CGSpace because I guess Sisay had forgotten</li>
<li>Magdalena asked if there was a way to reuse data in item submissions where items have a lot of similar data</li>
<li><p>I think it might have been an issue with the statistics not being fresh</p></li> <li>I told her about the possibility to use per-collection item templates, and asked if her items in question were all from a single collection</li>
<li>We've never used it but it could be worth looking at</li>
<li><p>I added the admin group for the systems organization to the admin role of the top-level community of CGSpace because I guess Sisay had forgotten</p></li>
<li><p>Magdalena asked if there was a way to reuse data in item submissions where items have a lot of similar data</p></li>
<li><p>I told her about the possibility to use per-collection item templates, and asked if her items in question were all from a single collection</p></li>
<li><p>We&rsquo;ve never used it but it could be worth looking at</p></li>
</ul> </ul>
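<p>The per-day session counts above can be wrapped in a small helper so several days are easy to compare. This is only a sketch: the function name <code>count_sessions</code> is made up for illustration, and it assumes session IDs always appear as <code>session_id=</code> followed by 32 uppercase alphanumeric characters, as in the commands above.</p>

```shell
#!/bin/sh
# Hypothetical helper (not part of DSpace): count the distinct session IDs
# recorded in one DSpace log file.
count_sessions() {
    # $1 = path to a DSpace log file
    # -o prints only the matched text, so every output line is one session ID;
    # sort -u removes duplicates; tr strips the padding some wc builds add.
    grep -o -E 'session_id=[A-Z0-9]{32}' "$1" | sort -u | wc -l | tr -d ' '
}

# Example usage with the log names from the notes:
# for f in dspace.log.2017-10-23 dspace.log.2017-10-25 dspace.log.2017-10-26; do
#     printf '%s %s\n' "$f" "$(count_sessions "$f")"
# done
```

<p>Using <code>sort -u</code> instead of <code>sort -n | uniq</code> is a stylistic choice here; both deduplicate the extracted IDs.</p>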
<h2 id="20171027">2017-10-27</h2>
<h2 id="2017-10-27">2017-10-27</h2>
<ul> <ul>
<li>Linode alerted about high CPU usage again (twice) on CGSpace in the last 24 hours, around 2AM and 2PM</li> <li>Linode alerted about high CPU usage again (twice) on CGSpace in the last 24 hours, around 2AM and 2PM</li>
</ul> </ul>
<h2 id="20171028">2017-10-28</h2>
<h2 id="2017-10-28">2017-10-28</h2>
<ul> <ul>
<li>Linode alerted about high CPU usage again on CGSpace around 2AM this morning</li> <li>Linode alerted about high CPU usage again on CGSpace around 2AM this morning</li>
</ul> </ul>
<h2 id="20171029">2017-10-29</h2>
<h2 id="2017-10-29">2017-10-29</h2>
<ul> <ul>
<li>Linode alerted about high CPU usage again on CGSpace around 2AM and 4AM</li> <li>Linode alerted about high CPU usage again on CGSpace around 2AM and 4AM</li>
<li>I&rsquo;m still not sure why this started causing alerts so repeatedly over the past week</li> <li>I'm still not sure why this started causing alerts so repeatedly over the past week</li>
<li>I don't see any telltale signs in the REST or OAI logs, so I am trying a rudimentary analysis of the DSpace logs:</li>
<li><p>I don&rsquo;t see any telltale signs in the REST or OAI logs, so I am trying a rudimentary analysis of the DSpace logs:</p> </ul>
<pre><code># grep '2017-10-29 02:' dspace.log.2017-10-29 | grep -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l <pre><code># grep '2017-10-29 02:' dspace.log.2017-10-29 | grep -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
2049 2049
</code></pre></li> </code></pre><ul>
<li>So there were 2049 unique sessions during the hour of 2AM</li>
<li><p>So there were 2049 unique sessions during the hour of 2AM</p></li> <li>Looking at my notes, the number of unique sessions was about the same during the same hour on other days when there were no alerts</li>
<li>I think I'll need to enable access logging in nginx to figure out what's going on</li>
<li><p>Looking at my notes, the number of unique sessions was about the same during the same hour on other days when there were no alerts</p></li> <li>After enabling logging on requests to XMLUI on <code>/</code> I see some new bot I've never seen before:</li>
<li><p>I think I&rsquo;ll need to enable access logging in nginx to figure out what&rsquo;s going on</p></li>
<li><p>After enabling logging on requests to XMLUI on <code>/</code> I see some new bot I&rsquo;ve never seen before:</p>
<pre><code>137.108.70.6 - - [29/Oct/2017:07:39:49 +0000] &quot;GET /discover?filtertype_0=type&amp;filter_relational_operator_0=equals&amp;filter_0=Internal+Document&amp;filtertype=author&amp;filter_relational_operator=equals&amp;filter=CGIAR+Secretariat HTTP/1.1&quot; 200 7776 &quot;-&quot; &quot;Mozilla/5.0 (compatible; CORE/0.6; +http://core.ac.uk; http://core.ac.uk/intro/contact)&quot;
</code></pre></li>
<li><p>CORE seems to be some bot that is &ldquo;Aggregating the worlds open access research papers&rdquo;</p></li>
<li><p>The contact address listed in their bot&rsquo;s user agent is incorrect; the correct page is simply: <a href="https://core.ac.uk/contact">https://core.ac.uk/contact</a></p></li>
<li><p>I will check the logs in a few days to see if they are harvesting us regularly, then add their bot&rsquo;s user agent to the Tomcat Crawler Session Valve</p></li>
<li><p>After browsing the CORE site it seems that the CGIAR Library is somehow a member of CORE, so they have probably only been harvesting CGSpace since we did the migration, as library.cgiar.org directs to us now</p></li>
<li><p>For now I will just contact them to have them update their contact info in the bot&rsquo;s user agent, but eventually I think I&rsquo;ll tell them to swap out the CGIAR Library entry for CGSpace</p></li>
</ul> </ul>
<pre><code>137.108.70.6 - - [29/Oct/2017:07:39:49 +0000] &quot;GET /discover?filtertype_0=type&amp;filter_relational_operator_0=equals&amp;filter_0=Internal+Document&amp;filtertype=author&amp;filter_relational_operator=equals&amp;filter=CGIAR+Secretariat HTTP/1.1&quot; 200 7776 &quot;-&quot; &quot;Mozilla/5.0 (compatible; CORE/0.6; +http://core.ac.uk; http://core.ac.uk/intro/contact)&quot;
<h2 id="2017-10-30">2017-10-30</h2> </code></pre><ul>
<li>CORE seems to be some bot that is &ldquo;Aggregating the worlds open access research papers&rdquo;</li>
<li>The contact address listed in their bot's user agent is incorrect; the correct page is simply: <a href="https://core.ac.uk/contact">https://core.ac.uk/contact</a></li>
<li>I will check the logs in a few days to see if they are harvesting us regularly, then add their bot's user agent to the Tomcat Crawler Session Valve</li>
<li>After browsing the CORE site it seems that the CGIAR Library is somehow a member of CORE, so they have probably only been harvesting CGSpace since we did the migration, as library.cgiar.org directs to us now</li>
<li>For now I will just contact them to have them update their contact info in the bot's user agent, but eventually I think I'll tell them to swap out the CGIAR Library entry for CGSpace</li>
</ul>
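<p>One caveat about the 2AM count above: the second <code>grep</code> in that pipeline has no <code>-o</code>, so <code>uniq</code> compares whole log lines, and the figure may be closer to the number of distinct log lines in that hour than to the number of distinct sessions. A minimal sketch of a variant that extracts only the session ID first is below; the function name is hypothetical, and it assumes each log line starts with a timestamp such as <code>2017-10-29 02:</code>.</p>

```shell
#!/bin/sh
# Hypothetical helper: count distinct session IDs among the log lines that
# contain a given timestamp prefix.
count_sessions_in_hour() {
    # $1 = DSpace log file, $2 = timestamp prefix, e.g. '2017-10-29 02:'
    # The first grep keeps only lines from that hour (fixed-string match);
    # the second extracts just the session ID so duplicates collapse properly.
    grep -F -- "$2" "$1" \
        | grep -o -E 'session_id=[A-Z0-9]{32}' \
        | sort -u | wc -l | tr -d ' '
}

# Example usage:
# count_sessions_in_hour dspace.log.2017-10-29 '2017-10-29 02:'
```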
<h2 id="20171030">2017-10-30</h2>
<ul> <ul>
<li>Like clockwork, Linode alerted about high CPU usage on CGSpace again this morning (this time at 8:13 AM)</li> <li>Like clockwork, Linode alerted about high CPU usage on CGSpace again this morning (this time at 8:13 AM)</li>
<li>Uptime Robot noticed that CGSpace went down around 10:15 AM, and I saw that there were 93 PostgreSQL connections:</li>
<li><p>Uptime Robot noticed that CGSpace went down around 10:15 AM, and I saw that there were 93 PostgreSQL connections:</p> </ul>
<pre><code>dspace=# SELECT * FROM pg_stat_activity; <pre><code>dspace=# SELECT * FROM pg_stat_activity;
... ...
(93 rows) (93 rows)
</code></pre></li> </code></pre><ul>
<li>Surprise surprise, the CORE bot is likely responsible for the recent load issues, making hundreds of thousands of requests yesterday and today:</li>
<li><p>Surprise surprise, the CORE bot is likely responsible for the recent load issues, making hundreds of thousands of requests yesterday and today:</p> </ul>
<pre><code># grep -c &quot;CORE/0.6&quot; /var/log/nginx/access.log <pre><code># grep -c &quot;CORE/0.6&quot; /var/log/nginx/access.log
26475 26475
# grep -c &quot;CORE/0.6&quot; /var/log/nginx/access.log.1 # grep -c &quot;CORE/0.6&quot; /var/log/nginx/access.log.1
135083 135083
</code></pre></li> </code></pre><ul>
<li>IP addresses for this bot currently seem to be:</li>
<li><p>IP addresses for this bot currently seem to be:</p> </ul>
<pre><code># grep &quot;CORE/0.6&quot; /var/log/nginx/access.log | awk '{print $1}' | sort -n | uniq <pre><code># grep &quot;CORE/0.6&quot; /var/log/nginx/access.log | awk '{print $1}' | sort -n | uniq
137.108.70.6 137.108.70.6
137.108.70.7 137.108.70.7
</code></pre></li> </code></pre><ul>
<li>I will add their user agent to the Tomcat Session Crawler Valve but it won't help much because they are only using two sessions:</li>
<li><p>I will add their user agent to the Tomcat Session Crawler Valve but it won&rsquo;t help much because they are only using two sessions:</p> </ul>
<pre><code># grep 137.108.70 dspace.log.2017-10-30 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq <pre><code># grep 137.108.70 dspace.log.2017-10-30 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq
session_id=5771742CABA3D0780860B8DA81E0551B session_id=5771742CABA3D0780860B8DA81E0551B
session_id=6C30F10B4351A4ED83EC6ED50AFD6B6A session_id=6C30F10B4351A4ED83EC6ED50AFD6B6A
</code></pre></li> </code></pre><ul>
<li>&hellip; and most of their requests are for dynamic discover pages:</li>
<li><p>&hellip; and most of their requests are for dynamic discover pages:</p> </ul>
<pre><code># grep -c 137.108.70 /var/log/nginx/access.log <pre><code># grep -c 137.108.70 /var/log/nginx/access.log
26622 26622
# grep 137.108.70 /var/log/nginx/access.log | grep -c &quot;GET /discover&quot; # grep 137.108.70 /var/log/nginx/access.log | grep -c &quot;GET /discover&quot;
24055 24055
</code></pre></li> </code></pre><ul>
<li>Just because I'm curious who the top IPs are:</li>
<li><p>Just because I&rsquo;m curious who the top IPs are:</p> </ul>
<pre><code># awk '{print $1}' /var/log/nginx/access.log | sort -n | uniq -c | sort -h | tail <pre><code># awk '{print $1}' /var/log/nginx/access.log | sort -n | uniq -c | sort -h | tail
496 62.210.247.93 496 62.210.247.93
571 46.4.94.226 571 46.4.94.226
651 40.77.167.39 651 40.77.167.39
763 157.55.39.231 763 157.55.39.231
782 207.46.13.90 782 207.46.13.90
998 66.249.66.90 998 66.249.66.90
1948 104.196.152.243 1948 104.196.152.243
4247 190.19.92.5 4247 190.19.92.5
31602 137.108.70.6 31602 137.108.70.6
31636 137.108.70.7 31636 137.108.70.7
</code></pre></li> </code></pre><ul>
<li>At least we know the top two are CORE, but who are the others?</li>
<li><p>At least we know the top two are CORE, but who are the others?</p></li> <li>190.19.92.5 is apparently in Argentina, and 104.196.152.243 is from Google Cloud Engine</li>
<li>Actually, these two scrapers might be more responsible for the heavy load than the CORE bot, because they don't reuse their session variable, creating thousands of new sessions!</li>
<li><p>190.19.92.5 is apparently in Argentina, and 104.196.152.243 is from Google Cloud Engine</p></li> </ul>
<li><p>Actually, these two scrapers might be more responsible for the heavy load than the CORE bot, because they don&rsquo;t reuse their session variable, creating thousands of new sessions!</p>
<pre><code># grep 190.19.92.5 dspace.log.2017-10-30 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l <pre><code># grep 190.19.92.5 dspace.log.2017-10-30 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
1419 1419
# grep 104.196.152.243 dspace.log.2017-10-30 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l # grep 104.196.152.243 dspace.log.2017-10-30 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
2811 2811
</code></pre></li> </code></pre><ul>
<li>From looking at the requests, it appears these are from CIAT and CCAFS</li>
<li><p>From looking at the requests, it appears these are from CIAT and CCAFS</p></li> <li>I wonder if I could somehow instruct them to use a user agent so that we could apply a crawler session manager valve to them</li>
<li>Actually, according to the Tomcat docs, we could use an IP with <code>crawlerIps</code>: <a href="https://tomcat.apache.org/tomcat-7.0-doc/config/valve.html#Crawler_Session_Manager_Valve">https://tomcat.apache.org/tomcat-7.0-doc/config/valve.html#Crawler_Session_Manager_Valve</a></li>
<li><p>I wonder if I could somehow instruct them to use a user agent so that we could apply a crawler session manager valve to them</p></li> <li>Ah, wait, it looks like <code>crawlerIps</code> only came in 2017-06, so probably isn't in Ubuntu 16.04's 7.0.68 build!</li>
<li>That would explain the errors I was getting when trying to set it:</li>
<li><p>Actually, according to the Tomcat docs, we could use an IP with <code>crawlerIps</code>: <a href="https://tomcat.apache.org/tomcat-7.0-doc/config/valve.html#Crawler_Session_Manager_Valve">https://tomcat.apache.org/tomcat-7.0-doc/config/valve.html#Crawler_Session_Manager_Valve</a></p></li>
<li><p>Ah, wait, it looks like <code>crawlerIps</code> only came in 2017-06, so probably isn&rsquo;t in Ubuntu 16.04&rsquo;s 7.0.68 build!</p></li>
<li><p>That would explain the errors I was getting when trying to set it:</p>
<pre><code>WARNING: [SetPropertiesRule]{Server/Service/Engine/Host/Valve} Setting property 'crawlerIps' to '190\.19\.92\.5|104\.196\.152\.243' did not find a matching property.
</code></pre></li>
<li><p>As for now, it actually seems the CORE bot coming from 137.108.70.6 and 137.108.70.7 is only using a few sessions per day, which is good:</p>
<pre><code># grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=137.108.70.(6|7)' dspace.log.2017-10-30 | sort -n | uniq -c | sort -h
410 session_id=74F0C3A133DBF1132E7EC30A7E7E0D60:ip_addr=137.108.70.7
574 session_id=5771742CABA3D0780860B8DA81E0551B:ip_addr=137.108.70.7
1012 session_id=6C30F10B4351A4ED83EC6ED50AFD6B6A:ip_addr=137.108.70.6
</code></pre></li>
<li><p>I will check again tomorrow</p></li>
</ul> </ul>
<pre><code>WARNING: [SetPropertiesRule]{Server/Service/Engine/Host/Valve} Setting property 'crawlerIps' to '190\.19\.92\.5|104\.196\.152\.243' did not find a matching property.
<h2 id="2017-10-31">2017-10-31</h2> </code></pre><ul>
<li>As for now, it actually seems the CORE bot coming from 137.108.70.6 and 137.108.70.7 is only using a few sessions per day, which is good:</li>
</ul>
<pre><code># grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=137.108.70.(6|7)' dspace.log.2017-10-30 | sort -n | uniq -c | sort -h
410 session_id=74F0C3A133DBF1132E7EC30A7E7E0D60:ip_addr=137.108.70.7
574 session_id=5771742CABA3D0780860B8DA81E0551B:ip_addr=137.108.70.7
1012 session_id=6C30F10B4351A4ED83EC6ED50AFD6B6A:ip_addr=137.108.70.6
</code></pre><ul>
<li>I will check again tomorrow</li>
</ul>
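<p>The two checks used throughout this day (who the busiest clients are, and how many sessions each one opens) can be sketched as a pair of helpers. The function names are invented for illustration. The sketch assumes the client IP is the first field of each nginx access log line, and that DSpace writes <code>session_id=...:ip_addr=...</code> with an IPv4 address, as in the output above.</p>

```shell
#!/bin/sh
# Hypothetical helper: print the N busiest client IPs in an nginx access log,
# busiest last, in "count ip" form.
top_clients() {
    # $1 = access log, $2 = how many entries to show (default 10)
    awk '{print $1}' "$1" | sort | uniq -c | sort -n | tail -n "${2:-10}"
}

# Hypothetical helper: count the distinct DSpace sessions opened by one IP.
sessions_for_ip() {
    # $1 = DSpace log, $2 = IPv4 address
    # Extract "session_id=...:ip_addr=..." pairs, keep those whose address is
    # exactly $2 (string comparison, so 1.2.3.4 does not match 1.2.3.40),
    # then count the unique session IDs.
    grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=[0-9.]+' "$1" \
        | awk -F ':ip_addr=' -v ip="$2" '($2 "") == (ip "") {print $1}' \
        | sort -u | wc -l | tr -d ' '
}

# Example usage:
# top_clients /var/log/nginx/access.log 10
# sessions_for_ip dspace.log.2017-10-30 190.19.92.5
```

<p>A client with many requests but only a handful of sessions (like CORE here) behaves differently from one that opens a new session on nearly every request, so looking at both numbers together seems more informative than either alone.</p>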
<h2 id="20171031">2017-10-31</h2>
<ul> <ul>
<li>Very nice, Linode alerted that CGSpace had high CPU usage at 2AM again</li> <li>Very nice, Linode alerted that CGSpace had high CPU usage at 2AM again</li>
<li>Ask on the dspace-tech mailing list if it&rsquo;s possible to use an existing item as a template for a new item</li> <li>Ask on the dspace-tech mailing list if it's possible to use an existing item as a template for a new item</li>
<li>To follow up on the CORE bot traffic, there were almost 300,000 requests yesterday:</li>
<li><p>To follow up on the CORE bot traffic, there were almost 300,000 requests yesterday:</p> </ul>
<pre><code># grep &quot;CORE/0.6&quot; /var/log/nginx/access.log.1 | awk '{print $1}' | sort -n | uniq -c | sort -h <pre><code># grep &quot;CORE/0.6&quot; /var/log/nginx/access.log.1 | awk '{print $1}' | sort -n | uniq -c | sort -h
139109 137.108.70.6 139109 137.108.70.6
139253 137.108.70.7 139253 137.108.70.7
</code></pre></li> </code></pre><ul>
<li>I've emailed the CORE people to ask if they can update the repository information from CGIAR Library to CGSpace</li>
<li><p>I&rsquo;ve emailed the CORE people to ask if they can update the repository information from CGIAR Library to CGSpace</p></li> <li>Also, I asked if they could perhaps use the <code>sitemap.xml</code>, OAI-PMH, or REST APIs to index us more efficiently, because they mostly seem to be crawling the nearly endless Discovery facets</li>
<li>I added <a href="https://goaccess.io/">GoAccess</a> to the list of packages to install in the DSpace role of the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a></li>
<li><p>Also, I asked if they could perhaps use the <code>sitemap.xml</code>, OAI-PMH, or REST APIs to index us more efficiently, because they mostly seem to be crawling the nearly endless Discovery facets</p></li> <li>It makes it very easy to analyze nginx logs from the command line, to see where traffic is coming from:</li>
</ul>
<li><p>I added <a href="https://goaccess.io/">GoAccess</a> to the list of packages to install in the DSpace role of the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a></p></li>
<li><p>It makes it very easy to analyze nginx logs from the command line, to see where traffic is coming from:</p>
<pre><code># goaccess /var/log/nginx/access.log --log-format=COMBINED <pre><code># goaccess /var/log/nginx/access.log --log-format=COMBINED
</code></pre></li> </code></pre><ul>
<li>According to Uptime Robot CGSpace went down and up a few times</li>
<li><p>According to Uptime Robot CGSpace went down and up a few times</p></li> <li>I had a look at goaccess and I saw that CORE was actively indexing</li>
<li>Also, PostgreSQL connections were at 91 (with the max being 60 per web app, hmmm)</li>
<li><p>I had a look at goaccess and I saw that CORE was actively indexing</p></li> <li>I'm really starting to get annoyed with these guys, and thinking about blocking their IP address for a few days to see if CGSpace becomes more stable</li>
<li>Actually, come to think of it, they aren't even obeying <code>robots.txt</code>, because we actually disallow <code>/discover</code> and <code>/search-filter</code> URLs but they are hitting those massively:</li>
<li><p>Also, PostgreSQL connections were at 91 (with the max being 60 per web app, hmmm)</p></li> </ul>
<li><p>I&rsquo;m really starting to get annoyed with these guys, and thinking about blocking their IP address for a few days to see if CGSpace becomes more stable</p></li>
<li><p>Actually, come to think of it, they aren&rsquo;t even obeying <code>robots.txt</code>, because we actually disallow <code>/discover</code> and <code>/search-filter</code> URLs but they are hitting those massively:</p>
<pre><code># grep &quot;CORE/0.6&quot; /var/log/nginx/access.log | grep -o -E &quot;GET /(discover|search-filter)&quot; | sort -n | uniq -c | sort -rn <pre><code># grep &quot;CORE/0.6&quot; /var/log/nginx/access.log | grep -o -E &quot;GET /(discover|search-filter)&quot; | sort -n | uniq -c | sort -rn
158058 GET /discover 158058 GET /discover
14260 GET /search-filter 14260 GET /search-filter
</code></pre></li> </code></pre><ul>
<li>I tested a URL of pattern <code>/discover</code> in Google's webmaster tools and it was indeed identified as blocked</li>
<li><p>I tested a URL of pattern <code>/discover</code> in Google&rsquo;s webmaster tools and it was indeed identified as blocked</p></li> <li>I will send feedback to the CORE bot team</li>
<li><p>I will send feedback to the CORE bot team</p></li>
</ul> </ul>
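<p>The <code>robots.txt</code> check described above can be reproduced offline. This is a minimal sketch, assuming a rule set that disallows only <code>/discover</code> and <code>/search-filter</code> for all user agents; the real CGSpace <code>robots.txt</code> is not shown in these notes, and the <code>is_blocked</code> helper and its user-agent string are hypothetical:</p>

```python
from urllib.robotparser import RobotFileParser

# Assumed rules: the notes only state that /discover and /search-filter
# are disallowed; the actual robots.txt may contain more entries.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /discover",
    "Disallow: /search-filter",
]

BASE_URL = "https://cgspace.cgiar.org"


def is_blocked(path, user_agent="CORE"):
    """Return True if robots.txt forbids this user agent from fetching path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_LINES)
    return not parser.can_fetch(user_agent, BASE_URL + path)


if __name__ == "__main__":
    for path in ("/discover?query=rice", "/search-filter?field=author", "/handle/10568/1"):
        print(path, "blocked" if is_blocked(path) else "allowed")
```

<p>A well-behaved crawler would run an equivalent check before each request, so any hit on a blocked path in the nginx log suggests the bot is ignoring the rules.</p>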

File diff suppressed because it is too large Load Diff


View File

@ -8,7 +8,6 @@
<meta property="og:title" content="March, 2018" /> <meta property="og:title" content="March, 2018" />
<meta property="og:description" content="2018-03-02 <meta property="og:description" content="2018-03-02
Export a CSV of the IITA community metadata for Martin Mueller Export a CSV of the IITA community metadata for Martin Mueller
" /> " />
<meta property="og:type" content="article" /> <meta property="og:type" content="article" />
@ -20,10 +19,9 @@ Export a CSV of the IITA community metadata for Martin Mueller
<meta name="twitter:title" content="March, 2018"/> <meta name="twitter:title" content="March, 2018"/>
<meta name="twitter:description" content="2018-03-02 <meta name="twitter:description" content="2018-03-02
Export a CSV of the IITA community metadata for Martin Mueller Export a CSV of the IITA community metadata for Martin Mueller
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -104,169 +102,132 @@ Export a CSV of the IITA community metadata for Martin Mueller
</p> </p>
</header> </header>
<h2 id="2018-03-02">2018-03-02</h2> <h2 id="20180302">2018-03-02</h2>
<ul> <ul>
<li>Export a CSV of the IITA community metadata for Martin Mueller</li> <li>Export a CSV of the IITA community metadata for Martin Mueller</li>
</ul> </ul>
<h2 id="20180306">2018-03-06</h2>
<h2 id="2018-03-06">2018-03-06</h2>
<ul> <ul>
<li>Add three new CCAFS project tags to <code>input-forms.xml</code> (<a href="https://github.com/ilri/DSpace/pull/357">#357</a>)</li> <li>Add three new CCAFS project tags to <code>input-forms.xml</code> (<a href="https://github.com/ilri/DSpace/pull/357">#357</a>)</li>
<li>Andrea from Macaroni Bros had sent me an email that CCAFS needs them</li> <li>Andrea from Macaroni Bros had sent me an email that CCAFS needs them</li>
<li>Give Udana more feedback on his WLE records from last month</li> <li>Give Udana more feedback on his WLE records from last month</li>
<li>There were some records using a non-breaking space in their AGROVOC subject field</li> <li>There were some records using a non-breaking space in their AGROVOC subject field</li>
<li>I checked and tested some author corrections from Peter from last week, and then applied them on CGSpace</li>
<li><p>I checked and tested some author corrections from Peter from last week, and then applied them on CGSpace</p> </ul>
<pre><code>$ ./fix-metadata-values.py -i Correct-309-authors-2018-03-06.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3 <pre><code>$ ./fix-metadata-values.py -i Correct-309-authors-2018-03-06.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3
$ ./delete-metadata-values.py -i Delete-3-Authors-2018-03-06.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -m 3 $ ./delete-metadata-values.py -i Delete-3-Authors-2018-03-06.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -m 3
</code></pre></li> </code></pre><ul>
<li>This time there were no errors in whitespace but I did have to correct one incorrectly encoded accent character</li>
<li><p>This time there were no errors in whitespace but I did have to correct one incorrectly encoded accent character</p></li> <li>Add new CRP subject &ldquo;GRAIN LEGUMES AND DRYLAND CEREALS&rdquo; to <code>input-forms.xml</code> (<a href="https://github.com/ilri/DSpace/pull/358">#358</a>)</li>
<li>Merge the ORCID integration stuff in to <code>5_x-prod</code> for deployment on CGSpace soon (<a href="https://github.com/ilri/DSpace/pull/359">#359</a>)</li>
<li><p>Add new CRP subject &ldquo;GRAIN LEGUMES AND DRYLAND CEREALS&rdquo; to <code>input-forms.xml</code> (<a href="https://github.com/ilri/DSpace/pull/358">#358</a>)</p></li> <li>Deploy ORCID changes on CGSpace (linode18), run all system updates, and reboot the server</li>
<li>Run all system updates on DSpace Test and reboot server</li>
<li><p>Merge the ORCID integration stuff in to <code>5_x-prod</code> for deployment on CGSpace soon (<a href="https://github.com/ilri/DSpace/pull/359">#359</a>)</p></li> <li>I ran the <a href="https://gist.github.com/alanorth/24d8081a5dc25e2a4e27e548e7e2389c">orcid-authority-to-item.py</a> script on CGSpace and mapped 2,864 ORCID identifiers from Solr to item metadata</li>
</ul>
<li><p>Deploy ORCID changes on CGSpace (linode18), run all system updates, and reboot the server</p></li>
<li><p>Run all system updates on DSpace Test and reboot server</p></li>
<li><p>I ran the <a href="https://gist.github.com/alanorth/24d8081a5dc25e2a4e27e548e7e2389c">orcid-authority-to-item.py</a> script on CGSpace and mapped 2,864 ORCID identifiers from Solr to item metadata</p>
<pre><code>$ ./orcid-authority-to-item.py -db dspace -u dspace -p 'fuuu' -s http://localhost:8081/solr -d <pre><code>$ ./orcid-authority-to-item.py -db dspace -u dspace -p 'fuuu' -s http://localhost:8081/solr -d
</code></pre></li> </code></pre><ul>
<li>I ran the DSpace cleanup script on CGSpace and it threw an error (as always):</li>
<li><p>I ran the DSpace cleanup script on CGSpace and it threw an error (as always):</p> </ul>
<pre><code>Error: ERROR: update or delete on table &quot;bitstream&quot; violates foreign key constraint &quot;bundle_primary_bitstream_id_fkey&quot; on table &quot;bundle&quot; <pre><code>Error: ERROR: update or delete on table &quot;bitstream&quot; violates foreign key constraint &quot;bundle_primary_bitstream_id_fkey&quot; on table &quot;bundle&quot;
Detail: Key (bitstream_id)=(150659) is still referenced from table &quot;bundle&quot;. Detail: Key (bitstream_id)=(150659) is still referenced from table &quot;bundle&quot;.
</code></pre></li> </code></pre><ul>
<li>The solution is, as always:</li>
<li><p>The solution is, as always:</p> </ul>
<pre><code>$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (150659);' <pre><code>$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (150659);'
UPDATE 1 UPDATE 1
</code></pre></li> </code></pre><ul>
<li>Apply the proposed PostgreSQL indexes from DS-3636 (pull request <a href="https://github.com/DSpace/DSpace/pull/1791/">#1791</a>) on CGSpace (linode18)</li>
<li><p>Apply the proposed PostgreSQL indexes from DS-3636 (pull request <a href="https://github.com/DSpace/DSpace/pull/1791/">#1791</a>) on CGSpace (linode18)</p></li>
</ul> </ul>
<h2 id="20180307">2018-03-07</h2>
<h2 id="2018-03-07">2018-03-07</h2>
<ul> <ul>
<li>Add CIAT author Mauricio Efren Sotelo Cabrera to controlled vocabulary for ORCID identifiers (<a href="https://github.com/ilri/DSpace/pull/360">#360</a>)</li> <li>Add CIAT author Mauricio Efren Sotelo Cabrera to controlled vocabulary for ORCID identifiers (<a href="https://github.com/ilri/DSpace/pull/360">#360</a>)</li>
<li>Help Sisay proof 200 IITA records on DSpace Test</li> <li>Help Sisay proof 200 IITA records on DSpace Test</li>
<li>Finally import Udana&rsquo;s 24 items to <a href="https://cgspace.cgiar.org/handle/10568/36185">IWMI Journal Articles</a> on CGSpace</li> <li>Finally import Udana's 24 items to <a href="https://cgspace.cgiar.org/handle/10568/36185">IWMI Journal Articles</a> on CGSpace</li>
<li>Skype with James Stapleton to discuss CGSpace, ILRI website, CKM staff issues, etc</li> <li>Skype with James Stapleton to discuss CGSpace, ILRI website, CKM staff issues, etc</li>
</ul> </ul>
<h2 id="20180308">2018-03-08</h2>
<h2 id="2018-03-08">2018-03-08</h2>
<ul> <ul>
<li>Looking at a CSV dump of the CIAT community I see there are tons of stupid text languages people add for their metadata</li> <li>Looking at a CSV dump of the CIAT community I see there are tons of stupid text languages people add for their metadata</li>
<li>This makes the CSV have tons of columns, for example <code>dc.title</code>, <code>dc.title[]</code>, <code>dc.title[en]</code>, <code>dc.title[eng]</code>, <code>dc.title[en_US]</code> and so on!</li> <li>This makes the CSV have tons of columns, for example <code>dc.title</code>, <code>dc.title[]</code>, <code>dc.title[en]</code>, <code>dc.title[eng]</code>, <code>dc.title[en_US]</code> and so on!</li>
<li>I think I can fix, or at least normalize, them in the database:</li>
<li><p>I think I can fix, or at least normalize, them in the database:</p> </ul>
<pre><code>dspace=# select distinct text_lang from metadatavalue where resource_type_id=2; <pre><code>dspace=# select distinct text_lang from metadatavalue where resource_type_id=2;
text_lang text_lang
----------- -----------
ethnob ethnob
en en
spa spa
EN EN
En En
en_ en_
en_US en_US
E. E.
EN_US EN_US
en_U en_U
eng eng
fr fr
es_ES es_ES
es es
(16 rows) (16 rows)
dspace=# update metadatavalue set text_lang='en_US' where resource_type_id=2 and text_lang in ('en','EN','En','en_','EN_US','en_U','eng'); dspace=# update metadatavalue set text_lang='en_US' where resource_type_id=2 and text_lang in ('en','EN','En','en_','EN_US','en_U','eng');
UPDATE 122227 UPDATE 122227
dspacetest=# select distinct text_lang from metadatavalue where resource_type_id=2; dspacetest=# select distinct text_lang from metadatavalue where resource_type_id=2;
text_lang text_lang
----------- -----------
ethnob ethnob
en_US en_US
spa spa
E. E.
fr fr
es_ES es_ES
es es
(9 rows) (9 rows)
</code></pre></li> </code></pre><ul>
<li>On second inspection it looks like <code>dc.description.provenance</code> fields use the text_lang &ldquo;en&rdquo; so that's probably why there are over 100,000 fields changed&hellip;</li>
<li><p>On second inspection it looks like <code>dc.description.provenance</code> fields use the text_lang &ldquo;en&rdquo; so that&rsquo;s probably why there are over 100,000 fields changed&hellip;</p></li> <li>If I skip that, there are about 2,000, which seems like a more reasonable number of fields for users to have edited manually, or fucked up during CSV import, etc:</li>
</ul>
<li><p>If I skip that, there are about 2,000, which seems like a more reasonable number of fields for users to have edited manually, or fucked up during CSV import, etc:</p>
<pre><code>dspace=# update metadatavalue set text_lang='en_US' where resource_type_id=2 and text_lang in ('EN','En','en_','EN_US','en_U','eng'); <pre><code>dspace=# update metadatavalue set text_lang='en_US' where resource_type_id=2 and text_lang in ('EN','En','en_','EN_US','en_U','eng');
UPDATE 2309 UPDATE 2309
</code></pre></li> </code></pre><ul>
<li>I will apply this on CGSpace right now</li>
<li><p>I will apply this on CGSpace right now</p></li> <li>In other news, I was playing with adding ORCID identifiers to a dump of CIAT's community via CSV in OpenRefine</li>
<li>Using a series of filters, flags, and GREL expressions to isolate items for a certain author, I figured out how to add ORCID identifiers to the <code>cg.creator.id</code> field</li>
<li><p>In other news, I was playing with adding ORCID identifiers to a dump of CIAT&rsquo;s community via CSV in OpenRefine</p></li> <li>For example, a GREL expression in a custom text facet to get all items with <code>dc.contributor.author[en_US]</code> of a certain author with several name variations (this is how you use a logical OR in OpenRefine):</li>
</ul>
<li><p>Using a series of filters, flags, and GREL expressions to isolate items for a certain author, I figured out how to add ORCID identifiers to the <code>cg.creator.id</code> field</p></li>
<li><p>For example, a GREL expression in a custom text facet to get all items with <code>dc.contributor.author[en_US]</code> of a certain author with several name variations (this is how you use a logical OR in OpenRefine):</p>
<pre><code>or(value.contains('Ceballos, Hern'), value.contains('Hernández Ceballos')) <pre><code>or(value.contains('Ceballos, Hern'), value.contains('Hernández Ceballos'))
</code></pre></li> </code></pre><ul>
<li>Then you can flag or star matching items and then use a conditional to either set the value directly or add it to an existing value:</li>
<li><p>Then you can flag or star matching items and then use a conditional to either set the value directly or add it to an existing value:</p> </ul>
<pre><code>if(isBlank(value), &quot;Hernan Ceballos: 0000-0002-8744-7918&quot;, value + &quot;||Hernan Ceballos: 0000-0002-8744-7918&quot;) <pre><code>if(isBlank(value), &quot;Hernan Ceballos: 0000-0002-8744-7918&quot;, value + &quot;||Hernan Ceballos: 0000-0002-8744-7918&quot;)
</code></pre></li> </code></pre><ul>
<li>One thing that bothers me is that this won't honor author order</li>
<li><p>One thing that bothers me is that this won&rsquo;t honor author order</p></li> <li>It might be better to do batches of these in PostgreSQL with a script that takes the <code>place</code> column of an author into account when setting the <code>cg.creator.id</code></li>
<li>I wrote a Python script to read the author names and ORCID identifiers from CSV and create matching <code>cg.creator.id</code> fields: <a href="https://gist.github.com/alanorth/a49d85cd9c5dea89cddbe809813a7050">add-orcid-identifiers-csv.py </a></li>
<li><p>It might be better to do batches of these in PostgreSQL with a script that takes the <code>place</code> column of an author into account when setting the <code>cg.creator.id</code></p></li> <li>The CSV should have two columns: author name and ORCID identifier:</li>
</ul>
<li><p>I wrote a Python script to read the author names and ORCID identifiers from CSV and create matching <code>cg.creator.id</code> fields: <a href="https://gist.github.com/alanorth/a49d85cd9c5dea89cddbe809813a7050">add-orcid-identifiers-csv.py </a></p></li>
<li><p>The CSV should have two columns: author name and ORCID identifier:</p>
<pre><code>dc.contributor.author,cg.creator.id <pre><code>dc.contributor.author,cg.creator.id
&quot;Orth, Alan&quot;,Alan S. Orth: 0000-0002-1735-7458 &quot;Orth, Alan&quot;,Alan S. Orth: 0000-0002-1735-7458
&quot;Orth, A.&quot;,Alan S. Orth: 0000-0002-1735-7458 &quot;Orth, A.&quot;,Alan S. Orth: 0000-0002-1735-7458
</code></pre></li> </code></pre><ul>
<li>I didn't integrate the ORCID API lookup for author names in this script for now because I was only interested in &ldquo;tagging&rdquo; old items for a few given authors</li>
<li><p>I didn&rsquo;t integrate the ORCID API lookup for author names in this script for now because I was only interested in &ldquo;tagging&rdquo; old items for a few given authors</p></li> <li>I added ORCID identifiers for 187 items by CIAT's Hernan Ceballos, because that is what Elizabeth was trying to do manually!</li>
<li>Also, I decided to add ORCID identifiers for all records from Peter, Abenet, and Sisay as well</li>
<li><p>I added ORCID identifiers for 187 items by CIAT&rsquo;s Hernan Ceballos, because that is what Elizabeth was trying to do manually!</p></li>
<li><p>Also, I decided to add ORCID identifiers for all records from Peter, Abenet, and Sisay as well</p></li>
</ul> </ul>
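<p>The two-column CSV and the &ldquo;set or append&rdquo; logic from the GREL conditional above can be sketched in Python. This is only an illustration, not the linked <code>add-orcid-identifiers-csv.py</code> script: the function names are hypothetical, the <code>||</code> separator is taken from the GREL expression, and the duplicate check is an added assumption:</p>

```python
import csv
import io

# Multi-value separator, as used in the GREL expression in these notes.
SEPARATOR = "||"


def load_orcid_map(csv_text):
    """Map each author name variant to its 'Name: ORCID' identifier string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["dc.contributor.author"]: row["cg.creator.id"] for row in reader}


def add_creator_id(existing, orcid):
    """Set cg.creator.id if it is blank, otherwise append the new identifier."""
    if not existing or not existing.strip():
        return orcid
    # Assumption: skip identifiers that are already present on the item.
    if orcid in existing.split(SEPARATOR):
        return existing
    return existing + SEPARATOR + orcid


if __name__ == "__main__":
    sample = (
        "dc.contributor.author,cg.creator.id\n"
        '"Orth, Alan",Alan S. Orth: 0000-0002-1735-7458\n'
        '"Orth, A.",Alan S. Orth: 0000-0002-1735-7458\n'
    )
    orcids = load_orcid_map(sample)
    print(add_creator_id("", orcids["Orth, A."]))
```

<p>Like the OpenRefine approach, this appends to the end of the field and so does not honor author order; a database-side version would need to use the <code>place</code> column.</p>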
<h2 id="20180309">2018-03-09</h2>
<h2 id="2018-03-09">2018-03-09</h2>
<ul> <ul>
<li>Give James Stapleton input on Sisay&rsquo;s KRAs</li> <li>Give James Stapleton input on Sisay's KRAs</li>
<li>Create a pull request to disable ORCID authority integration for <code>dc.contributor.author</code> in the submission forms and XMLUI display (<a href="https://github.com/ilri/DSpace/pull/363">#363</a>)</li> <li>Create a pull request to disable ORCID authority integration for <code>dc.contributor.author</code> in the submission forms and XMLUI display (<a href="https://github.com/ilri/DSpace/pull/363">#363</a>)</li>
</ul> </ul>
<h2 id="20180311">2018-03-11</h2>
<h2 id="2018-03-11">2018-03-11</h2>
<ul> <ul>
<li>Peter also wrote to say he is having issues with the Atmire Listings and Reports module</li> <li>Peter also wrote to say he is having issues with the Atmire Listings and Reports module</li>
<li>When I logged in to try it I get a blank white page after continuing and I see this in dspace.log.2018-03-11:</li>
<li><p>When I logged in to try it I get a blank white page after continuing and I see this in dspace.log.2018-03-11:</p> </ul>
<pre><code>2018-03-11 11:38:15,592 WARN org.dspace.app.webui.servlet.InternalErrorServlet @ :session_id=91C2C0C59669B33A7683570F6010603A:internal_error:-- URL Was: https://cgspace.cgiar.or <pre><code>2018-03-11 11:38:15,592 WARN org.dspace.app.webui.servlet.InternalErrorServlet @ :session_id=91C2C0C59669B33A7683570F6010603A:internal_error:-- URL Was: https://cgspace.cgiar.or
g/jspui/listings-and-reports g/jspui/listings-and-reports
-- Method: POST -- Method: POST
@ -277,21 +238,15 @@ g/jspui/listings-and-reports
-- step: &quot;1&quot; -- step: &quot;1&quot;
org.apache.jasper.JasperException: java.lang.NullPointerException org.apache.jasper.JasperException: java.lang.NullPointerException
</code></pre></li> </code></pre><ul>
<li>Looks like I needed to remove the Humidtropics subject from Listings and Reports because it was looking for the terms and couldn't find them</li>
<li><p>Looks like I needed to remove the Humidtropics subject from Listings and Reports because it was looking for the terms and couldn&rsquo;t find them</p></li> <li>I made a quick fix and it's working now (<a href="https://github.com/ilri/DSpace/pull/364">#364</a>)</li>
<li><p>I made a quick fix and it&rsquo;s working now (<a href="https://github.com/ilri/DSpace/pull/364">#364</a>)</p></li>
</ul> </ul>
<h2 id="20180312">2018-03-12</h2>
<h2 id="2018-03-12">2018-03-12</h2>
<ul> <ul>
<li>Increase upload size on CGSpace&rsquo;s nginx config to 85MB so Sisay can upload some data</li> <li>Increase upload size on CGSpace's nginx config to 85MB so Sisay can upload some data</li>
</ul> </ul>
<h2 id="20180313">2018-03-13</h2>
<h2 id="2018-03-13">2018-03-13</h2>
<ul> <ul>
<li>I created a new Linode server for DSpace Test (linode6623840) so I could try the block storage stuff, but when I went to add a 300GB volume it said that block storage capacity was exceeded in that datacenter (Newark, NJ)</li> <li>I created a new Linode server for DSpace Test (linode6623840) so I could try the block storage stuff, but when I went to add a 300GB volume it said that block storage capacity was exceeded in that datacenter (Newark, NJ)</li>
<li>I deleted the Linode and created another one (linode6624164) in the Fremont, CA region</li> <li>I deleted the Linode and created another one (linode6624164) in the Fremont, CA region</li>
@ -303,82 +258,62 @@ org.apache.jasper.JasperException: java.lang.NullPointerException
<li>CCAFS publication page: <a href="https://ccafs.cgiar.org/publications/can-scenario-planning-catalyse-transformational-change-evaluating-climate-change-policy">https://ccafs.cgiar.org/publications/can-scenario-planning-catalyse-transformational-change-evaluating-climate-change-policy</a></li> <li>CCAFS publication page: <a href="https://ccafs.cgiar.org/publications/can-scenario-planning-catalyse-transformational-change-evaluating-climate-change-policy">https://ccafs.cgiar.org/publications/can-scenario-planning-catalyse-transformational-change-evaluating-climate-change-policy</a></li>
<li>Peter tweeted the Handle link and now Altmetric shows the donut for both the DOI and the Handle</li> <li>Peter tweeted the Handle link and now Altmetric shows the donut for both the DOI and the Handle</li>
</ul> </ul>
<h2 id="20180314">2018-03-14</h2>
<h2 id="2018-03-14">2018-03-14</h2>
<ul> <ul>
<li>Help Abenet with a troublesome Listings and Report question for CIAT author Steve Beebe</li> <li>Help Abenet with a troublesome Listings and Report question for CIAT author Steve Beebe</li>
<li>Continue migrating DSpace Test to the new server (linode6624164)</li> <li>Continue migrating DSpace Test to the new server (linode6624164)</li>
<li>I emailed ILRI service desk to update the DNS records for dspacetest.cgiar.org</li> <li>I emailed ILRI service desk to update the DNS records for dspacetest.cgiar.org</li>
<li>Abenet was having problems saving Listings and Reports configurations or layouts but I tested it and it works</li> <li>Abenet was having problems saving Listings and Reports configurations or layouts but I tested it and it works</li>
</ul> </ul>
<h2 id="20180315">2018-03-15</h2>
<h2 id="2018-03-15">2018-03-15</h2>
<ul> <ul>
<li>Help Abenet troubleshoot the Listings and Reports issue again</li> <li>Help Abenet troubleshoot the Listings and Reports issue again</li>
<li>It looks like it&rsquo;s an issue with the layouts, if you create a new layout that only has one type (<code>dc.identifier.citation</code>):</li> <li>It looks like it's an issue with the layouts, if you create a new layout that only has one type (<code>dc.identifier.citation</code>):</li>
</ul> </ul>
<p><img src="/cgspace-notes/2018/03/layout-only-citation.png" alt="Listing and Reports layout"></p>
<p><img src="/cgspace-notes/2018/03/layout-only-citation.png" alt="Listing and Reports layout" /></p>
<ul> <ul>
<li><p>The error in the DSpace log is:</p> <li>The error in the DSpace log is:</li>
<pre><code>org.apache.jasper.JasperException: java.lang.ArrayIndexOutOfBoundsException: -1
</code></pre></li>
<li><p>The full error is here: <a href="https://gist.github.com/alanorth/ea47c092725960e39610db9b0c13f6ca">https://gist.github.com/alanorth/ea47c092725960e39610db9b0c13f6ca</a></p></li>
<li><p>If I do a report for &ldquo;Orth, Alan&rdquo; with the same custom layout it works!</p></li>
<li><p>I submitted a ticket to Atmire: <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=589">https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=589</a></p></li>
<li><p>Small fix to the example citation text in Listings and Reports (<a href="https://github.com/ilri/DSpace/pull/365">#365</a>)</p></li>
</ul> </ul>
<pre><code>org.apache.jasper.JasperException: java.lang.ArrayIndexOutOfBoundsException: -1
<h2 id="2018-03-16">2018-03-16</h2> </code></pre><ul>
<li>The full error is here: <a href="https://gist.github.com/alanorth/ea47c092725960e39610db9b0c13f6ca">https://gist.github.com/alanorth/ea47c092725960e39610db9b0c13f6ca</a></li>
<li>If I do a report for &ldquo;Orth, Alan&rdquo; with the same custom layout it works!</li>
<li>I submitted a ticket to Atmire: <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=589">https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=589</a></li>
<li>Small fix to the example citation text in Listings and Reports (<a href="https://github.com/ilri/DSpace/pull/365">#365</a>)</li>
</ul>
<h2 id="20180316">2018-03-16</h2>
<ul> <ul>
<li>ICT made the DNS updates for dspacetest.cgiar.org late last night</li> <li>ICT made the DNS updates for dspacetest.cgiar.org late last night</li>
<li>I have removed the old server (linode02 aka linode578611) in favor of linode19 aka linode6624164</li> <li>I have removed the old server (linode02 aka linode578611) in favor of linode19 aka linode6624164</li>
<li>Looking at the CRP subjects on CGSpace I see there is one blank one so I'll just fix it:</li>
<li><p>Looking at the CRP subjects on CGSpace I see there is one blank one so I&rsquo;ll just fix it:</p> </ul>
<pre><code>dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id=230 and text_value=''; <pre><code>dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id=230 and text_value='';
</code></pre></li> </code></pre><ul>
<li>Copy all CRP subjects to a CSV to do the mass updates:</li>
<li><p>Copy all CRP subjects to a CSV to do the mass updates:</p> </ul>
<pre><code>dspace=# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=230 group by text_value order by count desc) to /tmp/crps.csv with csv header; <pre><code>dspace=# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=230 group by text_value order by count desc) to /tmp/crps.csv with csv header;
COPY 21 COPY 21
</code></pre></li> </code></pre><ul>
<li>Once I prepare the new input forms (<a href="https://github.com/ilri/DSpace/issues/362">#362</a>) I will need to do the batch corrections:</li>
<li><p>Once I prepare the new input forms (<a href="https://github.com/ilri/DSpace/issues/362">#362</a>) I will need to do the batch corrections:</p>
<pre><code>$ ./fix-metadata-values.py -i Correct-21-CRPs-2018-03-16.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.crp -t correct -m 230 -n -d
</code></pre></li>
<li><p>Create a pull request to update the input forms for the new CRP subject style (<a href="https://github.com/ilri/DSpace/pull/366">#366</a>)</p></li>
</ul> </ul>
<pre><code>$ ./fix-metadata-values.py -i Correct-21-CRPs-2018-03-16.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.crp -t correct -m 230 -n -d
<h2 id="2018-03-19">2018-03-19</h2> </code></pre><ul>
<li>Create a pull request to update the input forms for the new CRP subject style (<a href="https://github.com/ilri/DSpace/pull/366">#366</a>)</li>
</ul>
<h2 id="20180319">2018-03-19</h2>
<ul> <ul>
<li>Tezira has been having problems accessing CGSpace from the ILRI Nairobi campus since last week</li> <li>Tezira has been having problems accessing CGSpace from the ILRI Nairobi campus since last week</li>
<li>She is getting an HTTPS error apparently</li> <li>She is getting an HTTPS error apparently</li>
<li>It&rsquo;s working outside, and Ethiopian users seem to be having no issues so I&rsquo;ve asked ICT to have a look</li> <li>It's working outside, and Ethiopian users seem to be having no issues so I've asked ICT to have a look</li>
<li>CGSpace crashed this morning for about seven minutes and Dani restarted Tomcat</li> <li>CGSpace crashed this morning for about seven minutes and Dani restarted Tomcat</li>
<li>Around that time there was an increase in SQL errors:</li>
<li><p>Around that time there was an increase in SQL errors:</p> </ul>
<pre><code>2018-03-19 09:10:54,856 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL QueryTable Error - <pre><code>2018-03-19 09:10:54,856 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL QueryTable Error -
... ...
2018-03-19 09:10:54,862 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL query singleTable Error - 2018-03-19 09:10:54,862 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL query singleTable Error -
</code></pre></li> </code></pre><ul>
<li>But these errors, I don't even know what they mean, because a handful of them happen every day:</li>
<li><p>But these errors, I don&rsquo;t even know what they mean, because a handful of them happen every day:</p> </ul>
<pre><code>$ grep -c 'ERROR org.dspace.storage.rdbms.DatabaseManager' dspace.log.2018-03-1* <pre><code>$ grep -c 'ERROR org.dspace.storage.rdbms.DatabaseManager' dspace.log.2018-03-1*
dspace.log.2018-03-10:13 dspace.log.2018-03-10:13
dspace.log.2018-03-11:15 dspace.log.2018-03-11:15
@ -390,287 +325,220 @@ dspace.log.2018-03-16:13
dspace.log.2018-03-17:13 dspace.log.2018-03-17:13
dspace.log.2018-03-18:15 dspace.log.2018-03-18:15
dspace.log.2018-03-19:90 dspace.log.2018-03-19:90
</code></pre></li> </code></pre><ul>
<li>There wasn't even a lot of traffic at the time (8 to 9 AM):</li>
<li><p>There wasn&rsquo;t even a lot of traffic at the time (8 to 9 AM):</p> </ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;19/Mar/2018:0[89]:&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 <pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;19/Mar/2018:0[89]:&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
92 40.77.167.197 92 40.77.167.197
92 83.103.94.48 92 83.103.94.48
96 40.77.167.175 96 40.77.167.175
116 207.46.13.178 116 207.46.13.178
122 66.249.66.153 122 66.249.66.153
140 95.108.181.88 140 95.108.181.88
196 213.55.99.121 196 213.55.99.121
206 197.210.168.174 206 197.210.168.174
207 104.196.152.243 207 104.196.152.243
294 54.198.169.202 294 54.198.169.202
</code></pre></li> </code></pre><ul>
<li>Well there is a hint in Tomcat's <code>catalina.out</code>:</li>
<li><p>Well there is a hint in Tomcat&rsquo;s <code>catalina.out</code>:</p> </ul>
<pre><code>Mon Mar 19 09:05:28 UTC 2018 | Query:id: 92032 AND type:2 <pre><code>Mon Mar 19 09:05:28 UTC 2018 | Query:id: 92032 AND type:2
Exception in thread &quot;http-bio-127.0.0.1-8081-exec-280&quot; java.lang.OutOfMemoryError: Java heap space Exception in thread &quot;http-bio-127.0.0.1-8081-exec-280&quot; java.lang.OutOfMemoryError: Java heap space
</code></pre></li> </code></pre><ul>
<li>So someone was doing something heavy somehow&hellip; my guess is content and usage stats!</li>
<li><p>So someone was doing something heavy somehow&hellip; my guess is content and usage stats!</p></li> <li>ICT responded that they &ldquo;fixed&rdquo; the CGSpace connectivity issue in Nairobi without telling me the problem</li>
<li>When I asked, Robert Okal said CGNET messed up when updating the DNS for cgspace.cgiar.org last week</li>
<li><p>ICT responded that they &ldquo;fixed&rdquo; the CGSpace connectivity issue in Nairobi without telling me the problem</p></li> <li>I told him that my request last week was for dspacetest.cgiar.org, not cgspace.cgiar.org!</li>
<li>So they updated the wrong fucking DNS records</li>
<li><p>When I asked, Robert Okal said CGNET messed up when updating the DNS for cgspace.cgiar.org last week</p></li> <li>Magdalena from CCAFS wrote to ask about one record that has a bunch of metadata missing in her Listings and Reports export</li>
<li>It appears to be this one: <a href="https://cgspace.cgiar.org/handle/10568/83473?show=full">https://cgspace.cgiar.org/handle/10568/83473?show=full</a></li>
<li><p>I told him that my request last week was for dspacetest.cgiar.org, not cgspace.cgiar.org!</p></li> <li>The title is &ldquo;Untitled&rdquo; and there is some metadata but indeed the citation is missing</li>
<li>I don't know what would cause that</li>
<li><p>So they updated the wrong fucking DNS records</p></li>
<li><p>Magdalena from CCAFS wrote to ask about one record that has a bunch of metadata missing in her Listings and Reports export</p></li>
<li><p>It appears to be this one: <a href="https://cgspace.cgiar.org/handle/10568/83473?show=full">https://cgspace.cgiar.org/handle/10568/83473?show=full</a></p></li>
<li><p>The title is &ldquo;Untitled&rdquo; and there is some metadata but indeed the citation is missing</p></li>
<li><p>I don&rsquo;t know what would cause that</p></li>
</ul> </ul>
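<p>The per-client request count done above with <code>grep</code>, <code>awk</code>, <code>sort</code>, and <code>uniq -c</code> can also be sketched in Python. This is a rough equivalent, assuming nginx's combined log format where the client IP is the first space-separated field; the <code>top_clients</code> helper and the sample log lines are hypothetical:</p>

```python
import re
from collections import Counter


def top_clients(lines, pattern, n=10):
    """Count requests per client IP for log lines matching a timestamp pattern."""
    matcher = re.compile(pattern)
    counts = Counter(
        line.split(" ", 1)[0]  # first field, like awk '{print $1}'
        for line in lines
        if matcher.search(line)
    )
    return counts.most_common(n)


if __name__ == "__main__":
    # Made-up sample lines; real input would be the nginx access logs.
    sample = [
        '54.198.169.202 - - [19/Mar/2018:08:01:02 +0000] "GET / HTTP/1.1" 200 1',
        '54.198.169.202 - - [19/Mar/2018:09:15:40 +0000] "GET / HTTP/1.1" 200 1',
        '66.249.66.153 - - [19/Mar/2018:09:20:00 +0000] "GET / HTTP/1.1" 200 1',
        '66.249.66.153 - - [19/Mar/2018:10:00:00 +0000] "GET / HTTP/1.1" 200 1',
    ]
    for ip, count in top_clients(sample, r"19/Mar/2018:0[89]:"):
        print(count, ip)
```

<p>Unlike the shell pipeline, this lists the busiest clients first rather than last.</p>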
<h2 id="20180320">2018-03-20</h2>
<h2 id="2018-03-20">2018-03-20</h2>
<ul> <ul>
<li><p>DSpace Test has been down for a few hours with SQL and memory errors starting this morning:</p> <li>DSpace Test has been down for a few hours with SQL and memory errors starting this morning:</li>
</ul>
<pre><code>2018-03-20 08:47:10,177 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL QueryTable Error - <pre><code>2018-03-20 08:47:10,177 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL QueryTable Error -
... ...
2018-03-20 08:53:11,624 ERROR org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter @ Serious Error Occurred Processing Request! 2018-03-20 08:53:11,624 ERROR org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter @ Serious Error Occurred Processing Request!
org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.OutOfMemoryError: Java heap space org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.OutOfMemoryError: Java heap space
</code></pre></li> </code></pre><ul>
<li>I have no idea why it crashed</li>
<li><p>I have no idea why it crashed</p></li> <li>I ran all system updates and rebooted it</li>
<li>Abenet told me that one of Lance Robinson's ORCID iDs on CGSpace is incorrect</li>
<li><p>I ran all system updates and rebooted it</p></li> <li>I will remove it from the controlled vocabulary (<a href="https://github.com/ilri/DSpace/pull/367">#367</a>) and update any items using the old one:</li>
</ul>
<li><p>Abenet told me that one of Lance Robinson&rsquo;s ORCID iDs on CGSpace is incorrect</p></li>
<li><p>I will remove it from the controlled vocabulary (<a href="https://github.com/ilri/DSpace/pull/367">#367</a>) and update any items using the old one:</p>
<pre><code>dspace=# update metadatavalue set text_value='Lance W. Robinson: 0000-0002-5224-8644' where resource_type_id=2 and metadata_field_id=240 and text_value like '%0000-0002-6344-195X%'; <pre><code>dspace=# update metadatavalue set text_value='Lance W. Robinson: 0000-0002-5224-8644' where resource_type_id=2 and metadata_field_id=240 and text_value like '%0000-0002-6344-195X%';
UPDATE 1 UPDATE 1
</code></pre></li> </code></pre><ul>
<li>Communicate with DSpace editors on Yammer about being more careful about spaces and character editing when doing manual metadata edits</li>
<li><p>Communicate with DSpace editors on Yammer about being more careful about spaces and character editing when doing manual metadata edits</p></li> <li>Merge the changes to CRP names to the <code>5_x-prod</code> branch and deploy on CGSpace (<a href="https://github.com/ilri/DSpace/pull/363">#363</a>)</li>
<li>Run corrections for CRP names in the database:</li>
<li><p>Merge the changes to CRP names to the <code>5_x-prod</code> branch and deploy on CGSpace (<a href="https://github.com/ilri/DSpace/pull/363">#363</a>)</p></li> </ul>
<li><p>Run corrections for CRP names in the database:</p>
<pre><code>$ ./fix-metadata-values.py -i /tmp/Correct-21-CRPs-2018-03-16.csv -f cg.contributor.crp -t correct -m 230 -db dspace -u dspace -p 'fuuu' <pre><code>$ ./fix-metadata-values.py -i /tmp/Correct-21-CRPs-2018-03-16.csv -f cg.contributor.crp -t correct -m 230 -db dspace -u dspace -p 'fuuu'
</code></pre></li> </code></pre><ul>
<li>Run all system updates on CGSpace (linode18) and reboot the server</li>
<li><p>Run all system updates on CGSpace (linode18) and reboot the server</p></li> <li>I started a full Discovery re-index on CGSpace because of the updated CRPs</li>
<li>I see this error in the DSpace log:</li>
<li><p>I started a full Discovery re-index on CGSpace because of the updated CRPs</p></li> </ul>
<li><p>I see this error in the DSpace log:</p>
<pre><code>2018-03-20 19:03:14,844 ERROR com.atmire.dspace.discovery.AtmireSolrService @ No choices plugin was configured for field &quot;dc_contributor_author&quot;. <pre><code>2018-03-20 19:03:14,844 ERROR com.atmire.dspace.discovery.AtmireSolrService @ No choices plugin was configured for field &quot;dc_contributor_author&quot;.
java.lang.IllegalArgumentException: No choices plugin was configured for field &quot;dc_contributor_author&quot;. java.lang.IllegalArgumentException: No choices plugin was configured for field &quot;dc_contributor_author&quot;.
at org.dspace.content.authority.ChoiceAuthorityManager.getLabel(ChoiceAuthorityManager.java:261) at org.dspace.content.authority.ChoiceAuthorityManager.getLabel(ChoiceAuthorityManager.java:261)
at org.dspace.content.authority.ChoiceAuthorityManager.getLabel(ChoiceAuthorityManager.java:249) at org.dspace.content.authority.ChoiceAuthorityManager.getLabel(ChoiceAuthorityManager.java:249)
at org.dspace.browse.SolrBrowseCreateDAO.additionalIndex(SolrBrowseCreateDAO.java:215) at org.dspace.browse.SolrBrowseCreateDAO.additionalIndex(SolrBrowseCreateDAO.java:215)
at com.atmire.dspace.discovery.AtmireSolrService.buildDocument(AtmireSolrService.java:662) at com.atmire.dspace.discovery.AtmireSolrService.buildDocument(AtmireSolrService.java:662)
at com.atmire.dspace.discovery.AtmireSolrService.indexContent(AtmireSolrService.java:807) at com.atmire.dspace.discovery.AtmireSolrService.indexContent(AtmireSolrService.java:807)
at com.atmire.dspace.discovery.AtmireSolrService.updateIndex(AtmireSolrService.java:876) at com.atmire.dspace.discovery.AtmireSolrService.updateIndex(AtmireSolrService.java:876)
at org.dspace.discovery.SolrServiceImpl.createIndex(SolrServiceImpl.java:370) at org.dspace.discovery.SolrServiceImpl.createIndex(SolrServiceImpl.java:370)
at org.dspace.discovery.IndexClient.main(IndexClient.java:117) at org.dspace.discovery.IndexClient.main(IndexClient.java:117)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at java.lang.reflect.Method.invoke(Method.java:498)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226) at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78) at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
</code></pre></li> </code></pre><ul>
<li>I have to figure that one out&hellip;</li>
<li><p>I have to figure that one out&hellip;</p></li>
</ul> </ul>
<h2 id="20180321">2018-03-21</h2>
<h2 id="2018-03-21">2018-03-21</h2>
<ul> <ul>
<li>Looks like the indexing gets confused that there is still data in the <code>authority</code> column</li> <li>Looks like the indexing gets confused that there is still data in the <code>authority</code> column</li>
<li>Unfortunately this causes those items to simply not be indexed, which users noticed because item counts were cut in half and old items showed up in RSS!</li> <li>Unfortunately this causes those items to simply not be indexed, which users noticed because item counts were cut in half and old items showed up in RSS!</li>
<li>Since we've migrated the ORCID identifiers associated with the authority data to the <code>cg.creator.id</code> field we can nullify the authorities remaining in the database:</li>
<li><p>Since we&rsquo;ve migrated the ORCID identifiers associated with the authority data to the <code>cg.creator.id</code> field we can nullify the authorities remaining in the database:</p> </ul>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-sql" data-lang="sql">dspace<span style="color:#f92672">=</span><span style="color:#f92672">#</span> <span style="color:#66d9ef">UPDATE</span> metadatavalue <span style="color:#66d9ef">SET</span> authority<span style="color:#f92672">=</span><span style="color:#66d9ef">NULL</span> <span style="color:#66d9ef">WHERE</span> resource_type_id<span style="color:#f92672">=</span><span style="color:#ae81ff">2</span> <span style="color:#66d9ef">AND</span> metadata_field_id<span style="color:#f92672">=</span><span style="color:#ae81ff">3</span> <span style="color:#66d9ef">AND</span> authority <span style="color:#66d9ef">IS</span> <span style="color:#66d9ef">NOT</span> <span style="color:#66d9ef">NULL</span>;
<pre><code class="language-sql">dspace=# UPDATE metadatavalue SET authority=NULL WHERE resource_type_id=2 AND metadata_field_id=3 AND authority IS NOT NULL; <span style="color:#66d9ef">UPDATE</span> <span style="color:#ae81ff">195463</span>
UPDATE 195463 </code></pre></div><ul>
</code></pre></li> <li>After this the indexing works as usual and item counts and facets are back to normal</li>
<li>Send Peter a list of all authors to correct:</li>
<li><p>After this the indexing works as usual and item counts and facets are back to normal</p></li> </ul>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-sql" data-lang="sql">dspace<span style="color:#f92672">=</span><span style="color:#f92672">#</span> <span style="color:#960050;background-color:#1e0010">\</span><span style="color:#66d9ef">copy</span> (<span style="color:#66d9ef">select</span> <span style="color:#66d9ef">distinct</span> text_value, <span style="color:#66d9ef">count</span>(<span style="color:#f92672">*</span>) <span style="color:#66d9ef">as</span> <span style="color:#66d9ef">count</span> <span style="color:#66d9ef">from</span> metadatavalue <span style="color:#66d9ef">where</span> metadata_field_id <span style="color:#f92672">=</span> (<span style="color:#66d9ef">select</span> metadata_field_id <span style="color:#66d9ef">from</span> metadatafieldregistry <span style="color:#66d9ef">where</span> element <span style="color:#f92672">=</span> <span style="color:#e6db74">&#39;</span><span style="color:#e6db74">contributor</span><span style="color:#e6db74">&#39;</span> <span style="color:#66d9ef">and</span> qualifier <span style="color:#f92672">=</span> <span style="color:#e6db74">&#39;</span><span style="color:#e6db74">author</span><span style="color:#e6db74">&#39;</span>) <span style="color:#66d9ef">AND</span> resource_type_id <span style="color:#f92672">=</span> <span style="color:#ae81ff">2</span> <span style="color:#66d9ef">group</span> <span style="color:#66d9ef">by</span> text_value <span style="color:#66d9ef">order</span> <span style="color:#66d9ef">by</span> <span style="color:#66d9ef">count</span> <span style="color:#66d9ef">desc</span>) <span style="color:#66d9ef">to</span> <span style="color:#f92672">/</span>tmp<span style="color:#f92672">/</span>authors.csv <span style="color:#66d9ef">with</span> csv header;
<li><p>Send Peter a list of all authors to correct:</p> <span style="color:#66d9ef">COPY</span> <span style="color:#ae81ff">56156</span>
</code></pre></div><ul>
<pre><code class="language-sql">dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv header; <li>Afterwards we'll want to do some batch tagging of ORCID identifiers to these names</li>
COPY 56156 <li>CGSpace crashed again this afternoon, I'm not sure of the cause but there are a lot of SQL errors in the DSpace log:</li>
</code></pre></li> </ul>
<li><p>Afterwards we&rsquo;ll want to do some batch tagging of ORCID identifiers to these names</p></li>
<li><p>CGSpace crashed again this afternoon, I&rsquo;m not sure of the cause but there are a lot of SQL errors in the DSpace log:</p>
<pre><code>2018-03-21 15:11:08,166 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL QueryTable Error - <pre><code>2018-03-21 15:11:08,166 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL QueryTable Error -
java.sql.SQLException: Connection has already been closed. java.sql.SQLException: Connection has already been closed.
</code></pre></li> </code></pre><ul>
<li>I have no idea why so many connections were abandoned this afternoon:</li>
<li><p>I have no idea why so many connections were abandoned this afternoon:</p> </ul>
<pre><code># grep 'Mar 21, 2018' /var/log/tomcat7/catalina.out | grep -c 'org.apache.tomcat.jdbc.pool.ConnectionPool abandon' <pre><code># grep 'Mar 21, 2018' /var/log/tomcat7/catalina.out | grep -c 'org.apache.tomcat.jdbc.pool.ConnectionPool abandon'
268 268
</code></pre></li> </code></pre><ul>
<li>DSpace Test crashed again due to Java heap space, this is from the DSpace log:</li>
<li><p>DSpace Test crashed again due to Java heap space, this is from the DSpace log:</p> </ul>
<pre><code>2018-03-21 15:18:48,149 ERROR org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter @ Serious Error Occurred Processing Request! <pre><code>2018-03-21 15:18:48,149 ERROR org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter @ Serious Error Occurred Processing Request!
org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.OutOfMemoryError: Java heap space org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.OutOfMemoryError: Java heap space
</code></pre></li> </code></pre><ul>
<li>And this is from the Tomcat Catalina log:</li>
<li><p>And this is from the Tomcat Catalina log:</p> </ul>
<pre><code>Mar 21, 2018 11:20:00 AM org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor run <pre><code>Mar 21, 2018 11:20:00 AM org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor run
SEVERE: Unexpected death of background thread ContainerBackgroundProcessor[StandardEngine[Catalina]] SEVERE: Unexpected death of background thread ContainerBackgroundProcessor[StandardEngine[Catalina]]
java.lang.OutOfMemoryError: Java heap space java.lang.OutOfMemoryError: Java heap space
</code></pre></li> </code></pre><ul>
<li>But there are tons of heap space errors on DSpace Test actually:</li>
<li><p>But there are tons of heap space errors on DSpace Test actually:</p> </ul>
<pre><code># grep -c 'java.lang.OutOfMemoryError: Java heap space' /var/log/tomcat7/catalina.out <pre><code># grep -c 'java.lang.OutOfMemoryError: Java heap space' /var/log/tomcat7/catalina.out
319 319
</code></pre></li> </code></pre><ul>
<li>I guess we need to give it more RAM because it now has CGSpace's large Solr core</li>
<li><p>I guess we need to give it more RAM because it now has CGSpace&rsquo;s large Solr core</p></li> <li>I will increase the memory from 3072m to 4096m</li>
<li>Update <a href="https://github.com/ilri/rmg-ansible-public">Ansible playbooks</a> to use <a href="https://jdbc.postgresql.org/">PostgreSQL JDBC driver</a> 42.2.2</li>
<li><p>I will increase the memory from 3072m to 4096m</p></li> <li>Deploy the new JDBC driver on DSpace Test</li>
<li>I'm also curious to see how long the <code>dspace index-discovery -b</code> takes on DSpace Test where the DSpace installation directory is on one of Linode's new block storage volumes</li>
<li><p>Update <a href="https://github.com/ilri/rmg-ansible-public">Ansible playbooks</a> to use <a href="https://jdbc.postgresql.org/">PostgreSQL JDBC driver</a> 42.2.2</p></li> </ul>
<li><p>Deploy the new JDBC driver on DSpace Test</p></li>
<li><p>I&rsquo;m also curious to see how long the <code>dspace index-discovery -b</code> takes on DSpace Test where the DSpace installation directory is on one of Linode&rsquo;s new block storage volumes</p>
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b <pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
real 208m19.155s real 208m19.155s
user 8m39.138s user 8m39.138s
sys 2m45.135s sys 2m45.135s
</code></pre></li> </code></pre><ul>
<li>So that's about three times as long as it took on CGSpace this morning</li>
<li><p>So that&rsquo;s about three times as long as it took on CGSpace this morning</p></li> <li>I should also check the raw read speed with <code>hdparm -tT /dev/sdc</code></li>
<li>Looking at Peter's author corrections there are some mistakes due to Windows 1252 encoding</li>
<li><p>I should also check the raw read speed with <code>hdparm -tT /dev/sdc</code></p></li> <li>I need to find a way to filter these easily with OpenRefine</li>
<li>For example, Peter has inadvertently introduced Unicode character 0xfffd into several fields</li>
<li><p>Looking at Peter&rsquo;s author corrections there are some mistakes due to Windows 1252 encoding</p></li> <li>I can search for Unicode values by their hex code in OpenRefine using the following GREL expression:</li>
<li><p>I need to find a way to filter these easily with OpenRefine</p></li>
<li><p>For example, Peter has inadvertently introduced Unicode character 0xfffd into several fields</p></li>
<li><p>I can search for Unicode values by their hex code in OpenRefine using the following GREL expression:</p>
<pre><code>isNotNull(value.match(/.*\ufffd.*/))
</code></pre></li>
<li><p>I need to be able to add many common characters though so that it is useful to copy and paste into a new project to find issues</p></li>
</ul> </ul>
<pre><code>isNotNull(value.match(/.*\ufffd.*/))
<h2 id="2018-03-22">2018-03-22</h2> </code></pre><ul>
<li>I need to be able to add many common characters though so that it is useful to copy and paste into a new project to find issues</li>
</ul>
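One way to grow that single-character check into a reusable list is sketched below in Python rather than GREL. It is a rough illustration only: the names `SUSPICIOUS_CODE_POINTS`, `build_pattern` and `describe_matches` are made up for this example, and the starting list holds just the code points that appear in these notes. It only handles code points in the Basic Multilingual Plane.

```python
import re
import unicodedata

# Assumed starter list (hypothetical name); append code points as new
# problem characters turn up in the corrections.
SUSPICIOUS_CODE_POINTS = [0xFFFD, 0x00A0, 0x200A]


def build_pattern(code_points):
    """Build a character-class regex from a list of BMP code points."""
    char_class = "".join(f"\\u{cp:04x}" for cp in code_points)
    return re.compile(f"[{char_class}]")


def describe_matches(value, pattern):
    """Return (hex code, Unicode name) for each suspicious character found."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in pattern.findall(value)
    ]


PATTERN = build_pattern(SUSPICIOUS_CODE_POINTS)
```

Calling `describe_matches(name, PATTERN)` on each author name would show which entries in the list actually occur before the same code points are pasted into an OpenRefine facet.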
<h2 id="20180322">2018-03-22</h2>
<ul> <ul>
<li>Add ORCID identifier for Silvia Alonso</li> <li>Add ORCID identifier for Silvia Alonso</li>
<li>Update my Mirage 2 setup notes for Ubuntu 18.04: <a href="https://gist.github.com/alanorth/9bfd29feb7d2e836a9d417633319b3f5">https://gist.github.com/alanorth/9bfd29feb7d2e836a9d417633319b3f5</a></li> <li>Update my Mirage 2 setup notes for Ubuntu 18.04: <a href="https://gist.github.com/alanorth/9bfd29feb7d2e836a9d417633319b3f5">https://gist.github.com/alanorth/9bfd29feb7d2e836a9d417633319b3f5</a></li>
</ul> </ul>
<h2 id="20180324">2018-03-24</h2>
<h2 id="2018-03-24">2018-03-24</h2>
<ul> <ul>
<li>More work on the Ubuntu 18.04 readiness stuff for the <a href="https://github.com/ilri/rmg-ansible-public">Ansible playbooks</a></li> <li>More work on the Ubuntu 18.04 readiness stuff for the <a href="https://github.com/ilri/rmg-ansible-public">Ansible playbooks</a></li>
<li>The playbook now uses the system&rsquo;s Ruby and Node.js so I don&rsquo;t have to manually install RVM and NVM after</li> <li>The playbook now uses the system's Ruby and Node.js so I don't have to manually install RVM and NVM after</li>
</ul> </ul>
<h2 id="20180325">2018-03-25</h2>
<h2 id="2018-03-25">2018-03-25</h2>
<ul> <ul>
<li>Looking at Peter&rsquo;s author corrections and trying to work out a way to find errors in OpenRefine easily</li> <li>Looking at Peter's author corrections and trying to work out a way to find errors in OpenRefine easily</li>
<li>I can find all names that have acceptable characters using a GREL expression like:</li>
<li><p>I can find all names that have acceptable characters using a GREL expression like:</p> </ul>
<pre><code>isNotNull(value.match(/.*[a-zA-ZáÁéèïíñØøöóúü].*/)) <pre><code>isNotNull(value.match(/.*[a-zA-ZáÁéèïíñØøöóúü].*/))
</code></pre></li> </code></pre><ul>
<li>But it's probably better to just say which characters I know for sure are not valid (like parentheses, pipe, or weird Unicode characters):</li>
<li><p>But it&rsquo;s probably better to just say which characters I know for sure are not valid (like parentheses, pipe, or weird Unicode characters):</p> </ul>
<pre><code>or( <pre><code>or(
isNotNull(value.match(/.*[(|)].*/)), isNotNull(value.match(/.*[(|)].*/)),
isNotNull(value.match(/.*\uFFFD.*/)), isNotNull(value.match(/.*\uFFFD.*/)),
isNotNull(value.match(/.*\u00A0.*/)), isNotNull(value.match(/.*\u00A0.*/)),
isNotNull(value.match(/.*\u200A.*/)) isNotNull(value.match(/.*\u200A.*/))
) )
</code></pre></li> </code></pre><ul>
<li>And here's one combined GREL expression to check for items marked as to delete or check so I can flag them and export them to a separate CSV (though perhaps it's time to add delete support to my <code>fix-metadata-values.py</code> script):</li>
<li><p>And here&rsquo;s one combined GREL expression to check for items marked as to delete or check so I can flag them and export them to a separate CSV (though perhaps it&rsquo;s time to add delete support to my <code>fix-metadata-values.py</code> script):</p> </ul>
<pre><code>or( <pre><code>or(
isNotNull(value.match(/.*delete.*/i)), isNotNull(value.match(/.*delete.*/i)),
isNotNull(value.match(/.*remove.*/i)), isNotNull(value.match(/.*remove.*/i)),
isNotNull(value.match(/.*check.*/i)) isNotNull(value.match(/.*check.*/i))
) )
</code></pre></li> </code></pre><ul>
<li>
<li><p>So I guess the routine in OpenRefine is:</p> <p>So I guess the routine in OpenRefine is:</p>
<ul> <ul>
<li>Transform: trim leading/trailing whitespace</li> <li>Transform: trim leading/trailing whitespace</li>
<li>Transform: collapse consecutive whitespace</li> <li>Transform: collapse consecutive whitespace</li>
<li>Custom text facet for items to delete/check</li> <li>Custom text facet for items to delete/check</li>
<li>Custom text facet for illegal characters</li> <li>Custom text facet for illegal characters</li>
</ul></li> </ul>
</li>
<li><p>Test the corrections and deletions locally, then run them on CGSpace:</p> <li>
<p>Test the corrections and deletions locally, then run them on CGSpace:</p>
</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/Correct-2928-Authors-2018-03-21.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3 <pre><code>$ ./fix-metadata-values.py -i /tmp/Correct-2928-Authors-2018-03-21.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3
$ ./delete-metadata-values.py -i /tmp/Delete-8-Authors-2018-03-21.csv -f dc.contributor.author -m 3 -db dspacetest -u dspace -p 'fuuu' $ ./delete-metadata-values.py -i /tmp/Delete-8-Authors-2018-03-21.csv -f dc.contributor.author -m 3 -db dspacetest -u dspace -p 'fuuu'
</code></pre></li> </code></pre><ul>
<li>Afterwards I started a full Discovery reindexing on both CGSpace and DSpace Test</li>
<li><p>Afterwards I started a full Discovery reindexing on both CGSpace and DSpace Test</p></li> <li>CGSpace took 76m28.292s</li>
<li>DSpace Test took 194m56.048s</li>
<li><p>CGSpace took 76m28.292s</p></li>
<li><p>DSpace Test took 194m56.048s</p></li>
</ul> </ul>
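The OpenRefine routine from this entry (trim, collapse whitespace, facet for delete/check markers, facet for illegal characters) could also be approximated outside OpenRefine. The Python below is only a sketch under assumptions: `review_value` and the two pattern names are hypothetical, and it mirrors the GREL expressions above rather than anything in `fix-metadata-values.py`.

```python
import re

# Same characters as the GREL "illegal characters" facet above.
ILLEGAL_CHARS = re.compile(r"[(|)\ufffd\u00a0\u200a]")
# Same markers as the GREL "delete/check" facet above (case-insensitive).
REVIEW_MARKERS = re.compile(r"delete|remove|check", re.IGNORECASE)
# ASCII whitespace only, on purpose: Python's \s would also swallow U+00A0
# and U+200A, hiding them from the illegal-character check.
ASCII_WHITESPACE = " \t\r\n"


def review_value(value):
    """Apply the four steps of the routine to one metadata value."""
    cleaned = re.sub(r"[ \t\r\n]+", " ", value).strip(ASCII_WHITESPACE)
    return {
        "cleaned": cleaned,
        "needs_review": bool(REVIEW_MARKERS.search(cleaned)),
        "has_illegal_chars": bool(ILLEGAL_CHARS.search(cleaned)),
    }
```

As with the GREL version, the marker match is a plain substring search, so a legitimate name containing "check" would be flagged as well and still needs a manual look.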
<h2 id="20180326">2018-03-26</h2>
<h2 id="2018-03-26">2018-03-26</h2>
<ul> <ul>
<li>Atmire got back to me about the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=589">Listings and Reports issue</a> and said it&rsquo;s caused by items that have missing <code>dc.identifier.citation</code> fields</li> <li>Atmire got back to me about the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=589">Listings and Reports issue</a> and said it's caused by items that have missing <code>dc.identifier.citation</code> fields</li>
<li>They will send a fix</li> <li>They will send a fix</li>
</ul> </ul>
<h2 id="20180327">2018-03-27</h2>
<h2 id="2018-03-27">2018-03-27</h2>
<ul> <ul>
<li>Atmire got back with an updated quote about the DSpace 5.8 compatibility so I&rsquo;ve forwarded it to Peter</li> <li>Atmire got back with an updated quote about the DSpace 5.8 compatibility so I've forwarded it to Peter</li>
</ul> </ul>
<h2 id="20180328">2018-03-28</h2>
<h2 id="2018-03-28">2018-03-28</h2>
<ul> <ul>
<li>DSpace Test crashed due to heap space so I&rsquo;ve increased it from 4096m to 5120m</li> <li>DSpace Test crashed due to heap space so I've increased it from 4096m to 5120m</li>
<li>The error in Tomcat's <code>catalina.out</code> was:</li>
<li><p>The error in Tomcat&rsquo;s <code>catalina.out</code> was:</p> </ul>
<pre><code>Exception in thread &quot;RMI TCP Connection(idle)&quot; java.lang.OutOfMemoryError: Java heap space <pre><code>Exception in thread &quot;RMI TCP Connection(idle)&quot; java.lang.OutOfMemoryError: Java heap space
</code></pre></li> </code></pre><ul>
<li>Add ISI Journal (cg.isijournal) as an option in Atmire's Listing and Reports layout (<a href="https://github.com/ilri/DSpace/pull/370">#370</a>) for Abenet</li>
<li><p>Add ISI Journal (cg.isijournal) as an option in Atmire&rsquo;s Listing and Reports layout (<a href="https://github.com/ilri/DSpace/pull/370">#370</a>) for Abenet</p></li> <li>I noticed a few hundred CRPs using the old capitalized formatting so I corrected them:</li>
</ul>
<li><p>I noticed a few hundred CRPs using the old capitalized formatting so I corrected them:</p>
<pre><code>$ ./fix-metadata-values.py -i /tmp/Correct-21-CRPs-2018-03-16.csv -f cg.contributor.crp -t correct -m 230 -db cgspace -u cgspace -p 'fuuu' <pre><code>$ ./fix-metadata-values.py -i /tmp/Correct-21-CRPs-2018-03-16.csv -f cg.contributor.crp -t correct -m 230 -db cgspace -u cgspace -p 'fuuu'
Fixed 29 occurences of: CLIMATE CHANGE, AGRICULTURE AND FOOD SECURITY Fixed 29 occurences of: CLIMATE CHANGE, AGRICULTURE AND FOOD SECURITY
Fixed 7 occurences of: WATER, LAND AND ECOSYSTEMS Fixed 7 occurences of: WATER, LAND AND ECOSYSTEMS
@ -682,17 +550,12 @@ Fixed 11 occurences of: POLICIES, INSTITUTIONS, AND MARKETS
Fixed 28 occurences of: GRAIN LEGUMES Fixed 28 occurences of: GRAIN LEGUMES
Fixed 3 occurences of: FORESTS, TREES AND AGROFORESTRY Fixed 3 occurences of: FORESTS, TREES AND AGROFORESTRY
Fixed 5 occurences of: GENEBANKS Fixed 5 occurences of: GENEBANKS
</code></pre></li> </code></pre><ul>
<li>That's weird because we just updated them last week&hellip;</li>
<li><p>That&rsquo;s weird because we just updated them last week&hellip;</p></li> <li>Create a pull request to enable searching by ORCID identifier (<code>cg.creator.id</code>) in Discovery and Listings and Reports (<a href="https://github.com/ilri/DSpace/pull/371">#371</a>)</li>
<li>I will test it on DSpace Test first!</li>
<li><p>Create a pull request to enable searching by ORCID identifier (<code>cg.creator.id</code>) in Discovery and Listings and Reports (<a href="https://github.com/ilri/DSpace/pull/371">#371</a>)</p></li> <li>Fix one missing XMLUI string for &ldquo;Access Status&rdquo; (cg.identifier.status)</li>
<li>Run all system updates on DSpace Test and reboot the machine</li>
<li><p>I will test it on DSpace Test first!</p></li>
<li><p>Fix one missing XMLUI string for &ldquo;Access Status&rdquo; (cg.identifier.status)</p></li>
<li><p>Run all system updates on DSpace Test and reboot the machine</p></li>
</ul> </ul>

View File

@ -8,8 +8,7 @@
<meta property="og:title" content="April, 2018" /> <meta property="og:title" content="April, 2018" />
<meta property="og:description" content="2018-04-01 <meta property="og:description" content="2018-04-01
I tried to test something on DSpace Test but noticed that it&#39;s down since god knows when
I tried to test something on DSpace Test but noticed that it&rsquo;s down since god knows when
Catalina logs at least show some memory errors yesterday: Catalina logs at least show some memory errors yesterday:
" /> " />
<meta property="og:type" content="article" /> <meta property="og:type" content="article" />
@ -21,11 +20,10 @@ Catalina logs at least show some memory errors yesterday:
<meta name="twitter:title" content="April, 2018"/> <meta name="twitter:title" content="April, 2018"/>
<meta name="twitter:description" content="2018-04-01 <meta name="twitter:description" content="2018-04-01
I tried to test something on DSpace Test but noticed that it&#39;s down since god knows when
I tried to test something on DSpace Test but noticed that it&rsquo;s down since god knows when
Catalina logs at least show some memory errors yesterday: Catalina logs at least show some memory errors yesterday:
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -106,171 +104,131 @@ Catalina logs at least show some memory errors yesterday:
</p> </p>
</header> </header>
<h2 id="2018-04-01">2018-04-01</h2> <h2 id="20180401">2018-04-01</h2>
<ul> <ul>
<li>I tried to test something on DSpace Test but noticed that it&rsquo;s down since god knows when</li> <li>I tried to test something on DSpace Test but noticed that it's down since god knows when</li>
<li>Catalina logs at least show some memory errors yesterday:</li> <li>Catalina logs at least show some memory errors yesterday:</li>
</ul> </ul>
<pre><code>Mar 31, 2018 10:26:42 PM org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor run <pre><code>Mar 31, 2018 10:26:42 PM org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor run
SEVERE: Unexpected death of background thread ContainerBackgroundProcessor[StandardEngine[Catalina]] SEVERE: Unexpected death of background thread ContainerBackgroundProcessor[StandardEngine[Catalina]]
java.lang.OutOfMemoryError: Java heap space java.lang.OutOfMemoryError: Java heap space
Exception in thread &quot;ContainerBackgroundProcessor[StandardEngine[Catalina]]&quot; java.lang.OutOfMemoryError: Java heap space Exception in thread &quot;ContainerBackgroundProcessor[StandardEngine[Catalina]]&quot; java.lang.OutOfMemoryError: Java heap space
</code></pre> </code></pre><ul>
<ul>
<li>So this is getting super annoying</li> <li>So this is getting super annoying</li>
<li>I ran all system updates on DSpace Test and rebooted it</li> <li>I ran all system updates on DSpace Test and rebooted it</li>
<li>For some reason Listings and Reports is not giving any results for any queries now&hellip;</li> <li>For some reason Listings and Reports is not giving any results for any queries now&hellip;</li>
<li>I posted a message on Yammer to ask if people are using the Duplicate Check step from the Metadata Quality Module</li> <li>I posted a message on Yammer to ask if people are using the Duplicate Check step from the Metadata Quality Module</li>
<li>Help Lili Szilagyi with a question about statistics on some CCAFS items</li> <li>Help Lili Szilagyi with a question about statistics on some CCAFS items</li>
</ul> </ul>
<h2 id="20180404">2018-04-04</h2>
<h2 id="2018-04-04">2018-04-04</h2>
<ul> <ul>
<li>Peter noticed that there were still some old CRP names on CGSpace, because I hadn&rsquo;t forced the Discovery index to be updated after I fixed the others last week</li> <li>Peter noticed that there were still some old CRP names on CGSpace, because I hadn't forced the Discovery index to be updated after I fixed the others last week</li>
<li>For completeness I re-ran the CRP corrections on CGSpace:</li>
<li><p>For completeness I re-ran the CRP corrections on CGSpace:</p> </ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/Correct-21-CRPs-2018-03-16.csv -f cg.contributor.crp -t correct -m 230 -db dspace -u dspace -p 'fuuu' <pre><code>$ ./fix-metadata-values.py -i /tmp/Correct-21-CRPs-2018-03-16.csv -f cg.contributor.crp -t correct -m 230 -db dspace -u dspace -p 'fuuu'
Fixed 1 occurences of: AGRICULTURE FOR NUTRITION AND HEALTH Fixed 1 occurences of: AGRICULTURE FOR NUTRITION AND HEALTH
</code></pre></li> </code></pre><ul>
<li>Then started a full Discovery index:</li>
<li><p>Then started a full Discovery index:</p> </ul>
<pre><code>$ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx1024m' <pre><code>$ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx1024m'
$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b $ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
real 76m13.841s real 76m13.841s
user 8m22.960s user 8m22.960s
sys 2m2.498s sys 2m2.498s
</code></pre></li> </code></pre><ul>
<li>Elizabeth from CIAT emailed to ask if I could help her by adding ORCID identifiers to all of Joseph Tohme's items</li>
<li><p>Elizabeth from CIAT emailed to ask if I could help her by adding ORCID identifiers to all of Joseph Tohme&rsquo;s items</p></li> <li>I used my <a href="https://gist.githubusercontent.com/alanorth/a49d85cd9c5dea89cddbe809813a7050/raw/f67b6e45a9a940732882ae4bb26897a9b245ef31/add-orcid-identifiers-csv.py">add-orcid-identifiers-csv.py</a> script:</li>
</ul>
<li><p>I used my <a href="https://gist.githubusercontent.com/alanorth/a49d85cd9c5dea89cddbe809813a7050/raw/f67b6e45a9a940732882ae4bb26897a9b245ef31/add-orcid-identifiers-csv.py">add-orcid-identifiers-csv.py</a> script:</p>
<pre><code>$ ./add-orcid-identifiers-csv.py -i /tmp/jtohme-2018-04-04.csv -db dspace -u dspace -p 'fuuu' <pre><code>$ ./add-orcid-identifiers-csv.py -i /tmp/jtohme-2018-04-04.csv -db dspace -u dspace -p 'fuuu'
</code></pre></li> </code></pre><ul>
<li>The CSV format of <code>jtohme-2018-04-04.csv</code> was:</li>
<li><p>The CSV format of <code>jtohme-2018-04-04.csv</code> was:</p> </ul>
<pre><code class="language-csv" data-lang="csv">dc.contributor.author,cg.creator.id
<pre><code class="language-csv">dc.contributor.author,cg.creator.id
&quot;Tohme, Joseph M.&quot;,Joe Tohme: 0000-0003-2765-7101 &quot;Tohme, Joseph M.&quot;,Joe Tohme: 0000-0003-2765-7101
</code></pre></li> </code></pre><ul>
<li>There was a quoting error in my CRP CSV and the replacements for <code>Forests, Trees and Agroforestry</code> got messed up</li>
<li><p>There was a quoting error in my CRP CSV and the replacements for <code>Forests, Trees and Agroforestry</code> got messed up</p></li> <li>So I fixed them and had to re-index again!</li>
<li>I started preparing the git branch for the DSpace 5.5→5.8 upgrade:</li>
<li><p>So I fixed them and had to re-index again!</p></li> </ul>
<li><p>I started preparing the git branch for the DSpace 5.5→5.8 upgrade:</p>
<pre><code>$ git checkout -b 5_x-dspace-5.8 5_x-prod <pre><code>$ git checkout -b 5_x-dspace-5.8 5_x-prod
$ git reset --hard ilri/5_x-prod $ git reset --hard ilri/5_x-prod
$ git rebase -i dspace-5.8 $ git rebase -i dspace-5.8
</code></pre></li> </code></pre><ul>
<li>I was prepared to skip some commits that I had cherry-picked from the upstream <code>dspace-5_x</code> branch when we did the DSpace 5.5 upgrade (see notes on 2016-10-19 and 2017-12-17):
<li><p>I was prepared to skip some commits that I had cherry-picked from the upstream <code>dspace-5_x</code> branch when we did the DSpace 5.5 upgrade (see notes on 2016-10-19 and 2017-12-17):</p>
<ul> <ul>
<li>[DS-3246] Improve cleanup in recyclable components (upstream commit on dspace-5_x: 9f0f5940e7921765c6a22e85337331656b18a403)</li> <li>[DS-3246] Improve cleanup in recyclable components (upstream commit on dspace-5_x: 9f0f5940e7921765c6a22e85337331656b18a403)</li>
<li>[DS-3250] applying patch provided by Atmire (upstream commit on dspace-5_x: c6fda557f731dbc200d7d58b8b61563f86fe6d06)</li> <li>[DS-3250] applying patch provided by Atmire (upstream commit on dspace-5_x: c6fda557f731dbc200d7d58b8b61563f86fe6d06)</li>
<li>bump up to latest minor pdfbox version (upstream commit on dspace-5_x: b5330b78153b2052ed3dc2fd65917ccdbfcc0439)</li> <li>bump up to latest minor pdfbox version (upstream commit on dspace-5_x: b5330b78153b2052ed3dc2fd65917ccdbfcc0439)</li>
<li>DS-3583 Usage of correct Collection Array (#1731) (upstream commit on dspace-5_x: c8f62e6f496fa86846bfa6bcf2d16811087d9761)</li> <li>DS-3583 Usage of correct Collection Array (#1731) (upstream commit on dspace-5_x: c8f62e6f496fa86846bfa6bcf2d16811087d9761)</li>
</ul></li>
<li><p>&hellip; but somehow git knew, and didn&rsquo;t include them in my interactive rebase!</p></li>
<li><p>I need to send this branch to Atmire and also arrange payment (see <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">ticket #560</a> in their tracker)</p></li>
<li><p>Fix Sisay&rsquo;s SSH access to the new DSpace Test server (linode19)</p></li>
</ul> </ul>
</li>
<h2 id="2018-04-05">2018-04-05</h2> <li>&hellip; but somehow git knew, and didn't include them in my interactive rebase!</li>
<li>I need to send this branch to Atmire and also arrange payment (see <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">ticket #560</a> in their tracker)</li>
<li>Fix Sisay's SSH access to the new DSpace Test server (linode19)</li>
</ul>
<h2 id="20180405">2018-04-05</h2>
<ul> <ul>
<li>Fix Sisay&rsquo;s sudo access on the new DSpace Test server (linode19)</li> <li>Fix Sisay's sudo access on the new DSpace Test server (linode19)</li>
<li>The reindexing process on DSpace Test took <em>forever</em> yesterday:</li>
<li><p>The reindexing process on DSpace Test took <em>forever</em> yesterday:</p> </ul>
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b <pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
real 599m32.961s real 599m32.961s
user 9m3.947s user 9m3.947s
sys 2m52.585s sys 2m52.585s
</code></pre></li> </code></pre><ul>
<li>So we really should not use this Linode block storage for Solr</li>
<li><p>So we really should not use this Linode block storage for Solr</p></li> <li>Assetstore might be fine but would complicate things with configuration and deployment (ughhh)</li>
<li>Better to use Linode block storage only for backup</li>
<li><p>Assetstore might be fine but would complicate things with configuration and deployment (ughhh)</p></li> <li>Help Peter with the GDPR compliance / reporting form for CGSpace</li>
<li>DSpace Test crashed due to memory issues again:</li>
<li><p>Better to use Linode block storage only for backup</p></li> </ul>
<li><p>Help Peter with the GDPR compliance / reporting form for CGSpace</p></li>
<li><p>DSpace Test crashed due to memory issues again:</p>
<pre><code># grep -c 'java.lang.OutOfMemoryError: Java heap space' /var/log/tomcat7/catalina.out <pre><code># grep -c 'java.lang.OutOfMemoryError: Java heap space' /var/log/tomcat7/catalina.out
16 16
</code></pre></li> </code></pre><ul>
<li>I ran all system updates on DSpace Test and rebooted it</li>
<li><p>I ran all system updates on DSpace Test and rebooted it</p></li> <li>Proof some records on DSpace Test for Udana from IWMI</li>
<li>He has done better with the small syntax and consistency issues but then there are larger concerns with not linking to DOIs, copying titles incorrectly, etc</li>
<li><p>Proof some records on DSpace Test for Udana from IWMI</p></li>
<li><p>He has done better with the small syntax and consistency issues but then there are larger concerns with not linking to DOIs, copying titles incorrectly, etc</p></li>
</ul> </ul>
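To put the two full Discovery index runs in this file side by side, here is a minimal sketch that converts the <code>real</code> values printed by <code>time</code> into seconds and compares them. The helper name <code>parse_real_time</code> is made up for illustration, and the comparison assumes both runs indexed a comparable number of items, which the notes do not confirm.

```python
import re


def parse_real_time(value: str) -> float:
    """Convert a bash `time` duration such as '76m13.841s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# "real" values copied from the two index-discovery runs in these notes
cgspace = parse_real_time("76m13.841s")        # CGSpace, instance storage
dspace_test = parse_real_time("599m32.961s")   # DSpace Test (linode19), block storage

ratio = dspace_test / cgspace
print(f"DSpace Test took {ratio:.1f}x as long as CGSpace")
```

The <code>user</code> and <code>sys</code> times of the two runs are nearly identical, so most of the difference is likely time spent waiting on I/O, which fits the conclusion about block storage.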
<h2 id="20180410">2018-04-10</h2>
<h2 id="2018-04-10">2018-04-10</h2>
<ul> <ul>
<li>I got a notice that CGSpace CPU usage was very high this morning</li> <li>I got a notice that CGSpace CPU usage was very high this morning</li>
<li>Looking at the nginx logs, here are the top users today so far:</li>
<li><p>Looking at the nginx logs, here are the top users today so far:</p> </ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;10/Apr/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 <pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;10/Apr/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
282 207.46.13.112 282 207.46.13.112
286 54.175.208.220 286 54.175.208.220
287 207.46.13.113 287 207.46.13.113
298 66.249.66.153 298 66.249.66.153
322 207.46.13.114 322 207.46.13.114
780 104.196.152.243 780 104.196.152.243
3994 178.154.200.38 3994 178.154.200.38
4295 70.32.83.92 4295 70.32.83.92
4388 95.108.181.88 4388 95.108.181.88
7653 45.5.186.2 7653 45.5.186.2
</code></pre></li> </code></pre><ul>
<li>45.5.186.2 is of course CIAT</li>
<li><p>45.5.186.2 is of course CIAT</p></li> <li>95.108.181.88 appears to be Yandex:</li>
</ul>
<li><p>95.108.181.88 appears to be Yandex:</p>
<pre><code>95.108.181.88 - - [09/Apr/2018:06:34:16 +0000] &quot;GET /bitstream/handle/10568/21794/ILRI_logo_usage.jpg.jpg HTTP/1.1&quot; 200 2638 &quot;-&quot; &quot;Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)&quot; <pre><code>95.108.181.88 - - [09/Apr/2018:06:34:16 +0000] &quot;GET /bitstream/handle/10568/21794/ILRI_logo_usage.jpg.jpg HTTP/1.1&quot; 200 2638 &quot;-&quot; &quot;Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)&quot;
</code></pre></li> </code></pre><ul>
<li>And for some reason Yandex created a lot of Tomcat sessions today:</li>
<li><p>And for some reason Yandex created a lot of Tomcat sessions today:</p> </ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=95.108.181.88' dspace.log.2018-04-10 <pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=95.108.181.88' dspace.log.2018-04-10
4363 4363
</code></pre></li> </code></pre><ul>
<li>70.32.83.92 appears to be some harvester we've seen before, but on a new IP</li>
<li><p>70.32.83.92 appears to be some harvester we&rsquo;ve seen before, but on a new IP</p></li> <li>They are not creating new Tomcat sessions so there is no problem there</li>
<li>178.154.200.38 also appears to be Yandex, and is also creating many Tomcat sessions:</li>
<li><p>They are not creating new Tomcat sessions so there is no problem there</p></li> </ul>
<li><p>178.154.200.38 also appears to be Yandex, and is also creating many Tomcat sessions:</p>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=178.154.200.38' dspace.log.2018-04-10 <pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=178.154.200.38' dspace.log.2018-04-10
3982 3982
</code></pre></li> </code></pre><ul>
<li>I'm not sure why Yandex creates so many Tomcat sessions, as its user agent should match the Crawler Session Manager valve</li>
<li><p>I&rsquo;m not sure why Yandex creates so many Tomcat sessions, as its user agent should match the Crawler Session Manager valve</p></li> <li>Let's try a manual request with and without their user agent:</li>
</ul>
<li><p>Let&rsquo;s try a manual request with and without their user agent:</p>
<pre><code>$ http --print Hh https://cgspace.cgiar.org/bitstream/handle/10568/21794/ILRI_logo_usage.jpg.jpg 'User-Agent:Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)' <pre><code>$ http --print Hh https://cgspace.cgiar.org/bitstream/handle/10568/21794/ILRI_logo_usage.jpg.jpg 'User-Agent:Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)'
GET /bitstream/handle/10568/21794/ILRI_logo_usage.jpg.jpg HTTP/1.1 GET /bitstream/handle/10568/21794/ILRI_logo_usage.jpg.jpg HTTP/1.1
Accept: */* Accept: */*
@ -319,385 +277,291 @@ X-Cocoon-Version: 2.2.0
X-Content-Type-Options: nosniff X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block X-XSS-Protection: 1; mode=block
</code></pre></li> </code></pre><ul>
<li>So it definitely looks like Yandex requests are getting assigned a session from the Crawler Session Manager valve</li>
<li><p>So it definitely looks like Yandex requests are getting assigned a session from the Crawler Session Manager valve</p></li> <li>And if I look at the DSpace log I see its IP sharing a session with other crawlers like Google (66.249.66.153)</li>
<li>Indeed the number of Tomcat sessions appears to be normal:</li>
<li><p>And if I look at the DSpace log I see its IP sharing a session with other crawlers like Google (66.249.66.153)</p></li>
<li><p>Indeed the number of Tomcat sessions appears to be normal:</p></li>
</ul> </ul>
<p><img src="/cgspace-notes/2018/04/jmx_dspace_sessions-week.png" alt="Tomcat sessions week"></p>
<p><img src="/cgspace-notes/2018/04/jmx_dspace_sessions-week.png" alt="Tomcat sessions week" /></p>
<ul> <ul>
<li><p>In other news, it looks like the number of total requests processed by nginx in March went down from the previous months:</p> <li>In other news, it looks like the number of total requests processed by nginx in March went down from the previous months:</li>
</ul>
<pre><code># time zcat --force /var/log/nginx/* | grep -cE &quot;[0-9]{1,2}/Mar/2018&quot; <pre><code># time zcat --force /var/log/nginx/* | grep -cE &quot;[0-9]{1,2}/Mar/2018&quot;
2266594 2266594
real 0m13.658s real 0m13.658s
user 0m16.533s user 0m16.533s
sys 0m1.087s sys 0m1.087s
</code></pre></li> </code></pre><ul>
<li>In other other news, the database cleanup script has an issue again:</li>
<li><p>In other other news, the database cleanup script has an issue again:</p> </ul>
<pre><code>$ dspace cleanup -v <pre><code>$ dspace cleanup -v
... ...
Error: ERROR: update or delete on table &quot;bitstream&quot; violates foreign key constraint &quot;bundle_primary_bitstream_id_fkey&quot; on table &quot;bundle&quot; Error: ERROR: update or delete on table &quot;bitstream&quot; violates foreign key constraint &quot;bundle_primary_bitstream_id_fkey&quot; on table &quot;bundle&quot;
Detail: Key (bitstream_id)=(151626) is still referenced from table &quot;bundle&quot;. Detail: Key (bitstream_id)=(151626) is still referenced from table &quot;bundle&quot;.
</code></pre></li> </code></pre><ul>
<li>The solution is, as always:</li>
<li><p>The solution is, as always:</p> </ul>
<pre><code>$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (151626);' <pre><code>$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (151626);'
UPDATE 1 UPDATE 1
</code></pre></li> </code></pre><ul>
<li>Looking at abandoned connections in Tomcat:</li>
<li><p>Looking at abandoned connections in Tomcat:</p> </ul>
<pre><code># zcat /var/log/tomcat7/catalina.out.[1-9].gz | grep -c 'org.apache.tomcat.jdbc.pool.ConnectionPool abandon' <pre><code># zcat /var/log/tomcat7/catalina.out.[1-9].gz | grep -c 'org.apache.tomcat.jdbc.pool.ConnectionPool abandon'
2115 2115
</code></pre></li> </code></pre><ul>
<li>Apparently from these stacktraces we should be able to see which code is not closing connections properly</li>
<li><p>Apparently from these stacktraces we should be able to see which code is not closing connections properly</p></li> <li>Here's a pretty good overview of days where we had database issues recently:</li>
<li><p>Here&rsquo;s a pretty good overview of days where we had database issues recently:</p>
<pre><code># zcat /var/log/tomcat7/catalina.out.[1-9].gz | grep 'org.apache.tomcat.jdbc.pool.ConnectionPool abandon' | awk '{print $1,$2, $3}' | sort | uniq -c | sort -n
1 Feb 18, 2018
1 Feb 19, 2018
1 Feb 20, 2018
1 Feb 24, 2018
2 Feb 13, 2018
3 Feb 17, 2018
5 Feb 16, 2018
5 Feb 23, 2018
5 Feb 27, 2018
6 Feb 25, 2018
40 Feb 14, 2018
63 Feb 28, 2018
154 Mar 19, 2018
202 Feb 21, 2018
264 Feb 26, 2018
268 Mar 21, 2018
524 Feb 22, 2018
570 Feb 15, 2018
</code></pre></li>
<li><p>In Tomcat 8.5 the <code>removeAbandoned</code> property has been split into two: <code>removeAbandonedOnBorrow</code> and <code>removeAbandonedOnMaintenance</code></p></li>
<li><p>See: <a href="https://tomcat.apache.org/tomcat-8.5-doc/jndi-datasource-examples-howto.html#Database_Connection_Pool_(DBCP_2)_Configurations">https://tomcat.apache.org/tomcat-8.5-doc/jndi-datasource-examples-howto.html#Database_Connection_Pool_(DBCP_2)_Configurations</a></p></li>
<li><p>I assume we want <code>removeAbandonedOnBorrow</code> and make updates to the Tomcat 8 templates in Ansible</p></li>
<li><p>After reading more documentation I see that Tomcat 8.5&rsquo;s default DBCP seems to now be Commons DBCP2 instead of Tomcat DBCP</p></li>
<li><p>It can be overridden in Tomcat&rsquo;s <em>server.xml</em> by setting <code>factory=&quot;org.apache.tomcat.jdbc.pool.DataSourceFactory&quot;</code> in the <code>&lt;Resource&gt;</code></p></li>
<li><p>I think we should use this default, so we&rsquo;ll need to remove some other settings that are specific to Tomcat&rsquo;s DBCP like <code>jdbcInterceptors</code> and <code>abandonWhenPercentageFull</code></p></li>
<li><p>Merge the changes adding ORCID identifier to advanced search and Atmire Listings and Reports (<a href="https://github.com/ilri/DSpace/pull/371">#371</a>)</p></li>
<li><p>Fix one more issue of missing XMLUI strings (for CRP subject when clicking &ldquo;view more&rdquo; in the Discovery sidebar)</p></li>
<li><p>I told Udana to fix the citation and abstract of the one item, and to correct the <code>dc.language.iso</code> for the five Spanish items in his Book Chapters collection</p></li>
<li><p>Then we can import the records to CGSpace</p></li>
</ul> </ul>
<pre><code># zcat /var/log/tomcat7/catalina.out.[1-9].gz | grep 'org.apache.tomcat.jdbc.pool.ConnectionPool abandon' | awk '{print $1,$2, $3}' | sort | uniq -c | sort -n
<h2 id="2018-04-11">2018-04-11</h2> 1 Feb 18, 2018
1 Feb 19, 2018
1 Feb 20, 2018
1 Feb 24, 2018
2 Feb 13, 2018
3 Feb 17, 2018
5 Feb 16, 2018
5 Feb 23, 2018
5 Feb 27, 2018
6 Feb 25, 2018
40 Feb 14, 2018
63 Feb 28, 2018
154 Mar 19, 2018
202 Feb 21, 2018
264 Feb 26, 2018
268 Mar 21, 2018
524 Feb 22, 2018
570 Feb 15, 2018
</code></pre><ul>
<li>In Tomcat 8.5 the <code>removeAbandoned</code> property has been split into two: <code>removeAbandonedOnBorrow</code> and <code>removeAbandonedOnMaintenance</code></li>
<li>See: <a href="https://tomcat.apache.org/tomcat-8.5-doc/jndi-datasource-examples-howto.html#Database_Connection_Pool_(DBCP_2)_Configurations">https://tomcat.apache.org/tomcat-8.5-doc/jndi-datasource-examples-howto.html#Database_Connection_Pool_(DBCP_2)_Configurations</a></li>
<li>I assume we want <code>removeAbandonedOnBorrow</code> and make updates to the Tomcat 8 templates in Ansible</li>
<li>After reading more documentation I see that Tomcat 8.5's default DBCP seems to now be Commons DBCP2 instead of Tomcat DBCP</li>
<li>It can be overridden in Tomcat's <em>server.xml</em> by setting <code>factory=&quot;org.apache.tomcat.jdbc.pool.DataSourceFactory&quot;</code> in the <code>&lt;Resource&gt;</code></li>
<li>I think we should use this default, so we'll need to remove some other settings that are specific to Tomcat's DBCP like <code>jdbcInterceptors</code> and <code>abandonWhenPercentageFull</code></li>
<li>Merge the changes adding ORCID identifier to advanced search and Atmire Listings and Reports (<a href="https://github.com/ilri/DSpace/pull/371">#371</a>)</li>
<li>Fix one more issue of missing XMLUI strings (for CRP subject when clicking &ldquo;view more&rdquo; in the Discovery sidebar)</li>
<li>I told Udana to fix the citation and abstract of the one item, and to correct the <code>dc.language.iso</code> for the five Spanish items in his Book Chapters collection</li>
<li>Then we can import the records to CGSpace</li>
</ul>
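The nginx "top users" pipeline used earlier in this entry (<code>grep</code> for the date, <code>awk '{print $1}'</code>, <code>sort | uniq -c | sort -n | tail</code>) can also be sketched in Python. This is only an approximate equivalent: the function name <code>top_clients</code> and the sample log lines are made up for illustration, and the result is ordered from most to fewest requests, the reverse of the shell output.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_clients(lines: Iterable[str], date: str, limit: int = 10) -> List[Tuple[str, int]]:
    """Count requests per client IP for log lines that mention `date`."""
    counts = Counter(
        line.split(maxsplit=1)[0]  # first whitespace-separated field, like awk's $1
        for line in lines
        if date in line
    )
    return counts.most_common(limit)


# Hypothetical sample lines in nginx's combined log format (not real traffic)
sample = [
    '45.5.186.2 - - [10/Apr/2018:06:00:01 +0000] "GET / HTTP/1.1" 200 512',
    '95.108.181.88 - - [10/Apr/2018:06:00:02 +0000] "GET / HTTP/1.1" 200 512',
    '45.5.186.2 - - [10/Apr/2018:06:00:03 +0000] "GET / HTTP/1.1" 200 512',
    '45.5.186.2 - - [09/Apr/2018:23:59:59 +0000] "GET / HTTP/1.1" 200 512',
]

for ip, count in top_clients(sample, "10/Apr/2018"):
    print(count, ip)
```

For real logs the lines would come from the rotated files, for example by opening the <code>.gz</code> ones with <code>gzip.open(path, "rt")</code>.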
<h2 id="20180411">2018-04-11</h2>
<ul> <ul>
<li><p>DSpace Test (linode19) crashed again some time since yesterday:</p> <li>DSpace Test (linode19) crashed again some time since yesterday:</li>
</ul>
<pre><code># grep -c 'java.lang.OutOfMemoryError: Java heap space' /var/log/tomcat7/catalina.out <pre><code># grep -c 'java.lang.OutOfMemoryError: Java heap space' /var/log/tomcat7/catalina.out
168 168
</code></pre></li> </code></pre><ul>
<li>I ran all system updates and rebooted the server</li>
<li><p>I ran all system updates and rebooted the server</p></li>
</ul> </ul>
<h2 id="20180412">2018-04-12</h2>
<h2 id="2018-04-12">2018-04-12</h2>
<ul> <ul>
<li>I caught wind of an interesting XMLUI performance optimization coming in DSpace 6.3: <a href="https://jira.duraspace.org/browse/DS-3883">https://jira.duraspace.org/browse/DS-3883</a></li> <li>I caught wind of an interesting XMLUI performance optimization coming in DSpace 6.3: <a href="https://jira.duraspace.org/browse/DS-3883">https://jira.duraspace.org/browse/DS-3883</a></li>
<li>I asked for it to be ported to DSpace 5.x</li> <li>I asked for it to be ported to DSpace 5.x</li>
</ul> </ul>
<h2 id="20180413">2018-04-13</h2>
<h2 id="2018-04-13">2018-04-13</h2>
<ul> <ul>
<li>Add <code>PII-LAM_CSAGender</code> to CCAFS Phase II project tags in <code>input-forms.xml</code></li> <li>Add <code>PII-LAM_CSAGender</code> to CCAFS Phase II project tags in <code>input-forms.xml</code></li>
</ul> </ul>
<h2 id="20180415">2018-04-15</h2>
<h2 id="2018-04-15">2018-04-15</h2>
<ul> <ul>
<li><p>While testing an XMLUI patch for <a href="https://jira.duraspace.org/browse/DS-3883">DS-3883</a> I noticed that there is still some remaining Authority / Solr configuration left that we need to remove:</p> <li>While testing an XMLUI patch for <a href="https://jira.duraspace.org/browse/DS-3883">DS-3883</a> I noticed that there is still some remaining Authority / Solr configuration left that we need to remove:</li>
</ul>
<pre><code>2018-04-14 18:55:25,841 ERROR org.dspace.authority.AuthoritySolrServiceImpl @ Authority solr is not correctly configured, check &quot;solr.authority.server&quot; property in the dspace.cfg <pre><code>2018-04-14 18:55:25,841 ERROR org.dspace.authority.AuthoritySolrServiceImpl @ Authority solr is not correctly configured, check &quot;solr.authority.server&quot; property in the dspace.cfg
java.lang.NullPointerException java.lang.NullPointerException
</code></pre></li> </code></pre><ul>
<li>I assume we need to remove <code>authority</code> from the consumers in <code>dspace/config/dspace.cfg</code>:</li>
<li><p>I assume we need to remove <code>authority</code> from the consumers in <code>dspace/config/dspace.cfg</code>:</p> </ul>
<pre><code>event.dispatcher.default.consumers = authority, versioning, discovery, eperson, harvester, statistics,batchedit, versioningmqm <pre><code>event.dispatcher.default.consumers = authority, versioning, discovery, eperson, harvester, statistics,batchedit, versioningmqm
</code></pre></li> </code></pre><ul>
<li>I see the same error on DSpace Test so this is definitely a problem</li>
<li><p>I see the same error on DSpace Test so this is definitely a problem</p></li> <li>After disabling the authority consumer I no longer see the error</li>
<li>I merged a pull request to the <code>5_x-prod</code> branch to clean that up (<a href="https://github.com/ilri/DSpace/pull/372">#372</a>)</li>
<li><p>After disabling the authority consumer I no longer see the error</p></li> <li>File a ticket on DSpace's Jira for the <code>target=&quot;_blank&quot;</code> security and performance issue (<a href="https://jira.duraspace.org/browse/DS-3891">DS-3891</a>)</li>
<li>I re-deployed DSpace Test (linode19) and was surprised by how long it took the ant update to complete:</li>
<li><p>I merged a pull request to the <code>5_x-prod</code> branch to clean that up (<a href="https://github.com/ilri/DSpace/pull/372">#372</a>)</p></li> </ul>
<li><p>File a ticket on DSpace&rsquo;s Jira for the <code>target=&quot;_blank&quot;</code> security and performance issue (<a href="https://jira.duraspace.org/browse/DS-3891">DS-3891</a>)</p></li>
<li><p>I re-deployed DSpace Test (linode19) and was surprised by how long it took the ant update to complete:</p>
<pre><code>BUILD SUCCESSFUL <pre><code>BUILD SUCCESSFUL
Total time: 4 minutes 12 seconds Total time: 4 minutes 12 seconds
</code></pre></li> </code></pre><ul>
<li>The Linode block storage is much slower than the instance storage</li>
<li><p>The Linode block storage is much slower than the instance storage</p></li> <li>I ran all system updates and rebooted DSpace Test (linode19)</li>
<li><p>I ran all system updates and rebooted DSpace Test (linode19)</p></li>
</ul> </ul>
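As a rough illustration of the authority consumer change described in this entry, the sketch below drops one name from the <code>event.dispatcher.default.consumers</code> line quoted above. The helper <code>drop_consumer</code> is hypothetical and the output is only what the line would plausibly look like; the actual change in pull request #372 is not shown in these notes.

```python
def drop_consumer(line: str, name: str) -> str:
    """Return a `key = a, b, c` config line with consumer `name` removed."""
    key, _, value = line.partition("=")
    consumers = [item.strip() for item in value.split(",")]
    kept = [item for item in consumers if item and item != name]
    return f"{key.strip()} = {', '.join(kept)}"


# Line copied from dspace/config/dspace.cfg as quoted in these notes
original = (
    "event.dispatcher.default.consumers = authority, versioning, discovery, "
    "eperson, harvester, statistics,batchedit, versioningmqm"
)

updated = drop_consumer(original, "authority")
print(updated)
```

Matching on the exact name matters here: a plain substring replace of <code>authority</code> would be safe for this particular line, but splitting on commas avoids touching any consumer whose name merely contains that word.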
<h2 id="20180416">2018-04-16</h2>
<h2 id="2018-04-16">2018-04-16</h2>
<ul> <ul>
<li>Communicate with Bioversity about their project to migrate their e-Library (Typo3) and Sci-lit databases to CGSpace</li> <li>Communicate with Bioversity about their project to migrate their e-Library (Typo3) and Sci-lit databases to CGSpace</li>
</ul> </ul>
<h2 id="20180418">2018-04-18</h2>
<h2 id="2018-04-18">2018-04-18</h2>
<ul> <ul>
<li>IWMI people are asking about building a search query that outputs RSS for their reports</li> <li>IWMI people are asking about building a search query that outputs RSS for their reports</li>
<li>They want the same results as this Discovery query: <a href="https://cgspace.cgiar.org/discover?filtertype_1=dateAccessioned&amp;filter_relational_operator_1=contains&amp;filter_1=2018&amp;submit_apply_filter=&amp;query=&amp;scope=10568%2F16814&amp;rpp=100&amp;sort_by=dc.date.issued_dt&amp;order=desc">https://cgspace.cgiar.org/discover?filtertype_1=dateAccessioned&amp;filter_relational_operator_1=contains&amp;filter_1=2018&amp;submit_apply_filter=&amp;query=&amp;scope=10568%2F16814&amp;rpp=100&amp;sort_by=dc.date.issued_dt&amp;order=desc</a></li> <li>They want the same results as this Discovery query: <a href="https://cgspace.cgiar.org/discover?filtertype_1=dateAccessioned&amp;filter_relational_operator_1=contains&amp;filter_1=2018&amp;submit_apply_filter=&amp;query=&amp;scope=10568%2F16814&amp;rpp=100&amp;sort_by=dc.date.issued_dt&amp;order=desc">https://cgspace.cgiar.org/discover?filtertype_1=dateAccessioned&amp;filter_relational_operator_1=contains&amp;filter_1=2018&amp;submit_apply_filter=&amp;query=&amp;scope=10568%2F16814&amp;rpp=100&amp;sort_by=dc.date.issued_dt&amp;order=desc</a></li>
<li>They will need to use OpenSearch, but I can&rsquo;t remember all the parameters</li> <li>They will need to use OpenSearch, but I can't remember all the parameters</li>
<li>Apparently search sort options for OpenSearch are in <code>dspace.cfg</code>:</li>
<li><p>Apparently search sort options for OpenSearch are in <code>dspace.cfg</code>:</p> </ul>
<pre><code>webui.itemlist.sort-option.1 = title:dc.title:title <pre><code>webui.itemlist.sort-option.1 = title:dc.title:title
webui.itemlist.sort-option.2 = dateissued:dc.date.issued:date webui.itemlist.sort-option.2 = dateissued:dc.date.issued:date
webui.itemlist.sort-option.3 = dateaccessioned:dc.date.accessioned:date webui.itemlist.sort-option.3 = dateaccessioned:dc.date.accessioned:date
webui.itemlist.sort-option.4 = type:dc.type:text webui.itemlist.sort-option.4 = type:dc.type:text
</code></pre></li> </code></pre><ul>
<li>They want items by issue date, so we need to use sort option 2</li>
<li><p>They want items by issue date, so we need to use sort option 2</p></li> <li>According to the DSpace Manual there are only the following parameters to OpenSearch: format, scope, rpp, start, and sort_by</li>
<li>The OpenSearch <code>query</code> parameter expects a Discovery search filter that is defined in <code>dspace/config/spring/api/discovery.xml</code></li>
<li><p>According to the DSpace Manual there are only the following parameters to OpenSearch: format, scope, rpp, start, and sort_by</p></li> <li>So for IWMI they should be able to use something like this: <a href="https://cgspace.cgiar.org/open-search/discover?query=dateIssued:2018&amp;scope=10568/16814&amp;sort_by=2&amp;order=DESC&amp;format=rss">https://cgspace.cgiar.org/open-search/discover?query=dateIssued:2018&amp;scope=10568/16814&amp;sort_by=2&amp;order=DESC&amp;format=rss</a></li>
<li>There are also <code>rpp</code> (results per page) and <code>start</code> parameters but in my testing now on DSpace 5.5 they behave very strangely</li>
<li><p>The OpenSearch <code>query</code> parameter expects a Discovery search filter that is defined in <code>dspace/config/spring/api/discovery.xml</code></p></li> <li>For example, set <code>rpp=1</code> and then check the results for <code>start</code> values of 0, 1, and 2 and they are all the same!</li>
<li>If I have time I will check if this behavior persists on DSpace 6.x on the official DSpace demo and file a bug</li>
<li><p>So for IWMI they should be able to use something like this: <a href="https://cgspace.cgiar.org/open-search/discover?query=dateIssued:2018&amp;scope=10568/16814&amp;sort_by=2&amp;order=DESC&amp;format=rss">https://cgspace.cgiar.org/open-search/discover?query=dateIssued:2018&amp;scope=10568/16814&amp;sort_by=2&amp;order=DESC&amp;format=rss</a></p></li> <li>Also, the DSpace Manual as of 5.x has very poor documentation for OpenSearch</li>
<li>They don't tell you to use Discovery search filters in the <code>query</code> (with format <code>query=dateIssued:2018</code>)</li>
<li><p>There are also <code>rpp</code> (results per page) and <code>start</code> parameters but in my testing now on DSpace 5.5 they behave very strangely</p></li> <li>They don't tell you that the sort options are actually defined in <code>dspace.cfg</code> (ie, you need to use <code>2</code> instead of <code>dc.date.issued_dt</code>)</li>
<li>They are missing the <code>order</code> parameter (ASC vs DESC)</li>
<li><p>For example, set <code>rpp=1</code> and then check the results for <code>start</code> values of 0, 1, and 2 and they are all the same!</p></li> <li>I notice that DSpace Test has crashed again, due to memory:</li>
</ul>
<li><p>If I have time I will check if this behavior persists on DSpace 6.x on the official DSpace demo and file a bug</p></li>
<li><p>Also, the DSpace Manual as of 5.x has very poor documentation for OpenSearch</p></li>
<li><p>They don&rsquo;t tell you to use Discovery search filters in the <code>query</code> (with format <code>query=dateIssued:2018</code>)</p></li>
<li><p>They don&rsquo;t tell you that the sort options are actually defined in <code>dspace.cfg</code> (ie, you need to use <code>2</code> instead of <code>dc.date.issued_dt</code>)</p></li>
<li><p>They are missing the <code>order</code> parameter (ASC vs DESC)</p></li>
<li><p>I notice that DSpace Test has crashed again, due to memory:</p>
<pre><code># grep -c 'java.lang.OutOfMemoryError: Java heap space' /var/log/tomcat7/catalina.out <pre><code># grep -c 'java.lang.OutOfMemoryError: Java heap space' /var/log/tomcat7/catalina.out
178 178
</code></pre></li> </code></pre><ul>
<li>I will increase the JVM heap size from 5120M to 6144M, though we don't have much room left to grow as DSpace Test (linode19) is using a smaller instance size than CGSpace</li>
<li><p>I will increase the JVM heap size from 5120M to 6144M, though we don&rsquo;t have much room left to grow as DSpace Test (linode19) is using a smaller instance size than CGSpace</p></li> <li>Gabriela from CIP asked if I could send her a list of all CIP authors so she can do some replacements on the name formats</li>
<li>I got a list of all the CIP collections manually and used the same query that I used in <a href="/cgspace-notes/2017-08">August, 2017</a>:</li>
<li><p>Gabriela from CIP asked if I could send her a list of all CIP authors so she can do some replacements on the name formats</p></li>
<li><p>I got a list of all the CIP collections manually and used the same query that I used in <a href="/cgspace-notes/2017-08">August, 2017</a>:</p>
<pre><code>dspace#= \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/89347', '10568/88229', '10568/53086', '10568/53085', '10568/69069', '10568/53087', '10568/53088', '10568/53089', '10568/53090', '10568/53091', '10568/53092', '10568/70150', '10568/53093', '10568/64874', '10568/53094'))) group by text_value order by count desc) to /tmp/cip-authors.csv with csv;
</code></pre></li>
</ul> </ul>
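The OpenSearch URL suggested for IWMI earlier in this entry can be assembled from its parts, which makes the role of each parameter easier to see. This is a minimal sketch: the function name <code>opensearch_url</code> is made up, and the parameter set (<code>query</code>, <code>scope</code>, <code>sort_by</code>, <code>order</code>, <code>format</code>) is only what these notes report working on DSpace 5.5, not a complete description of the OpenSearch interface.

```python
from urllib.parse import urlencode

BASE = "https://cgspace.cgiar.org/open-search/discover"


def opensearch_url(query: str, scope: str, sort_by: int,
                   order: str = "DESC", fmt: str = "rss") -> str:
    """Build an OpenSearch URL for a Discovery filter query within a scope."""
    params = {
        "query": query,      # Discovery search filter, e.g. dateIssued:2018
        "scope": scope,      # handle of the community or collection
        "sort_by": sort_by,  # index of webui.itemlist.sort-option.N in dspace.cfg
        "order": order,      # ASC or DESC
        "format": fmt,       # output format, here RSS
    }
    # Keep ':' and '/' unescaped so the URL matches the one in the notes
    return f"{BASE}?{urlencode(params, safe=':/')}"


url = opensearch_url("dateIssued:2018", "10568/16814", sort_by=2)
print(url)
```

Here <code>sort_by=2</code> selects <code>dateissued</code>, the second sort option listed in <code>dspace.cfg</code>. The <code>rpp</code> and <code>start</code> parameters are left out on purpose because of the odd paging behavior noted above.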
<pre><code>dspace#= \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/89347', '10568/88229', '10568/53086', '10568/53085', '10568/69069', '10568/53087', '10568/53088', '10568/53089', '10568/53090', '10568/53091', '10568/53092', '10568/70150', '10568/53093', '10568/64874', '10568/53094'))) group by text_value order by count desc) to /tmp/cip-authors.csv with csv;
<h2 id="2018-04-19">2018-04-19</h2> </code></pre><h2 id="20180419">2018-04-19</h2>
<ul> <ul>
<li>Run updates on DSpace Test (linode19) and reboot the server</li> <li>Run updates on DSpace Test (linode19) and reboot the server</li>
<li>Also try deploying updated GeoLite database during ant update while re-deploying code:</li>
<li><p>Also try deploying updated GeoLite database during ant update while re-deploying code:</p> </ul>
<pre><code>$ ant update update_geolite clean_backups <pre><code>$ ant update update_geolite clean_backups
</code></pre></li> </code></pre><ul>
<li>I also re-deployed CGSpace (linode18) to make the ORCID search, authority cleanup, and CCAFS project tag <code>PII-LAM_CSAGender</code> live</li>
<li><p>I also re-deployed CGSpace (linode18) to make the ORCID search, authority cleanup, and CCAFS project tag <code>PII-LAM_CSAGender</code> live</p></li> <li>When re-deploying I also updated the GeoLite databases so I hope the country stats become more accurate&hellip;</li>
<li>After re-deployment I ran all system updates on the server and rebooted it</li>
<li><p>When re-deploying I also updated the GeoLite databases so I hope the country stats become more accurate&hellip;</p></li> <li>After the reboot I forced a reïndexing of the Discovery to populate the new ORCID index:</li>
</ul>
<li><p>After re-deployment I ran all system updates on the server and rebooted it</p></li>
<li><p>After the reboot I forced a reïndexing of the Discovery to populate the new ORCID index:</p>
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b <pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
real 73m42.635s real 73m42.635s
user 8m15.885s user 8m15.885s
sys 2m2.687s sys 2m2.687s
</code></pre></li> </code></pre><ul>
<li>This time is with about 70,000 items in the repository</li>
<li><p>This time is with about 70,000 items in the repository</p></li>
</ul> </ul>
<h2 id="20180420">2018-04-20</h2>
<h2 id="2018-04-20">2018-04-20</h2>
<ul> <ul>
<li>Gabriela from CIP emailed to say that CGSpace was returning a white page, but I haven&rsquo;t seen any emails from UptimeRobot</li> <li>Gabriela from CIP emailed to say that CGSpace was returning a white page, but I haven't seen any emails from UptimeRobot</li>
<li>I confirm that it&rsquo;s just giving a white page around 4:16</li> <li>I confirm that it's just giving a white page around 4:16</li>
<li>The DSpace logs show that there are no database connections:</li>
<li><p>The DSpace logs show that there are no database connections:</p> </ul>
<pre><code>org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-715] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:250; busy:18; idle:0; lastwait:5000]. <pre><code>org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-715] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:250; busy:18; idle:0; lastwait:5000].
</code></pre></li> </code></pre><ul>
<li>And there have been shit tons of errors (starting only 20 minutes ago luckily):</li>
<li><p>And there have been shit tons of errors (starting only 20 minutes ago luckily):</p> </ul>
<pre><code># grep -c 'org.apache.tomcat.jdbc.pool.PoolExhaustedException' /home/cgspace.cgiar.org/log/dspace.log.2018-04-20 <pre><code># grep -c 'org.apache.tomcat.jdbc.pool.PoolExhaustedException' /home/cgspace.cgiar.org/log/dspace.log.2018-04-20
32147 32147
</code></pre></li> </code></pre><ul>
<li>I can't even log into PostgreSQL as the <code>postgres</code> user, WTF?</li>
<li><p>I can&rsquo;t even log into PostgreSQL as the <code>postgres</code> user, WTF?</p> </ul>
<pre><code>$ psql -c 'select * from pg_stat_activity' | grep -o -E '(dspaceWeb|dspaceApi|dspaceCli)' | sort | uniq -c <pre><code>$ psql -c 'select * from pg_stat_activity' | grep -o -E '(dspaceWeb|dspaceApi|dspaceCli)' | sort | uniq -c
^C ^C
</code></pre></li> </code></pre><ul>
<li>Here are the most active IPs today:</li>
<li><p>Here are the most active IPs today:</p> </ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;20/Apr/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 <pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;20/Apr/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
917 207.46.13.182 917 207.46.13.182
935 213.55.99.121 935 213.55.99.121
970 40.77.167.134 970 40.77.167.134
978 207.46.13.80 978 207.46.13.80
1422 66.249.64.155 1422 66.249.64.155
1577 50.116.102.77 1577 50.116.102.77
2456 95.108.181.88 2456 95.108.181.88
3216 104.196.152.243 3216 104.196.152.243
4325 70.32.83.92 4325 70.32.83.92
10718 45.5.184.2 10718 45.5.184.2
</code></pre></li> </code></pre><ul>
<li>It doesn't even seem like there is a lot of traffic compared to the previous days:</li>
<li><p>It doesn&rsquo;t even seem like there is a lot of traffic compared to the previous days:</p> </ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;20/Apr/2018&quot; | wc -l <pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;20/Apr/2018&quot; | wc -l
74931 74931
# zcat --force /var/log/nginx/*.log.1 /var/log/nginx/*.log.2.gz| grep -E &quot;19/Apr/2018&quot; | wc -l # zcat --force /var/log/nginx/*.log.1 /var/log/nginx/*.log.2.gz| grep -E &quot;19/Apr/2018&quot; | wc -l
91073 91073
# zcat --force /var/log/nginx/*.log.2.gz /var/log/nginx/*.log.3.gz| grep -E &quot;18/Apr/2018&quot; | wc -l # zcat --force /var/log/nginx/*.log.2.gz /var/log/nginx/*.log.3.gz| grep -E &quot;18/Apr/2018&quot; | wc -l
93459 93459
</code></pre></li> </code></pre><ul>
<li>I tried to restart Tomcat but <code>systemctl</code> hangs</li>
<li><p>I tried to restart Tomcat but <code>systemctl</code> hangs</p></li> <li>I tried to reboot the server from the command line but after a few minutes it didn't come back up</li>
<li>Looking at the Linode console I see that it is stuck trying to shut down</li>
<li><p>I tried to reboot the server from the command line but after a few minutes it didn&rsquo;t come back up</p></li> <li>Even &ldquo;Reboot&rdquo; via Linode console doesn't work!</li>
<li>After shutting it down a few times via the Linode console it finally rebooted</li>
<li><p>Looking at the Linode console I see that it is stuck trying to shut down</p></li> <li>Everything is back but I have no idea what caused this—I suspect something with the hosting provider</li>
<li>Also super weird, the last entry in the DSpace log file is from <code>2018-04-20 16:35:09</code>, and then immediately it goes to <code>2018-04-20 19:15:04</code> (three hours later!):</li>
<li><p>Even &ldquo;Reboot&rdquo; via Linode console doesn&rsquo;t work!</p></li> </ul>
<li><p>After shutting it down a few times via the Linode console it finally rebooted</p></li>
<li><p>Everything is back but I have no idea what caused this—I suspect something with the hosting provider</p></li>
<li><p>Also super weird, the last entry in the DSpace log file is from <code>2018-04-20 16:35:09</code>, and then immediately it goes to <code>2018-04-20 19:15:04</code> (three hours later!):</p>
<pre><code>2018-04-20 16:35:09,144 ERROR org.dspace.app.util.AbstractDSpaceWebapp @ Failed to record shutdown in Webapp table. <pre><code>2018-04-20 16:35:09,144 ERROR org.dspace.app.util.AbstractDSpaceWebapp @ Failed to record shutdown in Webapp table.
org.apache.tomcat.jdbc.pool.PoolExhaustedException: [localhost-startStop-2] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:250; busy:18; idle org.apache.tomcat.jdbc.pool.PoolExhaustedException: [localhost-startStop-2] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:250; busy:18; idle
:0; lastwait:5000]. :0; lastwait:5000].
at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:685) at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:685)
at org.apache.tomcat.jdbc.pool.ConnectionPool.getConnection(ConnectionPool.java:187) at org.apache.tomcat.jdbc.pool.ConnectionPool.getConnection(ConnectionPool.java:187)
at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:128) at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:128)
at org.dspace.storage.rdbms.DatabaseManager.getConnection(DatabaseManager.java:632) at org.dspace.storage.rdbms.DatabaseManager.getConnection(DatabaseManager.java:632)
at org.dspace.core.Context.init(Context.java:121) at org.dspace.core.Context.init(Context.java:121)
at org.dspace.core.Context.&lt;init&gt;(Context.java:95) at org.dspace.core.Context.&lt;init&gt;(Context.java:95)
at org.dspace.app.util.AbstractDSpaceWebapp.deregister(AbstractDSpaceWebapp.java:97) at org.dspace.app.util.AbstractDSpaceWebapp.deregister(AbstractDSpaceWebapp.java:97)
at org.dspace.app.util.DSpaceContextListener.contextDestroyed(DSpaceContextListener.java:146) at org.dspace.app.util.DSpaceContextListener.contextDestroyed(DSpaceContextListener.java:146)
at org.apache.catalina.core.StandardContext.listenerStop(StandardContext.java:5115) at org.apache.catalina.core.StandardContext.listenerStop(StandardContext.java:5115)
at org.apache.catalina.core.StandardContext.stopInternal(StandardContext.java:5779) at org.apache.catalina.core.StandardContext.stopInternal(StandardContext.java:5779)
at org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:224) at org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:224)
at org.apache.catalina.core.ContainerBase$StopChild.call(ContainerBase.java:1588) at org.apache.catalina.core.ContainerBase$StopChild.call(ContainerBase.java:1588)
at org.apache.catalina.core.ContainerBase$StopChild.call(ContainerBase.java:1577) at org.apache.catalina.core.ContainerBase$StopChild.call(ContainerBase.java:1577)
at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748) at java.lang.Thread.run(Thread.java:748)
2018-04-20 19:15:04,006 INFO org.dspace.core.ConfigurationManager @ Loading from classloader: file:/home/cgspace.cgiar.org/config/dspace.cfg 2018-04-20 19:15:04,006 INFO org.dspace.core.ConfigurationManager @ Loading from classloader: file:/home/cgspace.cgiar.org/config/dspace.cfg
</code></pre></li> </code></pre><ul>
<li>Very suspect!</li>
<li><p>Very suspect!</p></li>
</ul> </ul>
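The `zcat | grep | awk | sort | uniq -c` pipeline used above to find the most active IPs can also be sketched in Python. This is only a rough equivalent, assuming (as the `awk '{print $1}'` step does) that the client IP is the first whitespace-separated field of each nginx log line. The function name `top_clients` and the sample log lines are hypothetical.

```python
from collections import Counter


def top_clients(log_lines, date, n=10):
    """Return the n busiest client IPs among lines that mention `date`.

    `date` is matched as a plain substring, e.g. "20/Apr/2018", the same
    way the grep step in the shell pipeline matches it.
    """
    counts = Counter(
        line.split(maxsplit=1)[0]  # first field, like awk's $1
        for line in log_lines
        if date in line
    )
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical sample lines in nginx's combined log format
    sample = [
        '45.5.184.2 - - [20/Apr/2018:04:16:01 +0000] "GET / HTTP/1.1" 200 512',
        '45.5.184.2 - - [20/Apr/2018:04:16:02 +0000] "GET /a HTTP/1.1" 200 512',
        '70.32.83.92 - - [20/Apr/2018:04:16:03 +0000] "GET /b HTTP/1.1" 200 512',
    ]
    for ip, hits in top_clients(sample, "20/Apr/2018"):
        print(hits, ip)
```

Unlike the shell version, this lists the busiest client first rather than last.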
<h2 id="20180424">2018-04-24</h2>
<h2 id="2018-04-24">2018-04-24</h2>
<ul> <ul>
<li>Testing my Ansible playbooks with a clean and updated installation of Ubuntu 18.04 and I fixed some issues that I hadn&rsquo;t run into a few weeks ago</li> <li>Testing my Ansible playbooks with a clean and updated installation of Ubuntu 18.04 and I fixed some issues that I hadn't run into a few weeks ago</li>
<li>There seems to be a new issue with Java dependencies, though</li> <li>There seems to be a new issue with Java dependencies, though</li>
<li>The <code>default-jre</code> package is going to be Java 10 on Ubuntu 18.04, but I want to use <code>openjdk-8-jre-headless</code> (well, the JDK actually, but it uses this JRE)</li> <li>The <code>default-jre</code> package is going to be Java 10 on Ubuntu 18.04, but I want to use <code>openjdk-8-jre-headless</code> (well, the JDK actually, but it uses this JRE)</li>
<li>Tomcat and Ant are fine with Java 8, but the <code>maven</code> package wants to pull in Java 10 for some reason</li> <li>Tomcat and Ant are fine with Java 8, but the <code>maven</code> package wants to pull in Java 10 for some reason</li>
<li>Looking closer, I see that <code>maven</code> depends on <code>java7-runtime-headless</code>, which is indeed provided by <code>openjdk-8-jre-headless</code></li> <li>Looking closer, I see that <code>maven</code> depends on <code>java7-runtime-headless</code>, which is indeed provided by <code>openjdk-8-jre-headless</code></li>
<li>So it must be one of Maven&rsquo;s dependencies&hellip;</li> <li>So it must be one of Maven's dependencies&hellip;</li>
<li>I will watch it for a few days because it could be an issue that will be resolved before Ubuntu 18.04&rsquo;s release</li> <li>I will watch it for a few days because it could be an issue that will be resolved before Ubuntu 18.04's release</li>
<li>Otherwise I will post a bug to the ubuntu-release mailing list</li> <li>Otherwise I will post a bug to the ubuntu-release mailing list</li>
<li>Looks like the only way to fix this is to install <code>openjdk-8-jdk-headless</code> before (so it pulls in the JRE) in a separate transaction, or to manually install <code>openjdk-8-jre-headless</code> in the same apt transaction as <code>maven</code></li> <li>Looks like the only way to fix this is to install <code>openjdk-8-jdk-headless</code> before (so it pulls in the JRE) in a separate transaction, or to manually install <code>openjdk-8-jre-headless</code> in the same apt transaction as <code>maven</code></li>
<li>Also, I started porting PostgreSQL 9.6 into the Ansible infrastructure scripts</li> <li>Also, I started porting PostgreSQL 9.6 into the Ansible infrastructure scripts</li>
<li>This should be a drop in I believe, though I will definitely test it more locally as well as on DSpace Test once we move to DSpace 5.8 and Ubuntu 18.04 in the coming months</li> <li>This should be a drop in I believe, though I will definitely test it more locally as well as on DSpace Test once we move to DSpace 5.8 and Ubuntu 18.04 in the coming months</li>
</ul> </ul>
<h2 id="20180425">2018-04-25</h2>
<h2 id="2018-04-25">2018-04-25</h2>
<ul> <ul>
<li>Still testing the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a> for Ubuntu 18.04, Tomcat 8.5, and PostgreSQL 9.6</li> <li>Still testing the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a> for Ubuntu 18.04, Tomcat 8.5, and PostgreSQL 9.6</li>
<li>One other new thing I notice is that PostgreSQL 9.6 no longer uses <code>createuser</code> and <code>nocreateuser</code>, as those have actually meant <code>superuser</code> and <code>nosuperuser</code> and have been deprecated for <em>ten years</em></li> <li>One other new thing I notice is that PostgreSQL 9.6 no longer uses <code>createuser</code> and <code>nocreateuser</code>, as those have actually meant <code>superuser</code> and <code>nosuperuser</code> and have been deprecated for <em>ten years</em></li>
<li>So for my notes, when I'm importing a CGSpace database dump I need to amend my notes to give super user permission to a user, rather than create user:</li>
<li><p>So for my notes, when I&rsquo;m importing a CGSpace database dump I need to amend my notes to give super user permission to a user, rather than create user:</p> </ul>
<pre><code>$ psql dspacetest -c 'alter user dspacetest superuser;' <pre><code>$ psql dspacetest -c 'alter user dspacetest superuser;'
$ pg_restore -O -U dspacetest -d dspacetest -W -h localhost /tmp/dspace_2018-04-18.backup $ pg_restore -O -U dspacetest -d dspacetest -W -h localhost /tmp/dspace_2018-04-18.backup
</code></pre></li> </code></pre><ul>
<li>There's another issue with Tomcat in Ubuntu 18.04:</li>
<li><p>There&rsquo;s another issue with Tomcat in Ubuntu 18.04:</p>
<pre><code>25-Apr-2018 13:26:21.493 SEVERE [http-nio-127.0.0.1-8443-exec-1] org.apache.coyote.AbstractProtocol$ConnectionHandler.process Error reading request, ignored
java.lang.NoSuchMethodError: java.nio.ByteBuffer.position(I)Ljava/nio/ByteBuffer;
at org.apache.coyote.http11.Http11InputBuffer.init(Http11InputBuffer.java:688)
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:672)
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:790)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1459)
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
</code></pre></li>
<li><p>There&rsquo;s a <a href="https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=895866">Debian bug about this from a few weeks ago</a></p></li>
<li><p>Apparently Tomcat was compiled with Java 9, so doesn&rsquo;t work with Java 8</p></li>
</ul> </ul>
<pre><code>25-Apr-2018 13:26:21.493 SEVERE [http-nio-127.0.0.1-8443-exec-1] org.apache.coyote.AbstractProtocol$ConnectionHandler.process Error reading request, ignored
<h2 id="2018-04-29">2018-04-29</h2> java.lang.NoSuchMethodError: java.nio.ByteBuffer.position(I)Ljava/nio/ByteBuffer;
at org.apache.coyote.http11.Http11InputBuffer.init(Http11InputBuffer.java:688)
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:672)
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:790)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1459)
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
</code></pre><ul>
<li>There's a <a href="https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=895866">Debian bug about this from a few weeks ago</a></li>
<li>Apparently Tomcat was compiled with Java 9, so doesn't work with Java 8</li>
</ul>
<h2 id="20180429">2018-04-29</h2>
<ul> <ul>
<li>DSpace Test crashed again, looks like memory issues again</li> <li>DSpace Test crashed again, looks like memory issues again</li>
<li>JVM heap size was last increased to 6144m but the system only has 8GB total so there&rsquo;s not much we can do here other than get a bigger Linode instance or remove the massive Solr Statistics data</li> <li>JVM heap size was last increased to 6144m but the system only has 8GB total so there's not much we can do here other than get a bigger Linode instance or remove the massive Solr Statistics data</li>
</ul> </ul>
<h2 id="20180430">2018-04-30</h2>
<h2 id="2018-04-30">2018-04-30</h2>
<ul> <ul>
<li>DSpace Test crashed again</li> <li>DSpace Test crashed again</li>
<li>I will email the CGSpace team to ask them whether or not we want to commit to having a public test server that accurately mirrors CGSpace (ie, to upgrade to the next largest Linode)</li> <li>I will email the CGSpace team to ask them whether or not we want to commit to having a public test server that accurately mirrors CGSpace (ie, to upgrade to the next largest Linode)</li>


@ -8,13 +8,12 @@
<meta property="og:title" content="May, 2018" /> <meta property="og:title" content="May, 2018" />
<meta property="og:description" content="2018-05-01 <meta property="og:description" content="2018-05-01
I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface: I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface:
http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E
http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E
Then I reduced the JVM heap size from 6144 back to 5120m Then I reduced the JVM heap size from 6144 back to 5120m
Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the Ansible infrastructure scripts to support hosts choosing which distribution they want to use Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the Ansible infrastructure scripts to support hosts choosing which distribution they want to use
" /> " />
@ -27,17 +26,16 @@ Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked
<meta name="twitter:title" content="May, 2018"/> <meta name="twitter:title" content="May, 2018"/>
<meta name="twitter:description" content="2018-05-01 <meta name="twitter:description" content="2018-05-01
I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface: I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface:
http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E
http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E
Then I reduced the JVM heap size from 6144 back to 5120m Then I reduced the JVM heap size from 6144 back to 5120m
Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the Ansible infrastructure scripts to support hosts choosing which distribution they want to use Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the Ansible infrastructure scripts to support hosts choosing which distribution they want to use
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -118,151 +116,129 @@ Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked
</p> </p>
</header> </header>
<h2 id="2018-05-01">2018-05-01</h2> <h2 id="20180501">2018-05-01</h2>
<ul> <ul>
<li>I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface: <li>I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface:
<ul> <ul>
<li><a href="http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E">http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E</a></li> <li>http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E</li>
<li><a href="http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E">http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E</a></li> <li>http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E</li>
</ul></li> </ul>
</li>
<li>Then I reduced the JVM heap size from 6144 back to 5120m</li> <li>Then I reduced the JVM heap size from 6144 back to 5120m</li>
<li>Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> to support hosts choosing which distribution they want to use</li> <li>Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> to support hosts choosing which distribution they want to use</li>
</ul> </ul>
<h2 id="20180502">2018-05-02</h2>
<h2 id="2018-05-02">2018-05-02</h2>
<ul> <ul>
<li>Advise Fabio Fidanza about integrating CGSpace content in the new CGIAR corporate website</li> <li>Advise Fabio Fidanza about integrating CGSpace content in the new CGIAR corporate website</li>
<li>I think they can mostly rely on using the <code>cg.contributor.crp</code> field</li> <li>I think they can mostly rely on using the <code>cg.contributor.crp</code> field</li>
<li>Looking over some IITA records for Sisay <li>Looking over some IITA records for Sisay
<ul> <ul>
<li>Other than trimming and collapsing consecutive whitespace, I made some other corrections</li> <li>Other than trimming and collapsing consecutive whitespace, I made some other corrections</li>
<li>I need to check the correct formatting of COTE D&rsquo;IVOIRE vs COTE DIVOIRE</li> <li>I need to check the correct formatting of COTE D'IVOIRE vs COTE DIVOIRE</li>
<li>I replaced all DOIs with HTTPS</li> <li>I replaced all DOIs with HTTPS</li>
<li>I checked a few DOIs and found at least one that was missing, so I Googled the title of the paper and found the correct DOI</li> <li>I checked a few DOIs and found at least one that was missing, so I Googled the title of the paper and found the correct DOI</li>
<li>Also, I found an <a href="https://www.doi.org/factsheets/DOI_PURL.html">FAQ for DOI that says the <code>dx.doi.org</code> syntax is older</a>, so I will replace all the DOIs with <code>doi.org</code> instead</li> <li>Also, I found an <a href="https://www.doi.org/factsheets/DOI_PURL.html">FAQ for DOI that says the <code>dx.doi.org</code> syntax is older</a>, so I will replace all the DOIs with <code>doi.org</code> instead</li>
<li>I found five records with &ldquo;ISI Jounal&rdquo; instead of &ldquo;ISI Journal&rdquo;</li> <li>I found five records with &ldquo;ISI Jounal&rdquo; instead of &ldquo;ISI Journal&rdquo;</li>
<li>I found one item with IITA subject &ldquo;.&rdquo;</li> <li>I found one item with IITA subject &ldquo;.&rdquo;</li>
<li>Need to remember to check the facets for things like this in sponsorship:</li> <li>Need to remember to check the facets for things like this in sponsorship:
<ul>
<li>Deutsche Gesellschaft für Internationale Zusammenarbeit</li> <li>Deutsche Gesellschaft für Internationale Zusammenarbeit</li>
<li>Deutsche Gesellschaft fur Internationale Zusammenarbeit</li> <li>Deutsche Gesellschaft fur Internationale Zusammenarbeit</li>
</ul>
</li>
<li>Eight records with language &ldquo;fn&rdquo; instead of &ldquo;fr&rdquo;</li> <li>Eight records with language &ldquo;fn&rdquo; instead of &ldquo;fr&rdquo;</li>
<li>One incorrect type (lowercase &ldquo;proceedings&rdquo;): Conference proceedings</li> <li>One incorrect type (lowercase &ldquo;proceedings&rdquo;): Conference proceedings</li>
<li>Found some capitalized CRPs in <code>cg.contributor.crp</code></li> <li>Found some capitalized CRPs in <code>cg.contributor.crp</code></li>
<li>Found some incorrect author affiliations, ie &ldquo;Institut de Recherche pour le Developpement Agricolc&rdquo; should be &ldquo;Institut de Recherche pour le Developpement <em>Agricole</em>&ldquo;</li> <li>Found some incorrect author affiliations, ie &ldquo;Institut de Recherche pour le Developpement Agricolc&rdquo; should be &ldquo;Institut de Recherche pour le Developpement <em>Agricole</em>&rdquo;</li>
<li>Wow, and for sponsors there are the following:</li> <li>Wow, and for sponsors there are the following:
<ul>
<li>Incorrect: Flemish Agency for Development Cooperation and Technical Assistance</li> <li>Incorrect: Flemish Agency for Development Cooperation and Technical Assistance</li>
<li>Incorrect: Flemish Organization for Development Cooperation and Technical Assistance</li> <li>Incorrect: Flemish Organization for Development Cooperation and Technical Assistance</li>
<li>Correct: Flemish <em>Association</em> for Development Cooperation and Technical Assistance</li> <li>Correct: Flemish <em>Association</em> for Development Cooperation and Technical Assistance</li>
<li>One item had region &ldquo;WEST&rdquo; (I corrected it to &ldquo;WEST AFRICA&rdquo;)</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2018-05-03">2018-05-03</h2> <li>One item had region &ldquo;WEST&rdquo; (I corrected it to &ldquo;WEST AFRICA&rdquo;)</li>
</ul>
</li>
</ul>
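Two of the IITA corrections above (trimming and collapsing consecutive whitespace, and moving DOIs from the older `dx.doi.org` form to HTTPS `doi.org` links) are mechanical enough to sketch in Python. This is a minimal sketch and not the process actually used on these records. The function names and the example DOI `10.1234/example` are hypothetical.

```python
import re

# Matches http:// or https://, with or without the older "dx." host prefix
_DOI_PREFIX = re.compile(r"^https?://(?:dx\.)?doi\.org/", re.IGNORECASE)


def clean_whitespace(value):
    """Trim the value and collapse runs of whitespace to a single space."""
    return re.sub(r"\s+", " ", value).strip()


def normalize_doi(value):
    """Rewrite a doi.org or dx.doi.org link to the https://doi.org/ form.

    Values that do not look like DOI links are returned with only their
    whitespace cleaned, so badly malformed ones still need manual review.
    """
    value = clean_whitespace(value)
    return _DOI_PREFIX.sub("https://doi.org/", value, count=1)


if __name__ == "__main__":
    print(clean_whitespace("  COTE   D'IVOIRE "))
    print(normalize_doi("http://dx.doi.org/10.1234/example"))
```

Because the pattern is anchored at the start of the value, a link that is already in the `https://doi.org/` form passes through unchanged.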
<h2 id="20180503">2018-05-03</h2>
<ul> <ul>
<li>It turns out that the IITA records that I was helping Sisay with in March were imported in 2018-04 without a final check by Abenet or me</li> <li>It turns out that the IITA records that I was helping Sisay with in March were imported in 2018-04 without a final check by Abenet or me</li>
<li>There are lots of errors on language, CRP, and even some encoding errors on abstract fields</li> <li>There are lots of errors on language, CRP, and even some encoding errors on abstract fields</li>
<li>I export them and include the hidden metadata fields like <code>dc.date.accessioned</code> so I can filter the ones from 2018-04 and correct them in Open Refine:</li>
<li><p>I export them and include the hidden metadata fields like <code>dc.date.accessioned</code> so I can filter the ones from 2018-04 and correct them in Open Refine:</p>
<pre><code>$ dspace metadata-export -a -f /tmp/iita.csv -i 10568/68616
</code></pre></li>
<li><p>Abenet sent a list of 46 ORCID identifiers for ILRI authors so I need to get their names using my <a href="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script and merge them into our controlled vocabulary</p></li>
<li><p>On the messed up IITA records from 2018-04 I see sixty DOIs in incorrect format (cg.identifier.doi)</p></li>
</ul> </ul>
<pre><code>$ dspace metadata-export -a -f /tmp/iita.csv -i 10568/68616
<h2 id="2018-05-06">2018-05-06</h2> </code></pre><ul>
<li>Abenet sent a list of 46 ORCID identifiers for ILRI authors so I need to get their names using my <a href="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script and merge them into our controlled vocabulary</li>
<li>On the messed up IITA records from 2018-04 I see sixty DOIs in incorrect format (cg.identifier.doi)</li>
</ul>
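The filtering step mentioned above (keeping only the records accessioned in 2018-04) was done in Open Refine, but the same idea can be sketched in Python against the exported CSV. This assumes the export has a column whose name starts with `dc.date.accessioned` and that its values begin with an ISO-style date. Both the function name `rows_accessioned_in` and the sample rows are hypothetical.

```python
import csv
import io


def rows_accessioned_in(csv_text, month):
    """Return the rows whose accession date starts with `month` (e.g. "2018-04")."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Assumption: the column may carry a suffix, so match on the prefix only
    date_columns = [
        name
        for name in (reader.fieldnames or [])
        if name.startswith("dc.date.accessioned")
    ]
    matches = []
    for row in reader:
        if any((row[col] or "").startswith(month) for col in date_columns):
            matches.append(row)
    return matches


if __name__ == "__main__":
    # Hypothetical two-row export
    sample = (
        "id,collection,dc.date.accessioned,dc.title\n"
        "1,10568/68616,2018-04-10T08:00:00Z,First\n"
        "2,10568/68616,2018-03-23T09:30:00Z,Second\n"
    )
    for row in rows_accessioned_in(sample, "2018-04"):
        print(row["id"], row["dc.title"])
```

For a real export the text would come from reading the file written by `dspace metadata-export`, for example `open("/tmp/iita.csv", newline="").read()`.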
<h2 id="20180506">2018-05-06</h2>
<ul> <ul>
<li>Fixing the IITA records from Sisay, sixty DOIs have completely invalid format like <code>http:dx.doi.org10.1016j.cropro.2008.07.003</code></li> <li>Fixing the IITA records from Sisay, sixty DOIs have completely invalid format like <code>http:dx.doi.org10.1016j.cropro.2008.07.003</code></li>
<li>I corrected all the DOIs and then checked them for validity with a quick bash loop:</li>
<li><p>I corrected all the DOIs and then checked them for validity with a quick bash loop:</p> </ul>
<pre><code>$ for line in $(&lt; /tmp/links.txt); do echo $line; http --print h $line; done <pre><code>$ for line in $(&lt; /tmp/links.txt); do echo $line; http --print h $line; done
</code></pre></li> </code></pre><ul>
<li>Most of the links are good, though one is duplicate and one seems to even be incorrect in the publisher's site so&hellip;</li>
<li><p>Most of the links are good, though one is duplicate and one seems to even be incorrect in the publisher&rsquo;s site so&hellip;</p></li> <li>Also, there are some duplicates:
<li><p>Also, there are some duplicates:</p>
<ul> <ul>
<li><code>10568/92241</code> and <code>10568/92230</code> (same DOI)</li> <li><code>10568/92241</code> and <code>10568/92230</code> (same DOI)</li>
<li><code>10568/92151</code> and <code>10568/92150</code> (same ISBN)</li> <li><code>10568/92151</code> and <code>10568/92150</code> (same ISBN)</li>
<li><code>10568/92291</code> and <code>10568/92286</code> (same citation, title, authors, year)</li> <li><code>10568/92291</code> and <code>10568/92286</code> (same citation, title, authors, year)</li>
</ul></li> </ul>
</li>
<li><p>Messed up abstracts:</p> <li>Messed up abstracts:
<ul> <ul>
<li><code>10568/92309</code></li> <li><code>10568/92309</code></li>
</ul></li> </ul>
</li>
<li><p>Fixed some issues in regions, countries, sponsors, ISSN, and cleaned whitespace errors from citation, abstract, author, and titles</p></li> <li>Fixed some issues in regions, countries, sponsors, ISSN, and cleaned whitespace errors from citation, abstract, author, and titles</li>
<li>Fixed all issues with CRPs</li>
<li><p>Fixed all issues with CRPs</p></li> <li>A few more interesting Unicode characters to look for in text fields like author, abstracts, and citations might be: <code>’</code> (0x2019), <code>·</code> (0x00b7), and <code>€</code> (0x20ac)</li>
<li>A custom text facet in OpenRefine with this GREL expression could be good for finding invalid characters or encoding errors in authors, abstracts, etc:</li>
<li><p>A few more interesting Unicode characters to look for in text fields like author, abstracts, and citations might be: <code>’</code> (0x2019), <code>·</code> (0x00b7), and <code>€</code> (0x20ac)</p></li> </ul>
<li><p>A custom text facet in OpenRefine with this GREL expression could be good for finding invalid characters or encoding errors in authors, abstracts, etc:</p>
<pre><code>or( <pre><code>or(
isNotNull(value.match(/.*[(|)].*/)), isNotNull(value.match(/.*[(|)].*/)),
isNotNull(value.match(/.*\uFFFD.*/)), isNotNull(value.match(/.*\uFFFD.*/)),
isNotNull(value.match(/.*\u00A0.*/)), isNotNull(value.match(/.*\u00A0.*/)),
isNotNull(value.match(/.*\u200A.*/)), isNotNull(value.match(/.*\u200A.*/)),
isNotNull(value.match(/.*\u2019.*/)), isNotNull(value.match(/.*\u2019.*/)),
isNotNull(value.match(/.*\u00b7.*/)), isNotNull(value.match(/.*\u00b7.*/)),
isNotNull(value.match(/.*\u20ac.*/)) isNotNull(value.match(/.*\u20ac.*/))
) )
</code></pre></li> </code></pre><ul>
<li>I found some more IITA records that Sisay imported on 2018-03-23 that have invalid CRP names, so now I kinda want to check those ones!</li>
<li><p>I found some more IITA records that Sisay imported on 2018-03-23 that have invalid CRP names, so now I kinda want to check those ones!</p></li> <li>Combine the ORCID identifiers Abenet sent with our existing list and resolve their names using the <a href="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script:</li>
</ul>
<li><p>Combine the ORCID identifiers Abenet sent with our existing list and resolve their names using the <a href="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script:</p>
<pre><code>$ cat ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/ilri-orcids.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq &gt; /tmp/2018-05-06-combined.txt <pre><code>$ cat ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/ilri-orcids.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq &gt; /tmp/2018-05-06-combined.txt
$ ./resolve-orcids.py -i /tmp/2018-05-06-combined.txt -o /tmp/2018-05-06-combined-names.txt -d $ ./resolve-orcids.py -i /tmp/2018-05-06-combined.txt -o /tmp/2018-05-06-combined-names.txt -d
# sort names, copy to cg-creator-id.xml, add XML formatting, and then format with tidy (preserving accents) # sort names, copy to cg-creator-id.xml, add XML formatting, and then format with tidy (preserving accents)
$ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-id.xml $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-id.xml
</code></pre></li> </code></pre><ul>
<li>I made a pull request (<a href="https://github.com/ilri/DSpace/pull/373">#373</a>) for this that I'll merge some time next week (I'm expecting Atmire to get back to us about DSpace 5.8 soon)</li>
<li><p>I made a pull request (<a href="https://github.com/ilri/DSpace/pull/373">#373</a>) for this that I&rsquo;ll merge some time next week (I&rsquo;m expecting Atmire to get back to us about DSpace 5.8 soon)</p></li> <li>After testing quickly I just decided to merge it, and I noticed that I don't even need to restart Tomcat for the changes to get loaded</li>
<li><p>After testing quickly I just decided to merge it, and I noticed that I don&rsquo;t even need to restart Tomcat for the changes to get loaded</p></li>
</ul> </ul>
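<p>A rough Python sketch of the same check as the GREL facet earlier in this entry, in case the test needs to run outside OpenRefine. The function name <code>has_suspicious_chars</code> and the labels in the dictionary are assumptions made for illustration; only the character list comes from the GREL expression.</p>

```python
import re

# Characters flagged by the GREL facet; the descriptions are only labels for this sketch
SUSPICIOUS = {
    "\uFFFD": "replacement character",
    "\u00A0": "no-break space",
    "\u200A": "hair space",
    "\u2019": "right single quotation mark",
    "\u00B7": "middle dot",
    "\u20AC": "euro sign",
}

# One character class: a literal "(", "|", or ")" plus every character listed above
PATTERN = re.compile("[(|)" + "".join(SUSPICIOUS) + "]")


def has_suspicious_chars(value):
    """Return True if value contains any character the facet would flag."""
    return value is not None and PATTERN.search(value) is not None
```

<p>This only flags a value; it does not say which character matched, so it is closer to a true/false facet than to a cleanup step.</p>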
<h2 id="20180507">2018-05-07</h2>
<h2 id="2018-05-07">2018-05-07</h2>
<ul> <ul>
<li>I spent a bit of time playing with <a href="https://github.com/codeforkjeff/conciliator">conciliator</a> and Solr, trying to figure out how to reconcile columns in OpenRefine with data in our existing Solr cores (like CRP subjects)</li> <li>I spent a bit of time playing with <a href="https://github.com/codeforkjeff/conciliator">conciliator</a> and Solr, trying to figure out how to reconcile columns in OpenRefine with data in our existing Solr cores (like CRP subjects)</li>
<li>The documentation regarding the Solr stuff is limited, and I cannot figure out what all the fields in <code>conciliator.properties</code> are supposed to be</li> <li>The documentation regarding the Solr stuff is limited, and I cannot figure out what all the fields in <code>conciliator.properties</code> are supposed to be</li>
<li>But then I found <a href="https://github.com/okfn/reconcile-csv">reconcile-csv</a>, which allows you to reconcile against values in a CSV file!</li> <li>But then I found <a href="https://github.com/okfn/reconcile-csv">reconcile-csv</a>, which allows you to reconcile against values in a CSV file!</li>
<li>That, combined with splitting our multi-value fields on &ldquo;||&rdquo; in OpenRefine is amaaaaazing, because after reconciliation you can just join them again</li> <li>That, combined with splitting our multi-value fields on &ldquo;||&rdquo; in OpenRefine is amaaaaazing, because after reconciliation you can just join them again</li>
<li>Oh wow, you can also facet on the individual values once you&rsquo;ve split them! That&rsquo;s going to be amazing for proofing CRPs, subjects, etc.</li> <li>Oh wow, you can also facet on the individual values once you've split them! That's going to be amazing for proofing CRPs, subjects, etc.</li>
</ul> </ul>
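<p>To illustrate the split, fix, and join idea outside OpenRefine, here is a minimal Python sketch. The helper names and the <code>corrections</code> mapping are hypothetical; in OpenRefine the corrections would come from the reconciliation service rather than a dictionary.</p>

```python
def split_values(cell, separator="||"):
    """Split a multi-value cell into trimmed, non-empty values."""
    return [value.strip() for value in cell.split(separator) if value.strip()]


def join_values(values, separator="||"):
    """Join individual values back into one multi-value cell."""
    return separator.join(values)


def reconcile_cell(cell, corrections, separator="||"):
    """Replace each value that has a known correction, leaving the others untouched."""
    fixed = [corrections.get(value, value) for value in split_values(cell, separator)]
    return join_values(fixed, separator)
```

<p>For example, <code>reconcile_cell("Livestock and Fish||MAIZE", {"MAIZE": "Maize"})</code> corrects only the second value and keeps the separator intact.</p>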
<h2 id="20180509">2018-05-09</h2>
<h2 id="2018-05-09">2018-05-09</h2>
<ul> <ul>
<li>Udana asked about the Book Chapters we had been proofing on DSpace Test in 2018-04</li> <li>Udana asked about the Book Chapters we had been proofing on DSpace Test in 2018-04</li>
<li>I told him that there were still some TODO items for him on that data, for example to update the <code>dc.language.iso</code> field for the Spanish items</li> <li>I told him that there were still some TODO items for him on that data, for example to update the <code>dc.language.iso</code> field for the Spanish items</li>
<li>I was trying to remember how I parsed the <code>input-forms.xml</code> using <code>xmllint</code> to extract subjects neatly</li> <li>I was trying to remember how I parsed the <code>input-forms.xml</code> using <code>xmllint</code> to extract subjects neatly</li>
<li>I could use it with <a href="https://github.com/okfn/reconcile-csv">reconcile-csv</a> or to populate a Solr instance for reconciliation</li> <li>I could use it with <a href="https://github.com/okfn/reconcile-csv">reconcile-csv</a> or to populate a Solr instance for reconciliation</li>
<li>This XPath expression gets close, but outputs all items on one line:</li>
<li><p>This XPath expression gets close, but outputs all items on one line:</p> </ul>
<pre><code>$ xmllint --xpath '//value-pairs[@value-pairs-name=&quot;crpsubject&quot;]/pair/stored-value/node()' dspace/config/input-forms.xml <pre><code>$ xmllint --xpath '//value-pairs[@value-pairs-name=&quot;crpsubject&quot;]/pair/stored-value/node()' dspace/config/input-forms.xml
Agriculture for Nutrition and HealthBig DataClimate Change, Agriculture and Food SecurityExcellence in BreedingFishForests, Trees and AgroforestryGenebanksGrain Legumes and Dryland CerealsLivestockMaizePolicies, Institutions and MarketsRiceRoots, Tubers and BananasWater, Land and EcosystemsWheatAquatic Agricultural SystemsDryland CerealsDryland SystemsGrain LegumesIntegrated Systems for the Humid TropicsLivestock and Fish Agriculture for Nutrition and HealthBig DataClimate Change, Agriculture and Food SecurityExcellence in BreedingFishForests, Trees and AgroforestryGenebanksGrain Legumes and Dryland CerealsLivestockMaizePolicies, Institutions and MarketsRiceRoots, Tubers and BananasWater, Land and EcosystemsWheatAquatic Agricultural SystemsDryland CerealsDryland SystemsGrain LegumesIntegrated Systems for the Humid TropicsLivestock and Fish
</code></pre></li> </code></pre><ul>
<li>Maybe <code>xmlstarlet</code> is better:</li>
<li><p>Maybe <code>xmlstarlet</code> is better:</p> </ul>
<pre><code>$ xmlstarlet sel -t -v '//value-pairs[@value-pairs-name=&quot;crpsubject&quot;]/pair/stored-value/text()' dspace/config/input-forms.xml <pre><code>$ xmlstarlet sel -t -v '//value-pairs[@value-pairs-name=&quot;crpsubject&quot;]/pair/stored-value/text()' dspace/config/input-forms.xml
Agriculture for Nutrition and Health Agriculture for Nutrition and Health
Big Data Big Data
@ -285,209 +261,163 @@ Dryland Systems
Grain Legumes Grain Legumes
Integrated Systems for the Humid Tropics Integrated Systems for the Humid Tropics
Livestock and Fish Livestock and Fish
</code></pre></li> </code></pre><ul>
<li>Discuss Colombian BNARS harvesting the CIAT data from CGSpace</li>
<li><p>Discuss Colombian BNARS harvesting the CIAT data from CGSpace</p></li> <li>They are using a system called Primo and the only options for data harvesting in that system are via FTP and OAI</li>
<li>I told them to get all <a href="https://cgspace.cgiar.org/oai/request?verb=ListRecords&amp;metadataPrefix=oai_dc&amp;set=com_10568_35697">CIAT records via OAI</a></li>
<li><p>They are using a system called Primo and the only options for data harvesting in that system are via FTP and OAI</p></li> <li>Just a note to myself, I figured out how to get reconcile-csv to run from source rather than running the old pre-compiled JAR file:</li>
<li><p>I told them to get all <a href="https://cgspace.cgiar.org/oai/request?verb=ListRecords&amp;metadataPrefix=oai_dc&amp;set=com_10568_35697">CIAT records via OAI</a></p></li>
<li><p>Just a note to myself, I figured out how to get reconcile-csv to run from source rather than running the old pre-compiled JAR file:</p>
<pre><code>$ lein run /tmp/crps.csv name id
</code></pre></li>
<li><p>I tried to reconcile against a CSV of our countries but reconcile-csv crashes</p></li>
</ul> </ul>
<pre><code>$ lein run /tmp/crps.csv name id
<h2 id="2018-05-13">2018-05-13</h2> </code></pre><ul>
<li>I tried to reconcile against a CSV of our countries but reconcile-csv crashes</li>
</ul>
<h2 id="20180513">2018-05-13</h2>
<ul> <ul>
<li>It turns out there was a space in my &ldquo;country&rdquo; header that was causing reconcile-csv to crash</li> <li>It turns out there was a space in my &ldquo;country&rdquo; header that was causing reconcile-csv to crash</li>
<li>After removing that it works fine!</li> <li>After removing that it works fine!</li>
<li>Looking at Sisay&rsquo;s 2,640 CIFOR records on DSpace Test (<a href="https://dspacetest.cgiar.org/handle/10568/92904"><sup>10568</sup>&frasl;<sub>92904</sub></a>) <li>Looking at Sisay's 2,640 CIFOR records on DSpace Test (<a href="https://dspacetest.cgiar.org/handle/10568/92904">10568/92904</a>)
<ul> <ul>
<li>Trimmed all leading / trailing white space and condensed multiple spaces into one</li> <li>Trimmed all leading / trailing white space and condensed multiple spaces into one</li>
<li>Corrected DOIs to use HTTPS and &ldquo;doi.org&rdquo; instead of &ldquo;dx.doi.org&rdquo;</li> <li>Corrected DOIs to use HTTPS and &ldquo;doi.org&rdquo; instead of &ldquo;dx.doi.org&rdquo;
<ul>
<li>There are eight items in <code>cg.identifier.doi</code> that are not DOIs</li> <li>There are eight items in <code>cg.identifier.doi</code> that are not DOIs</li>
</ul>
</li>
<li>Corrected <code>cg.identifier.url</code> links to cifor.org to use HTTPS</li> <li>Corrected <code>cg.identifier.url</code> links to cifor.org to use HTTPS</li>
<li>Corrected <code>dc.language.iso</code> from vt to vi (Vietnamese)</li> <li>Corrected <code>dc.language.iso</code> from vt to vi (Vietnamese)</li>
<li>Corrected affiliations to not use acronyms</li> <li>Corrected affiliations to not use acronyms</li>
<li>Reconcile countries against our countries list (removing terms like LATIN AMERICA, CENTRAL AFRICA, etc that are not countries)</li> <li>Reconcile countries against our countries list (removing terms like LATIN AMERICA, CENTRAL AFRICA, etc that are not countries)</li>
<li>Reconcile regions against our list of regions</li> <li>Reconcile regions against our list of regions</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2018-05-14">2018-05-14</h2> </ul>
<h2 id="20180514">2018-05-14</h2>
<ul> <ul>
<li>Send a message to the OpenRefine mailing list about the bug with reconciling multi-value cells</li> <li>Send a message to the OpenRefine mailing list about the bug with reconciling multi-value cells</li>
<li>Help Silvia Alonso get a list of all her publications since 2013 from Listings and Reports</li> <li>Help Silvia Alonso get a list of all her publications since 2013 from Listings and Reports</li>
</ul> </ul>
<h2 id="20180515">2018-05-15</h2>
<h2 id="2018-05-15">2018-05-15</h2>
<ul> <ul>
<li>Turns out I was doing the OpenRefine reconciliation wrong: I needed to copy the matched values to a new column!</li> <li>Turns out I was doing the OpenRefine reconciliation wrong: I needed to copy the matched values to a new column!</li>
<li>Also, I learned how to do something cool with Jython expressions in OpenRefine</li> <li>Also, I learned how to do something cool with Jython expressions in OpenRefine</li>
<li>This will fetch a URL and return its HTTP response code:</li>
<li><p>This will fetch a URL and return its HTTP response code:</p> </ul>
<pre><code>import urllib2 <pre><code>import urllib2
import re import re
pattern = re.compile('.*10.1016.*') pattern = re.compile('.*10.1016.*')
if pattern.match(value): if pattern.match(value):
    get = urllib2.urlopen(value)     get = urllib2.urlopen(value)
    return get.getcode()     return get.getcode()
return &quot;blank&quot; return &quot;blank&quot;
</code></pre></li> </code></pre><ul>
<li>I used a regex to limit it to just some of the DOIs in this case because there were thousands of URLs</li>
<li><p>I used a regex to limit it to just some of the DOIs in this case because there were thousands of URLs</p></li> <li>Here the response code would be 200, 404, etc, or &ldquo;blank&rdquo; if there is no URL for that item</li>
<li>You could use this in a facet or in a new column</li>
<li><p>Here the response code would be 200, 404, etc, or &ldquo;blank&rdquo; if there is no URL for that item</p></li> <li>More information and good examples here: <a href="https://programminghistorian.org/lessons/fetch-and-parse-data-with-openrefine">https://programminghistorian.org/lessons/fetch-and-parse-data-with-openrefine</a></li>
<li>Finish looking at the 2,640 CIFOR records on DSpace Test (<a href="https://dspacetest.cgiar.org/handle/10568/92904">10568/92904</a>), cleaning up authors and adding collection mappings</li>
<li><p>You could use this in a facet or in a new column</p></li> <li>They can now be moved to CGSpace as far as I'm concerned, but I don't know if Sisay will do it or me</li>
<li>I was checking the CIFOR data for duplicates using Atmire's Metadata Quality Module (and found some duplicates actually), but then DSpace died&hellip;</li>
<li><p>More information and good examples here: <a href="https://programminghistorian.org/lessons/fetch-and-parse-data-with-openrefine">https://programminghistorian.org/lessons/fetch-and-parse-data-with-openrefine</a></p></li> <li>I didn't see anything in the Tomcat, DSpace, or Solr logs, but I saw this in <code>dmesg -T</code>:</li>
</ul>
<li><p>Finish looking at the 2,640 CIFOR records on DSpace Test (<a href="https://dspacetest.cgiar.org/handle/10568/92904"><sup>10568</sup>&frasl;<sub>92904</sub></a>), cleaning up authors and adding collection mappings</p></li>
<li><p>They can now be moved to CGSpace as far as I&rsquo;m concerned, but I don&rsquo;t know if Sisay will do it or me</p></li>
<li><p>I was checking the CIFOR data for duplicates using Atmire&rsquo;s Metadata Quality Module (and found some duplicates actually), but then DSpace died&hellip;</p></li>
<li><p>I didn&rsquo;t see anything in the Tomcat, DSpace, or Solr logs, but I saw this in <code>dmesg -T</code>:</p>
<pre><code>[Tue May 15 12:10:01 2018] Out of memory: Kill process 3763 (java) score 706 or sacrifice child <pre><code>[Tue May 15 12:10:01 2018] Out of memory: Kill process 3763 (java) score 706 or sacrifice child
[Tue May 15 12:10:01 2018] Killed process 3763 (java) total-vm:14667688kB, anon-rss:5705268kB, file-rss:0kB, shmem-rss:0kB [Tue May 15 12:10:01 2018] Killed process 3763 (java) total-vm:14667688kB, anon-rss:5705268kB, file-rss:0kB, shmem-rss:0kB
[Tue May 15 12:10:01 2018] oom_reaper: reaped process 3763 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [Tue May 15 12:10:01 2018] oom_reaper: reaped process 3763 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
</code></pre></li> </code></pre><ul>
<li>So the Linux kernel killed Java&hellip;</li>
<li><p>So the Linux kernel killed Java&hellip;</p></li> <li>Maria from Bioversity mailed to say she got an error while submitting an item on CGSpace:</li>
</ul>
<li><p>Maria from Bioversity mailed to say she got an error while submitting an item on CGSpace:</p>
<pre><code>Unable to load Submission Information, since WorkspaceID (ID:S96060) is not a valid in-process submission <pre><code>Unable to load Submission Information, since WorkspaceID (ID:S96060) is not a valid in-process submission
</code></pre></li> </code></pre><ul>
<li>Looking in the DSpace log I see something related:</li>
<li><p>Looking in the DSpace log I see something related:</p> </ul>
<pre><code>2018-05-15 12:35:30,858 INFO org.dspace.submit.step.CompleteStep @ m.garruccio@cgiar.org:session_id=8AC4499945F38B45EF7A1226E3042DAE:submission_complete:Completed submission with id=96060 <pre><code>2018-05-15 12:35:30,858 INFO org.dspace.submit.step.CompleteStep @ m.garruccio@cgiar.org:session_id=8AC4499945F38B45EF7A1226E3042DAE:submission_complete:Completed submission with id=96060
</code></pre></li> </code></pre><ul>
<li>So I'm not sure&hellip;</li>
<li><p>So I&rsquo;m not sure&hellip;</p></li> <li>I finally figured out how to get OpenRefine to reconcile values from Solr via <a href="https://github.com/codeforkjeff/conciliator">conciliator</a>:</li>
<li>The trick was to use a more appropriate Solr fieldType <code>text_en</code> instead of <code>text_general</code> so that more terms match, for example uppercase and lower case:</li>
<li><p>I finally figured out how to get OpenRefine to reconcile values from Solr via <a href="https://github.com/codeforkjeff/conciliator">conciliator</a>:</p></li> </ul>
<li><p>The trick was to use a more appropriate Solr fieldType <code>text_en</code> instead of <code>text_general</code> so that more terms match, for example uppercase and lower case:</p>
<pre><code>$ ./bin/solr start <pre><code>$ ./bin/solr start
$ ./bin/solr create_core -c countries $ ./bin/solr create_core -c countries
$ curl -X POST -H 'Content-type:application/json' --data-binary '{&quot;add-field&quot;: {&quot;name&quot;:&quot;country&quot;, &quot;type&quot;:&quot;text_en&quot;, &quot;multiValued&quot;:false, &quot;stored&quot;:true}}' http://localhost:8983/solr/countries/schema $ curl -X POST -H 'Content-type:application/json' --data-binary '{&quot;add-field&quot;: {&quot;name&quot;:&quot;country&quot;, &quot;type&quot;:&quot;text_en&quot;, &quot;multiValued&quot;:false, &quot;stored&quot;:true}}' http://localhost:8983/solr/countries/schema
$ ./bin/post -c countries ~/src/git/DSpace/2018-05-10-countries.csv $ ./bin/post -c countries ~/src/git/DSpace/2018-05-10-countries.csv
</code></pre></li> </code></pre><ul>
<li>It still doesn't catch simple mistakes like &ldquo;ALBANI&rdquo; or &ldquo;AL BANIA&rdquo; for &ldquo;ALBANIA&rdquo;, and it doesn't return scores, so I have to select matches manually:</li>
<li><p>It still doesn&rsquo;t catch simple mistakes like &ldquo;ALBANI&rdquo; or &ldquo;AL BANIA&rdquo; for &ldquo;ALBANIA&rdquo;, and it doesn&rsquo;t return scores, so I have to select matches manually:</p></li>
</ul> </ul>
<p><img src="/cgspace-notes/2018/05/openrefine-solr-conciliator.png" alt="OpenRefine reconciling countries from local Solr"></p>
<p><img src="/cgspace-notes/2018/05/openrefine-solr-conciliator.png" alt="OpenRefine reconciling countries from local Solr" /></p>
<ul> <ul>
<li><p>I should probably make a general copy field and set it to be the default search field, like DSpace&rsquo;s search core does (see schema.xml):</p> <li>I should probably make a general copy field and set it to be the default search field, like DSpace's search core does (see schema.xml):</li>
</ul>
<pre><code>&lt;defaultSearchField&gt;search_text&lt;/defaultSearchField&gt; <pre><code>&lt;defaultSearchField&gt;search_text&lt;/defaultSearchField&gt;
... ...
&lt;copyField source=&quot;*&quot; dest=&quot;search_text&quot;/&gt; &lt;copyField source=&quot;*&quot; dest=&quot;search_text&quot;/&gt;
</code></pre></li> </code></pre><ul>
<li>Actually, I wonder how much of their schema I could just copy&hellip;</li>
<li><p>Actually, I wonder how much of their schema I could just copy&hellip;</p></li> <li>Apparently the default search field is the <code>df</code> parameter and you could technically just add it to the query string, so no need to bother with that in the schema now</li>
<li>I copied over the DSpace <code>search_text</code> field type from the DSpace Solr config (had to remove some properties so Solr would start) but it doesn't seem to be any better at matching than the <code>text_en</code> type</li>
<li><p>Apparently the default search field is the <code>df</code> parameter and you could technically just add it to the query string, so no need to bother with that in the schema now</p></li> <li>I think I need to focus on trying to return scores with conciliator</li>
<li><p>I copied over the DSpace <code>search_text</code> field type from the DSpace Solr config (had to remove some properties so Solr would start) but it doesn&rsquo;t seem to be any better at matching than the <code>text_en</code> type</p></li>
<li><p>I think I need to focus on trying to return scores with conciliator</p></li>
</ul> </ul>
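<p>A small sketch of passing the default search field as the <code>df</code> query parameter instead of setting it in the schema. The function name is hypothetical, and the base URL, core, and field are just the local test values used earlier in this entry.</p>

```python
from urllib.parse import urlencode


def solr_select_url(base_url, core, query, default_field):
    """Build a Solr select URL that names the default search field via df."""
    params = urlencode({"q": query, "df": default_field, "wt": "json"})
    return f"{base_url}/{core}/select?{params}"


# Example with the local "countries" core and its "country" field
url = solr_select_url("http://localhost:8983/solr", "countries", "ALBANIA", "country")
```

<p>This only builds the URL; it says nothing about how conciliator itself constructs its queries.</p>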
<h2 id="20180516">2018-05-16</h2>
<h2 id="2018-05-16">2018-05-16</h2>
<ul> <ul>
<li>Discuss GDPR with James Stapleton <li>Discuss GDPR with James Stapleton
<ul> <ul>
<li>As far as I see it, we are &ldquo;Data Controllers&rdquo; on CGSpace because we store people&rsquo;s names, emails, and phone numbers if they register</li> <li>As far as I see it, we are &ldquo;Data Controllers&rdquo; on CGSpace because we store people&rsquo;s names, emails, and phone numbers if they register</li>
<li>We set cookies on the user&rsquo;s computer, but these do not contain personally identifiable information (PII) and they are &ldquo;session&rdquo; cookies which are deleted when the user closes their browser</li> <li>We set cookies on the user's computer, but these do not contain personally identifiable information (PII) and they are &ldquo;session&rdquo; cookies which are deleted when the user closes their browser</li>
<li>We use Google Analytics to track website usage, which makes Google the &ldquo;Data Processor&rdquo; and in this case we merely need to <em>limit</em> or <em>obfuscate</em> the information we send to them</li> <li>We use Google Analytics to track website usage, which makes Google the &ldquo;Data Processor&rdquo; and in this case we merely need to <em>limit</em> or <em>obfuscate</em> the information we send to them</li>
<li>As the only personally identifiable information we send is the user&rsquo;s IP address, I think we only need to enable <a href="https://support.google.com/analytics/answer/2763052">IP Address Anonymization</a> in our <code>analytics.js</code> code snippets</li> <li>As the only personally identifiable information we send is the user's IP address, I think we only need to enable <a href="https://support.google.com/analytics/answer/2763052">IP Address Anonymization</a> in our <code>analytics.js</code> code snippets</li>
<li>Then we can add a &ldquo;Privacy&rdquo; page to CGSpace that makes all of this clear</li> <li>Then we can add a &ldquo;Privacy&rdquo; page to CGSpace that makes all of this clear</li>
</ul></li> </ul>
</li>
<li>Silvia asked if I could sort the records in her Listings and Report output and it turns out that the options are misconfigured in <code>dspace/config/modules/atmire-listings-and-reports.cfg</code></li> <li>Silvia asked if I could sort the records in her Listings and Report output and it turns out that the options are misconfigured in <code>dspace/config/modules/atmire-listings-and-reports.cfg</code></li>
<li>I created and merged a pull request to fix the sorting issue in Listings and Reports (<a href="https://github.com/ilri/DSpace/pull/374">#374</a>)</li> <li>I created and merged a pull request to fix the sorting issue in Listings and Reports (<a href="https://github.com/ilri/DSpace/pull/374">#374</a>)</li>
<li>Regarding the IP Address Anonymization for GDPR, I amended the Google Analytics snippet in <code>page-structure-alterations.xsl</code> to:</li>
<li><p>Regarding the IP Address Anonymization for GDPR, I amended the Google Analytics snippet in <code>page-structure-alterations.xsl</code> to:</p>
<pre><code>ga('send', 'pageview', {
'anonymizeIp': true
});
</code></pre></li>
<li><p>I tested loading a certain page before and after adding this and afterwards I saw that the parameter <code>aip=1</code> was being sent with the analytics request to Google</p></li>
<li><p>According to the <a href="https://developers.google.com/analytics/devguides/collection/analyticsjs/field-reference#anonymizeIp">analytics.js protocol parameter documentation</a> this means that IPs are being anonymized</p></li>
<li><p>After finding and fixing some duplicates in IITA&rsquo;s <code>IITA_April_27</code> test collection on DSpace Test (<sup>10568</sup>&frasl;<sub>92703</sub>) I told Sisay that he can move them to IITA&rsquo;s Journal Articles collection on CGSpace</p></li>
</ul> </ul>
<pre><code>ga('send', 'pageview', {
<h2 id="2018-05-17">2018-05-17</h2> 'anonymizeIp': true
});
</code></pre><ul>
<li>I tested loading a certain page before and after adding this and afterwards I saw that the parameter <code>aip=1</code> was being sent with the analytics request to Google</li>
<li>According to the <a href="https://developers.google.com/analytics/devguides/collection/analyticsjs/field-reference#anonymizeIp">analytics.js protocol parameter documentation</a> this means that IPs are being anonymized</li>
<li>After finding and fixing some duplicates in IITA's <code>IITA_April_27</code> test collection on DSpace Test (10568/92703) I told Sisay that he can move them to IITA's Journal Articles collection on CGSpace</li>
</ul>
<h2 id="20180517">2018-05-17</h2>
<ul> <ul>
<li>Testing reconciliation of countries against Solr via conciliator, I notice that <code>CÔTE D'IVOIRE</code> doesn&rsquo;t match <code>COTE D'IVOIRE</code>, whereas with reconcile-csv it does</li> <li>Testing reconciliation of countries against Solr via conciliator, I notice that <code>CÔTE D'IVOIRE</code> doesn't match <code>COTE D'IVOIRE</code>, whereas with reconcile-csv it does</li>
<li>Also, when reconciling regions against Solr via conciliator <code>EASTERN AFRICA</code> doesn&rsquo;t match <code>EAST AFRICA</code>, whereas with reconcile-csv it does</li> <li>Also, when reconciling regions against Solr via conciliator <code>EASTERN AFRICA</code> doesn't match <code>EAST AFRICA</code>, whereas with reconcile-csv it does</li>
<li>And <code>SOUTH AMERICA</code> matches both <code>SOUTH ASIA</code> and <code>SOUTH AMERICA</code> with the same match score of 2&hellip; WTF.</li> <li>And <code>SOUTH AMERICA</code> matches both <code>SOUTH ASIA</code> and <code>SOUTH AMERICA</code> with the same match score of 2&hellip; WTF.</li>
<li>It could be that I just need to tune the query filter in Solr (currently using the example <code>text_en</code> field type)</li> <li>It could be that I just need to tune the query filter in Solr (currently using the example <code>text_en</code> field type)</li>
<li>Oh sweet, it turns out that the issue with searching for characters with accents is called &ldquo;ASCII folding&rdquo; in Solr</li> <li>Oh sweet, it turns out that the issue with searching for characters with accents is called &ldquo;ASCII folding&rdquo; in Solr</li>
<li>You can use either a <a href="https://lucene.apache.org/solr/guide/7_3/language-analysis.html"><code>solr.ASCIIFoldingFilterFactory</code> filter</a> or a <a href="https://lucene.apache.org/solr/guide/7_3/charfilterfactories.html"><code>solr.MappingCharFilterFactory</code> charFilter</a> mapping against <code>mapping-FoldToASCII.txt</code></li> <li>You can use either a <a href="https://lucene.apache.org/solr/guide/7_3/language-analysis.html"><code>solr.ASCIIFoldingFilterFactory</code> filter</a> or a <a href="https://lucene.apache.org/solr/guide/7_3/charfilterfactories.html"><code>solr.MappingCharFilterFactory</code> charFilter</a> mapping against <code>mapping-FoldToASCII.txt</code></li>
<li>Also see: <a href="https://opensourceconnections.com/blog/2017/02/20/solr-utf8/">https://opensourceconnections.com/blog/2017/02/20/solr-utf8/</a></li> <li>Also see: <a href="https://opensourceconnections.com/blog/2017/02/20/solr-utf8/">https://opensourceconnections.com/blog/2017/02/20/solr-utf8/</a></li>
<li>Now <code>CÔTE D'IVOIRE</code> matches <code>COTE D'IVOIRE</code>!</li> <li>Now <code>CÔTE D'IVOIRE</code> matches <code>COTE D'IVOIRE</code>!</li>
<li>I&rsquo;m not sure which method is better, perhaps the <code>solr.ASCIIFoldingFilterFactory</code> filter because it doesn&rsquo;t require copying the <code>mapping-FoldToASCII.txt</code> file</li> <li>I'm not sure which method is better, perhaps the <code>solr.ASCIIFoldingFilterFactory</code> filter because it doesn't require copying the <code>mapping-FoldToASCII.txt</code> file</li>
<li>And actually I&rsquo;m not entirely sure about the order of filtering before tokenizing, etc&hellip;</li> <li>And actually I'm not entirely sure about the order of filtering before tokenizing, etc&hellip;</li>
<li>Ah, I see that <code>charFilter</code> must be before the tokenizer because it works on a stream, whereas <code>filter</code> operates on tokenized input so it must come after the tokenizer</li> <li>Ah, I see that <code>charFilter</code> must be before the tokenizer because it works on a stream, whereas <code>filter</code> operates on tokenized input so it must come after the tokenizer</li>
<li>Regarding the use of the <code>charFilter</code> vs the <code>filter</code> class before and after the tokenizer, respectively, I think it&rsquo;s better to use the <code>charFilter</code> to normalize the input stream before tokenizing it as I have no idea what kinda stuff might get removed by the tokenizer</li> <li>Regarding the use of the <code>charFilter</code> vs the <code>filter</code> class before and after the tokenizer, respectively, I think it's better to use the <code>charFilter</code> to normalize the input stream before tokenizing it as I have no idea what kinda stuff might get removed by the tokenizer</li>
<li>Skype with Geoffrey from IITA in Nairobi who wants to deposit records to CGSpace via the REST API but I told him that this skips the submission workflows and because we cannot guarantee the data quality we would not allow anyone to use it this way</li> <li>Skype with Geoffrey from IITA in Nairobi who wants to deposit records to CGSpace via the REST API but I told him that this skips the submission workflows and because we cannot guarantee the data quality we would not allow anyone to use it this way</li>
<li>I finished making the XMLUI changes for anonymization of IP addresses in Google Analytics and merged the changes to the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/375">#375</a>)</li> <li>I finished making the XMLUI changes for anonymization of IP addresses in Google Analytics and merged the changes to the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/375">#375</a>)</li>
<li>Also, I think we might be able to implement <a href="https://developers.google.com/analytics/devguides/collection/analyticsjs/user-opt-out">opt-out functionality for Google Analytics using a window property</a> that could be managed by <a href="https://webgilde.com/en/analytics-opt-out/">storing its status in a cookie</a></li> <li>Also, I think we might be able to implement <a href="https://developers.google.com/analytics/devguides/collection/analyticsjs/user-opt-out">opt-out functionality for Google Analytics using a window property</a> that could be managed by <a href="https://webgilde.com/en/analytics-opt-out/">storing its status in a cookie</a></li>
<li>This cookie could be set by a user clicking a link in a privacy policy, for example</li> <li>This cookie could be set by a user clicking a link in a privacy policy, for example</li>
<li>The additional Javascript could be easily added to our existing <code>googleAnalytics</code> template in each XMLUI theme</li> <li>The additional Javascript could be easily added to our existing <code>googleAnalytics</code> template in each XMLUI theme</li>
</ul> </ul>
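<p>The opt-out idea above could be sketched roughly as follows. This is a minimal sketch, not what is deployed on CGSpace: the property ID <code>UA-XXXXX-Y</code>, the cookie name <code>ga-opt-out</code>, the expiry date, and all function names are placeholders I made up for illustration. The only part taken from Google's documentation is the <code>ga-disable-&lt;property ID&gt;</code> window property, which has to be set before any tracking call runs.</p>

```javascript
// Sketch only: the property ID and cookie name are placeholders, not CGSpace values.
const GA_PROPERTY_ID = 'UA-XXXXX-Y'; // hypothetical Google Analytics property ID
const OPT_OUT_COOKIE = 'ga-opt-out'; // hypothetical cookie name

// Return true if a "k=v; k2=v2" cookie string contains "<name>=true".
function hasOptedOut(cookieString, name) {
  return cookieString
    .split(';')
    .map((pair) => pair.trim())
    .includes(name + '=true');
}

// Set the window property that analytics.js checks before sending any hits.
// Returns whether tracking ended up disabled.
function applyOptOut(win, cookieString) {
  const disabled = hasOptedOut(cookieString, OPT_OUT_COOKIE);
  win['ga-disable-' + GA_PROPERTY_ID] = disabled;
  return disabled;
}

// Meant to be wired to a link in the privacy policy: remember the choice in a
// cookie and disable tracking for the current page as well.
function optOut(doc, win) {
  doc.cookie = OPT_OUT_COOKIE + '=true; expires=Thu, 31 Dec 2099 23:59:59 UTC; path=/';
  win['ga-disable-' + GA_PROPERTY_ID] = true;
}
```

<p>In the theme template this would presumably be called as <code>applyOptOut(window, document.cookie)</code> before the Google Analytics snippet, with <code>optOut(document, window)</code> attached to the privacy policy link.</p>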
<h2 id="20180518">2018-05-18</h2>
<h2 id="2018-05-18">2018-05-18</h2>
<ul> <ul>
<li>Do a final check on the thirty (30) IWMI Book Chapters for Udana and upload them to CGSpace</li> <li>Do a final check on the thirty (30) IWMI Book Chapters for Udana and upload them to CGSpace</li>
<li>These were previously on <a href="https://dspacetest.cgiar.org/handle/10568/91679">DSpace Test as &ldquo;IWMI test collection&rdquo;</a> in 2018-04</li> <li>These were previously on <a href="https://dspacetest.cgiar.org/handle/10568/91679">DSpace Test as &ldquo;IWMI test collection&rdquo;</a> in 2018-04</li>
</ul> </ul>
<h2 id="20180520">2018-05-20</h2>
<h2 id="2018-05-20">2018-05-20</h2>
<ul> <ul>
<li>Run all system updates on DSpace Test (linode19), re-deploy DSpace with latest <code>5_x-dev</code> branch (including GDPR IP anonymization), and reboot the server</li> <li>Run all system updates on DSpace Test (linode19), re-deploy DSpace with latest <code>5_x-dev</code> branch (including GDPR IP anonymization), and reboot the server</li>
<li>Run all system updates on CGSpace (linode18), re-deploy DSpace with latest <code>5_x-dev</code> branch (including GDPR IP anonymization), and reboot the server</li> <li>Run all system updates on CGSpace (linode18), re-deploy DSpace with latest <code>5_x-dev</code> branch (including GDPR IP anonymization), and reboot the server</li>
</ul> </ul>
<h2 id="20180521">2018-05-21</h2>
<h2 id="2018-05-21">2018-05-21</h2>
<ul> <ul>
<li>Geoffrey from IITA got back with more questions about depositing items programmatically into the CGSpace workflow</li> <li>Geoffrey from IITA got back with more questions about depositing items programmatically into the CGSpace workflow</li>
<li>I pointed out that <a href="http://swordapp.org/">SWORD</a> might be an option, as <a href="https://wiki.duraspace.org/display/DSDOC5x/SWORDv2+Server">DSpace supports the SWORDv2 protocol</a> (although we have never tested it)</li> <li>I pointed out that <a href="http://swordapp.org/">SWORD</a> might be an option, as <a href="https://wiki.duraspace.org/display/DSDOC5x/SWORDv2+Server">DSpace supports the SWORDv2 protocol</a> (although we have never tested it)</li>
<li>Work on implementing <a href="https://cookieconsent.insites.com">cookie consent</a> popup for all XMLUI themes (SASS theme with primary / secondary branding from Bootstrap)</li> <li>Work on implementing <a href="https://cookieconsent.insites.com">cookie consent</a> popup for all XMLUI themes (SASS theme with primary / secondary branding from Bootstrap)</li>
</ul> </ul>
<h2 id="20180522">2018-05-22</h2>
<h2 id="2018-05-22">2018-05-22</h2>
<ul> <ul>
<li>Skype with James Stapleton about last minute GDPR wording</li> <li>Skype with James Stapleton about last minute GDPR wording</li>
<li>After spending yesterday working on integration and theming of the cookieconsent popup, today I cannot get the damn &ldquo;Agree&rdquo; button to dismiss the popup!</li> <li>After spending yesterday working on integration and theming of the cookieconsent popup, today I cannot get the damn &ldquo;Agree&rdquo; button to dismiss the popup!</li>
@ -497,95 +427,64 @@ $ ./bin/post -c countries ~/src/git/DSpace/2018-05-10-countries.csv
<li>This is a waste of TWO full days of work</li> <li>This is a waste of TWO full days of work</li>
<li>Marissa Van Epp asked if I could add <code>PII-FP1_PACCA2</code> to the CCAFS phase II project tags on CGSpace so I created a ticket to track it (<a href="https://github.com/ilri/DSpace/issues/376">#376</a>)</li> <li>Marissa Van Epp asked if I could add <code>PII-FP1_PACCA2</code> to the CCAFS phase II project tags on CGSpace so I created a ticket to track it (<a href="https://github.com/ilri/DSpace/issues/376">#376</a>)</li>
</ul> </ul>
<h2 id="20180523">2018-05-23</h2>
<h2 id="2018-05-23">2018-05-23</h2>
<ul> <ul>
<li><p>I&rsquo;m investigating how many non-CGIAR users we have registered on CGSpace:</p> <li>I'm investigating how many non-CGIAR users we have registered on CGSpace:</li>
<pre><code>dspace=# select email, netid from eperson where email not like '%cgiar.org%' and email like '%@%';
</code></pre></li>
<li><p>We might need to do something regarding these users for GDPR compliance because we have their names, emails, and potentially phone numbers</p></li>
<li><p>I decided that I will just use the cookieconsent script as is, since it looks good and technically does set the cookie with &ldquo;allow&rdquo; or &ldquo;dismiss&rdquo;</p></li>
<li><p>I wrote a quick conditional to check if the user has agreed or not before enabling Google Analytics</p></li>
<li><p>I made a pull request for the GDPR compliance popup (<a href="https://github.com/ilri/DSpace/pull/377">#377</a>) and merged it to the <code>5_x-prod</code> branch</p></li>
<li><p>I will deploy it to CGSpace tonight</p></li>
</ul> </ul>
<pre><code>dspace=# select email, netid from eperson where email not like '%cgiar.org%' and email like '%@%';
<h2 id="2018-05-28">2018-05-28</h2> </code></pre><ul>
<li>We might need to do something regarding these users for GDPR compliance because we have their names, emails, and potentially phone numbers</li>
<li>I decided that I will just use the cookieconsent script as is, since it looks good and technically does set the cookie with &ldquo;allow&rdquo; or &ldquo;dismiss&rdquo;</li>
<li>I wrote a quick conditional to check if the user has agreed or not before enabling Google Analytics</li>
<li>I made a pull request for the GDPR compliance popup (<a href="https://github.com/ilri/DSpace/pull/377">#377</a>) and merged it to the <code>5_x-prod</code> branch</li>
<li>I will deploy it to CGSpace tonight</li>
</ul>
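<p>The "quick conditional" mentioned in the 2018-05-23 notes is not reproduced in them, so the following is only a guess at its shape. It assumes the cookieconsent script stores its state in a cookie named <code>cookieconsent_status</code>, and it treats both "allow" and "dismiss" as agreement, which is my assumption and should be checked against the actual GDPR wording. The function names and the <code>enableAnalytics</code> callback are hypothetical.</p>

```javascript
// Read a cookie value by name from a "k=v; k2=v2" string; returns null if absent.
function readCookie(cookieString, name) {
  for (const pair of cookieString.split(';')) {
    const [key, ...rest] = pair.trim().split('=');
    if (key === name) return rest.join('=');
  }
  return null;
}

// Assumption: "allow" and "dismiss" both count as consent; pass a different
// list as the second argument if only an explicit "allow" should count.
function hasConsented(cookieString, accepted = ['allow', 'dismiss']) {
  const status = readCookie(cookieString, 'cookieconsent_status');
  return status !== null && accepted.includes(status);
}

// Run the analytics setup callback only when consent has been recorded.
// Returns true if the callback was invoked.
function maybeEnableAnalytics(cookieString, enableAnalytics) {
  if (!hasConsented(cookieString)) return false;
  enableAnalytics();
  return true;
}
```

<p>In a theme this could wrap the existing Google Analytics snippet, for example <code>maybeEnableAnalytics(document.cookie, loadGoogleAnalytics)</code>, where <code>loadGoogleAnalytics</code> stands in for whatever the template already does.</p>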
<h2 id="20180528">2018-05-28</h2>
<ul> <ul>
<li>Daniel Haile-Michael sent a message that CGSpace was down (I am currently in Oregon so the time difference is ~10 hours)</li> <li>Daniel Haile-Michael sent a message that CGSpace was down (I am currently in Oregon so the time difference is ~10 hours)</li>
<li>I looked in the logs but didn&rsquo;t see anything that would be the cause of the crash</li> <li>I looked in the logs but didn't see anything that would be the cause of the crash</li>
<li>Atmire finalized the DSpace 5.8 testing and sent a pull request: <a href="https://github.com/ilri/DSpace/pull/378">https://github.com/ilri/DSpace/pull/378</a></li> <li>Atmire finalized the DSpace 5.8 testing and sent a pull request: <a href="https://github.com/ilri/DSpace/pull/378">https://github.com/ilri/DSpace/pull/378</a></li>
<li>They have asked if I can test this and get back to them by June 11th</li> <li>They have asked if I can test this and get back to them by June 11th</li>
</ul> </ul>
<h2 id="20180530">2018-05-30</h2>
<h2 id="2018-05-30">2018-05-30</h2>
<ul> <ul>
<li>Talk to Samantha from Bioversity about something related to Google Analytics, I&rsquo;m still not sure what they want</li> <li>Talk to Samantha from Bioversity about something related to Google Analytics, I'm still not sure what they want</li>
<li>DSpace Test crashed last night, seems to be related to system memory (not JVM heap)</li> <li>DSpace Test crashed last night, seems to be related to system memory (not JVM heap)</li>
<li>I see this in <code>dmesg</code>:</li>
<li><p>I see this in <code>dmesg</code>:</p> </ul>
<pre><code>[Wed May 30 00:00:39 2018] Out of memory: Kill process 6082 (java) score 697 or sacrifice child <pre><code>[Wed May 30 00:00:39 2018] Out of memory: Kill process 6082 (java) score 697 or sacrifice child
[Wed May 30 00:00:39 2018] Killed process 6082 (java) total-vm:14876264kB, anon-rss:5683372kB, file-rss:0kB, shmem-rss:0kB [Wed May 30 00:00:39 2018] Killed process 6082 (java) total-vm:14876264kB, anon-rss:5683372kB, file-rss:0kB, shmem-rss:0kB
[Wed May 30 00:00:40 2018] oom_reaper: reaped process 6082 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [Wed May 30 00:00:40 2018] oom_reaper: reaped process 6082 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
</code></pre></li> </code></pre><ul>
<li>I need to check the Tomcat JVM heap size/usage, command line JVM heap size (for cron jobs), and PostgreSQL memory usage</li>
<li><p>I need to check the Tomcat JVM heap size/usage, command line JVM heap size (for cron jobs), and PostgreSQL memory usage</p></li> <li>It might be possible to adjust some things, but eventually we'll need a larger VPS instance</li>
<li>For some reason there are no JVM stats in Munin, ugh</li>
<li><p>It might be possible to adjust some things, but eventually we&rsquo;ll need a larger VPS instance</p></li> <li>Run all system updates on DSpace Test and reboot it</li>
<li>I generated a list of CIFOR duplicates from the <code>CIFOR_May_9</code> collection using the Atmire MQM module and then dumped the HTML source so I could process it for sending to Vika</li>
<li><p>For some reason there are no JVM stats in Munin, ugh</p></li> <li>I used grep to filter all relevant handle lines from the HTML source then used sed to insert a newline before each &ldquo;Item1&rdquo; line (as the duplicates are grouped like Item1, Item2, Item3 for each set of duplicates):</li>
</ul>
<li><p>Run all system updates on DSpace Test and reboot it</p></li>
<li><p>I generated a list of CIFOR duplicates from the <code>CIFOR_May_9</code> collection using the Atmire MQM module and then dumped the HTML source so I could process it for sending to Vika</p></li>
<li><p>I used grep to filter all relevant handle lines from the HTML source then used sed to insert a newline before each &ldquo;Item1&rdquo; line (as the duplicates are grouped like Item1, Item2, Item3 for each set of duplicates):</p>
<pre><code>$ grep -E 'aspect.duplicatechecker.DuplicateResults.field.del_handle_[0-9]{1,3}_Item' ~/Desktop/https\ _dspacetest.cgiar.org_atmire_metadata-quality_duplicate-checker.html &gt; ~/cifor-duplicates.txt <pre><code>$ grep -E 'aspect.duplicatechecker.DuplicateResults.field.del_handle_[0-9]{1,3}_Item' ~/Desktop/https\ _dspacetest.cgiar.org_atmire_metadata-quality_duplicate-checker.html &gt; ~/cifor-duplicates.txt
$ sed 's/.*Item1.*/\n&amp;/g' ~/cifor-duplicates.txt &gt; ~/cifor-duplicates-cleaned.txt $ sed 's/.*Item1.*/\n&amp;/g' ~/cifor-duplicates.txt &gt; ~/cifor-duplicates-cleaned.txt
</code></pre></li> </code></pre><ul>
<li>I told Vika to look through the list manually and indicate which ones are indeed duplicates that we should delete, and which ones to map to CIFOR's collection</li>
<li><p>I told Vika to look through the list manually and indicate which ones are indeed duplicates that we should delete, and which ones to map to CIFOR&rsquo;s collection</p></li> <li>A few weeks ago Peter wanted a list of authors from the ILRI collections, so I need to find a way to get the handles of all those collections</li>
<li>I can use the <code>/communities/{id}/collections</code> endpoint of the REST API but it only takes IDs (not handles) and doesn't seem to descend into sub communities</li>
<li><p>A few weeks ago Peter wanted a list of authors from the ILRI collections, so I need to find a way to get the handles of all those collections</p></li> <li>Shit, so I need the IDs for the top-level ILRI community and all its sub communities (and their sub communities)</li>
<li>There has got to be a better way to do this than going to each community and getting their handles and IDs manually</li>
<li><p>I can use the <code>/communities/{id}/collections</code> endpoint of the REST API but it only takes IDs (not handles) and doesn&rsquo;t seem to descend into sub communities</p></li> <li>Oh shit, I literally already wrote a script to get all collections in a community hierarchy from the REST API: <a href="https://gist.github.com/alanorth/ddd7f555f0e487fe0e9d3eb4ff26ce50">rest-find-collections.py</a></li>
<li>The output isn't great, but all the handles and IDs are printed in debug mode:</li>
<li><p>Shit, so I need the IDs for the top-level ILRI community and all its sub communities (and their sub communities)</p></li>
<li><p>There has got to be a better way to do this than going to each community and getting their handles and IDs manually</p></li>
<li><p>Oh shit, I literally already wrote a script to get all collections in a community hierarchy from the REST API: <a href="https://gist.github.com/alanorth/ddd7f555f0e487fe0e9d3eb4ff26ce50">rest-find-collections.py</a></p></li>
<li><p>The output isn&rsquo;t great, but all the handles and IDs are printed in debug mode:</p>
<pre><code>$ ./rest-find-collections.py -u https://cgspace.cgiar.org/rest -d 10568/1 2&gt; /tmp/ilri-collections.txt
</code></pre></li>
<li><p>Then I format the list of handles and put it into this SQL query to export authors from items ONLY in those collections (too many to list here):</p>
<pre><code>dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/67236','10568/67274',...))) group by text_value order by count desc) to /tmp/ilri-authors.csv with csv;
</code></pre></li>
</ul> </ul>
<pre><code>$ ./rest-find-collections.py -u https://cgspace.cgiar.org/rest -d 10568/1 2&gt; /tmp/ilri-collections.txt
<h2 id="2018-05-31">2018-05-31</h2> </code></pre><ul>
<li>Then I format the list of handles and put it into this SQL query to export authors from items ONLY in those collections (too many to list here):</li>
</ul>
<pre><code>dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/67236','10568/67274',...))) group by text_value order by count desc) to /tmp/ilri-authors.csv with csv;
</code></pre><h2 id="20180531">2018-05-31</h2>
<ul> <ul>
<li>Clarify CGSpace&rsquo;s usage of Google Analytics and personally identifiable information during user registration for Bioversity team who had been asking about GDPR compliance</li> <li>Clarify CGSpace's usage of Google Analytics and personally identifiable information during user registration for Bioversity team who had been asking about GDPR compliance</li>
<li>Testing running PostgreSQL in a Docker container on localhost because when I&rsquo;m on Arch Linux there isn&rsquo;t an easily installable package for particular PostgreSQL versions</li> <li>Testing running PostgreSQL in a Docker container on localhost because when I'm on Arch Linux there isn't an easily installable package for particular PostgreSQL versions</li>
<li>Now I can just use Docker:</li>
<li><p>Now I can just use Docker:</p> </ul>
<pre><code>$ docker pull postgres:9.5-alpine <pre><code>$ docker pull postgres:9.5-alpine
$ docker run --name dspacedb -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.5-alpine $ docker run --name dspacedb -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.5-alpine
$ createuser -h localhost -U postgres --pwprompt dspacetest $ createuser -h localhost -U postgres --pwprompt dspacetest
@ -595,8 +494,7 @@ $ pg_restore -h localhost -O -U dspacetest -d dspacetest -W -h localhost ~/Downl
$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser;' $ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser;'
$ psql -h localhost -U dspacetest -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest $ psql -h localhost -U dspacetest -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest
$ psql -h localhost -U postgres dspacetest $ psql -h localhost -U postgres dspacetest
</code></pre></li> </code></pre>
</ul>

View File

@ -8,21 +8,17 @@
<meta property="og:title" content="June, 2018" /> <meta property="og:title" content="June, 2018" />
<meta property="og:description" content="2018-06-04 <meta property="og:description" content="2018-06-04
Test the DSpace 5.8 module upgrades from Atmire (#378) Test the DSpace 5.8 module upgrades from Atmire (#378)
There seems to be a problem with the CUA and L&amp;R versions in pom.xml because they are using SNAPSHOT and it doesn&#39;t build
There seems to be a problem with the CUA and L&amp;R versions in pom.xml because they are using SNAPSHOT and it doesn&rsquo;t build
I added the new CCAFS Phase II Project Tag PII-FP1_PACCA2 and merged it into the 5_x-prod branch (#379) I added the new CCAFS Phase II Project Tag PII-FP1_PACCA2 and merged it into the 5_x-prod branch (#379)
I proofed and tested the ILRI author corrections that Peter sent back to me this week: I proofed and tested the ILRI author corrections that Peter sent back to me this week:
$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p &#39;fuuu&#39; -f dc.contributor.author -t correct -m 3 -n $ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p &#39;fuuu&#39; -f dc.contributor.author -t correct -m 3 -n
I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in March, 2018 I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in March, 2018
Time to index ~70,000 items on CGSpace: Time to index ~70,000 items on CGSpace:
$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b $ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
@ -30,7 +26,6 @@ $ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discov
real 74m42.646s real 74m42.646s
user 8m5.056s user 8m5.056s
sys 2m7.289s sys 2m7.289s
" /> " />
<meta property="og:type" content="article" /> <meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2018-06/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2018-06/" />
@ -41,21 +36,17 @@ sys 2m7.289s
<meta name="twitter:title" content="June, 2018"/> <meta name="twitter:title" content="June, 2018"/>
<meta name="twitter:description" content="2018-06-04 <meta name="twitter:description" content="2018-06-04
Test the DSpace 5.8 module upgrades from Atmire (#378) Test the DSpace 5.8 module upgrades from Atmire (#378)
There seems to be a problem with the CUA and L&amp;R versions in pom.xml because they are using SNAPSHOT and it doesn&#39;t build
There seems to be a problem with the CUA and L&amp;R versions in pom.xml because they are using SNAPSHOT and it doesn&rsquo;t build
I added the new CCAFS Phase II Project Tag PII-FP1_PACCA2 and merged it into the 5_x-prod branch (#379) I added the new CCAFS Phase II Project Tag PII-FP1_PACCA2 and merged it into the 5_x-prod branch (#379)
I proofed and tested the ILRI author corrections that Peter sent back to me this week: I proofed and tested the ILRI author corrections that Peter sent back to me this week:
$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p &#39;fuuu&#39; -f dc.contributor.author -t correct -m 3 -n $ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p &#39;fuuu&#39; -f dc.contributor.author -t correct -m 3 -n
I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in March, 2018 I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in March, 2018
Time to index ~70,000 items on CGSpace: Time to index ~70,000 items on CGSpace:
$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b $ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
@ -63,9 +54,8 @@ $ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discov
real 74m42.646s real 74m42.646s
user 8m5.056s user 8m5.056s
sys 2m7.289s sys 2m7.289s
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -146,49 +136,39 @@ sys 2m7.289s
</p> </p>
</header> </header>
<h2 id="2018-06-04">2018-06-04</h2> <h2 id="20180604">2018-06-04</h2>
<ul> <ul>
<li>Test the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">DSpace 5.8 module upgrades from Atmire</a> (<a href="https://github.com/ilri/DSpace/pull/378">#378</a>) <li>Test the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">DSpace 5.8 module upgrades from Atmire</a> (<a href="https://github.com/ilri/DSpace/pull/378">#378</a>)
<ul> <ul>
<li>There seems to be a problem with the CUA and L&amp;R versions in <code>pom.xml</code> because they are using SNAPSHOT and it doesn&rsquo;t build</li> <li>There seems to be a problem with the CUA and L&amp;R versions in <code>pom.xml</code> because they are using SNAPSHOT and it doesn't build</li>
</ul></li> </ul>
</li>
<li>I added the new CCAFS Phase II Project Tag <code>PII-FP1_PACCA2</code> and merged it into the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/379">#379</a>)</li> <li>I added the new CCAFS Phase II Project Tag <code>PII-FP1_PACCA2</code> and merged it into the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/379">#379</a>)</li>
<li>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</li>
<li><p>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</p> </ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3 -n <pre><code>$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3 -n
</code></pre></li> </code></pre><ul>
<li>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="/cgspace-notes/2018-03/">March, 2018</a></li>
<li><p>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="/cgspace-notes/2018-03/">March, 2018</a></p></li> <li>Time to index ~70,000 items on CGSpace:</li>
</ul>
<li><p>Time to index ~70,000 items on CGSpace:</p>
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b <pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
real 74m42.646s real 74m42.646s
user 8m5.056s user 8m5.056s
sys 2m7.289s sys 2m7.289s
</code></pre></li> </code></pre><h2 id="20180606">2018-06-06</h2>
</ul>
<h2 id="2018-06-06">2018-06-06</h2>
<ul> <ul>
<li>It turns out that I needed to add a server block for <code>atmire.com-snapshots</code> to my Maven settings, so now the Atmire code builds</li> <li>It turns out that I needed to add a server block for <code>atmire.com-snapshots</code> to my Maven settings, so now the Atmire code builds</li>
<li>Now Maven and Ant run properly, but I&rsquo;m getting SQL migration errors in <code>dspace.log</code> after starting Tomcat</li> <li>Now Maven and Ant run properly, but I'm getting SQL migration errors in <code>dspace.log</code> after starting Tomcat</li>
<li>I&rsquo;ve updated my ticket on Atmire&rsquo;s bug tracker: <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560</a></li> <li>I've updated my ticket on Atmire's bug tracker: <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560</a></li>
</ul> </ul>
<h2 id="20180607">2018-06-07</h2>
<h2 id="2018-06-07">2018-06-07</h2>
<ul> <ul>
<li>Proofing 200 IITA records on DSpace Test for Sisay: <a href="https://dspacetest.cgiar.org/handle/10568/95391">IITA_Junel_06 (<sup>10568</sup>&frasl;<sub>95391</sub>)</a> <li>Proofing 200 IITA records on DSpace Test for Sisay: <a href="https://dspacetest.cgiar.org/handle/10568/95391">IITA_Junel_06 (10568/95391)</a>
<ul> <ul>
<li>Misspelled authorship type: CGAIR single center should be: CGIAR single centre</li> <li>Misspelled authorship type: CGAIR single center should be: CGIAR single centre</li>
<li>I see some encoding errors in author affiliations, for example:</li> <li>I see some encoding errors in author affiliations, for example:
<ul>
<li>Universidade de SÆo Paulo</li> <li>Universidade de SÆo Paulo</li>
<li>Institut National des Recherches Agricoles du B nin</li> <li>Institut National des Recherches Agricoles du B nin</li>
<li>Centre de Coop ration Internationale en Recherche Agronomique pour le D veloppement</li> <li>Centre de Coop ration Internationale en Recherche Agronomique pour le D veloppement</li>
@ -198,125 +178,117 @@ sys 2m7.289s
<li>Projet de Gestion des Ressources Naturelles, B nin</li> <li>Projet de Gestion des Ressources Naturelles, B nin</li>
<li>Universit t Hannover</li> <li>Universit t Hannover</li>
<li>Universit F lix Houphouet-Boigny</li> <li>Universit F lix Houphouet-Boigny</li>
</ul></li> </ul>
</li>
</ul>
</li>
<li>I uploaded fixes for all those now, but I will continue with the rest of the data later</li> <li>I uploaded fixes for all those now, but I will continue with the rest of the data later</li>
<li>Regarding the SQL migration errors, Atmire told me I need to run some migrations manually in PostgreSQL:</li>
<li><p>Regarding the SQL migration errors, Atmire told me I need to run some migrations manually in PostgreSQL:</p> </ul>
<pre><code>delete from schema_version where version = '5.6.2015.12.03.2'; <pre><code>delete from schema_version where version = '5.6.2015.12.03.2';
update schema_version set version = '5.6.2015.12.03.2' where version = '5.5.2015.12.03.2'; update schema_version set version = '5.6.2015.12.03.2' where version = '5.5.2015.12.03.2';
update schema_version set version = '5.8.2015.12.03.3' where version = '5.5.2015.12.03.3'; update schema_version set version = '5.8.2015.12.03.3' where version = '5.5.2015.12.03.3';
</code></pre></li> </code></pre><ul>
<li>And then I need to ignore the ignored ones:</li>
<li><p>And then I need to ignore the ignored ones:</p>
<pre><code>$ ~/dspace/bin/dspace database migrate ignored
</code></pre></li>
<li><p>Now DSpace starts up properly!</p></li>
<li><p>Gabriela from CIP got back to me about the author names we were correcting on CGSpace</p></li>
<li><p>I did a quick sanity check on them and then did a test import with my <a href="https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897"><code>fix-metadata-values.py</code></a> script:</p>
<pre><code>$ ./fix-metadata-values.py -i /tmp/2018-06-08-CIP-Authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3
</code></pre></li>
<li><p>I will apply them on CGSpace tomorrow I think&hellip;</p></li>
</ul> </ul>
<pre><code>$ ~/dspace/bin/dspace database migrate ignored
<h2 id="2018-06-09">2018-06-09</h2> </code></pre><ul>
<li>Now DSpace starts up properly!</li>
<li>Gabriela from CIP got back to me about the author names we were correcting on CGSpace</li>
<li>I did a quick sanity check on them and then did a test import with my <a href="https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897"><code>fix-metadata-values.py</code></a> script:</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/2018-06-08-CIP-Authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3
</code></pre><ul>
<li>I will apply them on CGSpace tomorrow I think&hellip;</li>
</ul>
<h2 id="20180609">2018-06-09</h2>
<ul> <ul>
<li>It&rsquo;s pretty annoying, but the JVM monitoring for Munin was never set up when I migrated DSpace Test to its new server a few months ago</li> <li>It's pretty annoying, but the JVM monitoring for Munin was never set up when I migrated DSpace Test to its new server a few months ago</li>
<li>I ran the tomcat and munin-node tags in Ansible again and now the stuff is all wired up and recording stats properly</li> <li>I ran the tomcat and munin-node tags in Ansible again and now the stuff is all wired up and recording stats properly</li>
<li>I applied the CIP author corrections on CGSpace and DSpace Test and re-ran the Discovery indexing</li> <li>I applied the CIP author corrections on CGSpace and DSpace Test and re-ran the Discovery indexing</li>
</ul> </ul>
<h2 id="20180610">2018-06-10</h2>
<h2 id="2018-06-10">2018-06-10</h2>
<ul> <ul>
<li>I spent some time removing the Atmire Metadata Quality Module (MQM) from the proposed DSpace 5.8 changes</li> <li>I spent some time removing the Atmire Metadata Quality Module (MQM) from the proposed DSpace 5.8 changes</li>
<li>After removing all code mentioning MQM, mqm, metadata-quality, batchedit, duplicatechecker, etc, I think I got most of it removed, but there is a Spring error during Tomcat startup:</li>
<li><p>After removing all code mentioning MQM, mqm, metadata-quality, batchedit, duplicatechecker, etc, I think I got most of it removed, but there is a Spring error during Tomcat startup:</p> </ul>
<pre><code> INFO [org.dspace.servicemanager.DSpaceServiceManager] Shutdown DSpace core service manager
<pre><code>INFO [org.dspace.servicemanager.DSpaceServiceManager] Shutdown DSpace core service manager
Failed to startup the DSpace Service Manager: failure starting up spring service manager: Error creating bean with name 'org.dspace.servicemanager.spring.DSpaceBeanPostProcessor#0' defined in class path resource [spring/spring-dspace-applicationContext.xml]: Unsatisfied dependency expressed through constructor argument with index 0 of type [org.dspace.servicemanager.config.DSpaceConfigurationService]: : Cannot find class [com.atmire.dspace.discovery.ItemCollectionPlugin] for bean with name 'itemCollectionPlugin' defined in file [/home/aorth/dspace/config/spring/api/discovery.xml]; Failed to startup the DSpace Service Manager: failure starting up spring service manager: Error creating bean with name 'org.dspace.servicemanager.spring.DSpaceBeanPostProcessor#0' defined in class path resource [spring/spring-dspace-applicationContext.xml]: Unsatisfied dependency expressed through constructor argument with index 0 of type [org.dspace.servicemanager.config.DSpaceConfigurationService]: : Cannot find class [com.atmire.dspace.discovery.ItemCollectionPlugin] for bean with name 'itemCollectionPlugin' defined in file [/home/aorth/dspace/config/spring/api/discovery.xml];
</code></pre></li> </code></pre><ul>
<li>I can fix this by commenting out the <code>ItemCollectionPlugin</code> line of <code>discovery.xml</code>, but from looking at the git log I'm not actually sure if that is related to MQM or not</li>
<li><p>I can fix this by commenting out the <code>ItemCollectionPlugin</code> line of <code>discovery.xml</code>, but from looking at the git log I&rsquo;m not actually sure if that is related to MQM or not</p></li> <li>I will have to ask Atmire</li>
<li>I continued to look at Sisay's IITA records from last week
<li><p>I will have to ask Atmire</p></li>
<li><p>I continued to look at Sisay&rsquo;s IITA records from last week</p>
<ul> <ul>
<li>I normalized all DOIs to use HTTPS and &ldquo;doi.org&rdquo; instead of &ldquo;dx.doi.org&rdquo;</li> <li>I normalized all DOIs to use HTTPS and &ldquo;doi.org&rdquo; instead of &ldquo;dx.doi.org&rdquo;</li>
<li>I cleaned up white space in <code>cg.subject.iita</code> and <code>dc.subject</code></li> <li>I cleaned up white space in <code>cg.subject.iita</code> and <code>dc.subject</code></li>
<li>Even a bunch of IITA and AGROVOC subjects are missing accents, ie &ldquo;FERTILIT DU SOL&rdquo;</li> <li>Even a bunch of IITA and AGROVOC subjects are missing accents, ie &ldquo;FERTILIT DU SOL&rdquo;</li>
<li>More organization names in <code>dc.description.sponsorship</code> are incorrect (ie, missing accents) or inconsistent (ie, CGIAR centers should be spelled in English or multiple spellings of the same one, like &ldquo;Rockefeller Foundation&rdquo; and &ldquo;Rockefeller foundation&rdquo;)</li> <li>More organization names in <code>dc.description.sponsorship</code> are incorrect (ie, missing accents) or inconsistent (ie, CGIAR centers should be spelled in English or multiple spellings of the same one, like &ldquo;Rockefeller Foundation&rdquo; and &ldquo;Rockefeller foundation&rdquo;)</li>
<li>A few dozen items have abstracts with character encoding errors, ie:</li> <li>A few dozen items have abstracts with character encoding errors, ie:
<ul>
<li>33.7øC</li> <li>33.7øC</li>
<li>MgSO4ú7H2O</li> <li>MgSO4ú7H2O</li>
<li>ha??1&amp;/sup;</li> <li>ha??1&amp;/sup;</li>
<li>En gen6ral</li> <li>En gen6ral</li>
<li>dÕpassÕ</li> <li>dÕpassÕ</li>
<li>Also the abstracts have missing accents, ie &ldquo;recherche sur le d veloppement&rdquo;</li>
</ul></li>
<li><p>I will have to tell IITA people to redo these entirely I think&hellip;</p></li>
</ul> </ul>
</li>
<h2 id="2018-06-11">2018-06-11</h2> <li>Also the abstracts have missing accents, ie &ldquo;recherche sur le d veloppement&rdquo;</li>
</ul>
</li>
<li>I will have to tell IITA people to redo these entirely I think&hellip;</li>
</ul>
<h2 id="20180611">2018-06-11</h2>
<ul> <ul>
<li>Sisay sent a new version of the last IITA records that he created from the original CSV from IITA</li> <li>Sisay sent a new version of the last IITA records that he created from the original CSV from IITA</li>
<li>The 200 records are in the <a href="https://dspacetest.cgiar.org/handle/10568/95870">IITA_Junel_11 (<sup>10568</sup>&frasl;<sub>95870</sub>)</a> collection</li> <li>The 200 records are in the <a href="https://dspacetest.cgiar.org/handle/10568/95870">IITA_Junel_11 (10568/95870)</a> collection</li>
<li>Many errors: <li>Many errors:
<ul> <ul>
<li>Authorship types: &ldquo;CGIAR ans advanced research institute&rdquo;, &ldquo;CGAIR and advanced research institute&rdquo;, &ldquo;CGIAR and advanced research institutes&rdquo;, &ldquo;CGAIR single center&rdquo;</li> <li>Authorship types: &ldquo;CGIAR ans advanced research institute&rdquo;, &ldquo;CGAIR and advanced research institute&rdquo;, &ldquo;CGIAR and advanced research institutes&rdquo;, &ldquo;CGAIR single center&rdquo;</li>
<li>Lots of inconsistencies and misspellings in author affiliations:</li> <li>Lots of inconsistencies and misspellings in author affiliations:
<ul>
<li>&ldquo;Institut des Recherches Agricoles du Bénin&rdquo; and &ldquo;Institut National des Recherche Agricoles du Benin&rdquo; and &ldquo;National Agricultural Research Institute, Benin&rdquo;</li> <li>&ldquo;Institut des Recherches Agricoles du Bénin&rdquo; and &ldquo;Institut National des Recherche Agricoles du Benin&rdquo; and &ldquo;National Agricultural Research Institute, Benin&rdquo;</li>
<li>International Insitute of Tropical Agriculture</li> <li>International Insitute of Tropical Agriculture</li>
<li>Centro Internacional de Agricultura Tropical</li> <li>Centro Internacional de Agricultura Tropical</li>
<li>&ldquo;Rivers State University of Science and Technology&rdquo; and &ldquo;Rivers State University&rdquo;</li> <li>&ldquo;Rivers State University of Science and Technology&rdquo; and &ldquo;Rivers State University&rdquo;</li>
<li>&ldquo;Institut de la Recherche Agronomique, Cameroon&rdquo; and &ldquo;Institut de Recherche Agronomique, Cameroon&rdquo;</li> <li>&ldquo;Institut de la Recherche Agronomique, Cameroon&rdquo; and &ldquo;Institut de Recherche Agronomique, Cameroon&rdquo;</li>
<li>Inconsistency in countries: &ldquo;COTE DIVOIRE&rdquo; and &ldquo;COTE D&rsquo;IVOIRE&rdquo;</li> </ul>
</li>
<li>Inconsistency in countries: &ldquo;COTE DIVOIRE&rdquo; and &ldquo;COTE D'IVOIRE&rdquo;</li>
<li>A few DOIs with spaces or invalid characters</li> <li>A few DOIs with spaces or invalid characters</li>
<li>Inconsistency in IITA subjects, for example &ldquo;PRODUCTION VEGETALE&rdquo; and &ldquo;PRODUCTION VÉGÉTALE&rdquo; and several others</li> <li>Inconsistency in IITA subjects, for example &ldquo;PRODUCTION VEGETALE&rdquo; and &ldquo;PRODUCTION VÉGÉTALE&rdquo; and several others</li>
<li>I ran <code>value.unescape('javascript')</code> on the abstract and citation fields because it looks like this data came from a SQL database and some stuff was escaped</li> <li>I ran <code>value.unescape('javascript')</code> on the abstract and citation fields because it looks like this data came from a SQL database and some stuff was escaped</li>
</ul></li> </ul>
<li>It turns out that Abenet actually did a lot of small corrections on this data so when Sisay uses Bosede&rsquo;s original file it doesn&rsquo;t have all those corrections</li> </li>
<li>So I told Sisay to re-create the collection using Abenet&rsquo;s XLS from last week (<code>Mercy1805_AY.xls</code>)</li> <li>It turns out that Abenet actually did a lot of small corrections on this data so when Sisay uses Bosede's original file it doesn't have all those corrections</li>
<li>So I told Sisay to re-create the collection using Abenet's XLS from last week (<code>Mercy1805_AY.xls</code>)</li>
<li>I was curious to see if I could create a GREL for use with a custom text facet in Open Refine to find cells with two or more consecutive spaces</li> <li>I was curious to see if I could create a GREL for use with a custom text facet in Open Refine to find cells with two or more consecutive spaces</li>
<li>I always use the built-in trim and collapse transformations anyways, but this seems to work to find the offending cells: <code>isNotNull(value.match(/.*?\s{2,}.*?/))</code></li> <li>I always use the built-in trim and collapse transformations anyways, but this seems to work to find the offending cells: <code>isNotNull(value.match(/.*?\s{2,}.*?/))</code></li>
<li>I wonder if I should start checking for &ldquo;smart&rdquo; quotes like &rsquo; (hex 2019)</li> <li>I wonder if I should start checking for &ldquo;smart&rdquo; quotes like &rsquo; (hex 2019)</li>
</ul> </ul>
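<p>As a rough cross-check outside of Open Refine, the same two tests (runs of two or more whitespace characters, and the U+2019 smart quote) can be sketched in Python. This is only an illustrative sketch: the function names are hypothetical, and it assumes the cell values have already been loaded as strings.</p>

```python
import re

# Two or more consecutive whitespace characters, mirroring the \s{2,}
# part of the GREL expression in the notes.
MULTI_SPACE = re.compile(r"\s{2,}")

# U+2019 RIGHT SINGLE QUOTATION MARK, the "smart" quote (hex 2019).
SMART_QUOTE = "\u2019"


def has_consecutive_spaces(value: str) -> bool:
    """Return True if the value contains a run of two or more whitespace characters."""
    return MULTI_SPACE.search(value) is not None


def has_smart_quote(value: str) -> bool:
    """Return True if the value contains a U+2019 smart quote."""
    return SMART_QUOTE in value
```

<p>Using <code>search</code> rather than a full-string match avoids the leading and trailing <code>.*?</code> that the GREL <code>match</code> needs.</p>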
<h2 id="20180612">2018-06-12</h2>
<h2 id="2018-06-12">2018-06-12</h2>
<ul> <ul>
<li>Udana from IWMI asked about the OAI base URL for their community on CGSpace</li> <li>Udana from IWMI asked about the OAI base URL for their community on CGSpace</li>
<li>I think it should be this: <a href="https://cgspace.cgiar.org/oai/request?verb=ListRecords&amp;metadataPrefix=oai_dc&amp;set=com_10568_16814">https://cgspace.cgiar.org/oai/request?verb=ListRecords&amp;metadataPrefix=oai_dc&amp;set=com_10568_16814</a></li> <li>I think it should be this: <a href="https://cgspace.cgiar.org/oai/request?verb=ListRecords&amp;metadataPrefix=oai_dc&amp;set=com_10568_16814">https://cgspace.cgiar.org/oai/request?verb=ListRecords&amp;metadataPrefix=oai_dc&amp;set=com_10568_16814</a></li>
<li>The style sheet obfuscates the data, but if you look at the source it is all there, including information about pagination of results</li> <li>The style sheet obfuscates the data, but if you look at the source it is all there, including information about pagination of results</li>
<li>Regarding Udana&rsquo;s Book Chapters and Reports on DSpace Test last week, Abenet told him to fix some character encoding and CRP issues, then I told him I&rsquo;d check them after that</li> <li>Regarding Udana's Book Chapters and Reports on DSpace Test last week, Abenet told him to fix some character encoding and CRP issues, then I told him I'd check them after that</li>
<li>The latest batch of IITA&rsquo;s 200 records (based on Abenet&rsquo;s version <code>Mercy1805_AY.xls</code>) are now in the <a href="https://dspacetest.cgiar.org/handle/10568/96071">IITA_Jan_9_II_Ab</a> collection</li> <li>The latest batch of IITA's 200 records (based on Abenet's version <code>Mercy1805_AY.xls</code>) are now in the <a href="https://dspacetest.cgiar.org/handle/10568/96071">IITA_Jan_9_II_Ab</a> collection</li>
<li>So here are some corrections:
<li><p>So here are some corrections:</p>
<ul> <ul>
<li>use of Unicode smart quote (hex 2019) in countries and affiliations, for example &ldquo;COTE D&rsquo;IVOIRE&rdquo; and &ldquo;Institut d&rsquo;Economic Rurale, Mali&rdquo;</li> <li>use of Unicode smart quote (hex 2019) in countries and affiliations, for example &ldquo;COTE D&rsquo;IVOIRE&rdquo; and &ldquo;Institut d&rsquo;Economic Rurale, Mali&rdquo;</li>
<li>inconsistencies in <code>cg.contributor.affiliation</code>:</li> <li>inconsistencies in <code>cg.contributor.affiliation</code>:
<ul>
<li>&ldquo;Centro Internacional de Agricultura Tropical&rdquo; and &ldquo;Centro International de Agricultura Tropical&rdquo; should use the English name of CIAT (International Center for Tropical Agriculture)</li> <li>&ldquo;Centro Internacional de Agricultura Tropical&rdquo; and &ldquo;Centro International de Agricultura Tropical&rdquo; should use the English name of CIAT (International Center for Tropical Agriculture)</li>
<li>&ldquo;Institut International d&rsquo;Agriculture Tropicale&rdquo; should use the English name of IITA (International Institute of Tropical Agriculture)</li> <li>&ldquo;Institut International d'Agriculture Tropicale&rdquo; should use the English name of IITA (International Institute of Tropical Agriculture)</li>
<li>&ldquo;East and Southern Africa Regional Center&rdquo; and &ldquo;Eastern and Southern Africa Regional Centre&rdquo;</li> <li>&ldquo;East and Southern Africa Regional Center&rdquo; and &ldquo;Eastern and Southern Africa Regional Centre&rdquo;</li>
<li>&ldquo;Institut de la Recherche Agronomique, Cameroon&rdquo; and &ldquo;Institut de Recherche Agronomique, Cameroon&rdquo;</li> <li>&ldquo;Institut de la Recherche Agronomique, Cameroon&rdquo; and &ldquo;Institut de Recherche Agronomique, Cameroon&rdquo;</li>
<li>&ldquo;Institut des Recherches Agricoles du Bénin&rdquo; and &ldquo;Institut National des Recherche Agricoles du Benin&rdquo; and &ldquo;National Agricultural Research Institute, Benin&rdquo;</li> <li>&ldquo;Institut des Recherches Agricoles du Bénin&rdquo; and &ldquo;Institut National des Recherche Agricoles du Benin&rdquo; and &ldquo;National Agricultural Research Institute, Benin&rdquo;</li>
<li>&ldquo;Institute of Agronomic Research, Cameroon&rdquo; and &ldquo;Institute of Agronomy Research, Cameroon&rdquo;</li> <li>&ldquo;Institute of Agronomic Research, Cameroon&rdquo; and &ldquo;Institute of Agronomy Research, Cameroon&rdquo;</li>
<li>&ldquo;Rivers State University&rdquo; and &ldquo;Rivers State University of Science and Technology&rdquo;</li> <li>&ldquo;Rivers State University&rdquo; and &ldquo;Rivers State University of Science and Technology&rdquo;</li>
<li>&ldquo;Universität Hannover&rdquo; and &ldquo;University of Hannover&rdquo;</li> <li>&ldquo;Universität Hannover&rdquo; and &ldquo;University of Hannover&rdquo;</li>
<li>inconsistencies in <code>cg.subject.iita</code>:</li> </ul>
</li>
<li>inconsistencies in <code>cg.subject.iita</code>:
<ul>
<li>&ldquo;AMELIORATION DES PLANTES&rdquo; and &ldquo;AMÉLIORATION DES PLANTES&rdquo;</li> <li>&ldquo;AMELIORATION DES PLANTES&rdquo; and &ldquo;AMÉLIORATION DES PLANTES&rdquo;</li>
<li>&ldquo;PRODUCTION VEGETALE&rdquo; and &ldquo;PRODUCTION VÉGÉTALE&rdquo;</li> <li>&ldquo;PRODUCTION VEGETALE&rdquo; and &ldquo;PRODUCTION VÉGÉTALE&rdquo;</li>
<li>&ldquo;CONTRÔLE DE MALADIES&rdquo; and &ldquo;CONTROLE DES MALADIES&rdquo;</li> <li>&ldquo;CONTRÔLE DE MALADIES&rdquo; and &ldquo;CONTROLE DES MALADIES&rdquo;</li>
@ -324,9 +296,15 @@ Failed to startup the DSpace Service Manager: failure starting up spring service
<li>&ldquo;RAVAGEURS DE PLANTES&rdquo; and &ldquo;RAVAGEURS DES PLANTES&rdquo;</li> <li>&ldquo;RAVAGEURS DE PLANTES&rdquo; and &ldquo;RAVAGEURS DES PLANTES&rdquo;</li>
<li>&ldquo;SANTE DES PLANTES&rdquo; and &ldquo;SANTÉ DES PLANTES&rdquo;</li> <li>&ldquo;SANTE DES PLANTES&rdquo; and &ldquo;SANTÉ DES PLANTES&rdquo;</li>
<li>&ldquo;SOCIOECONOMIE&rdquo; and &ldquo;SOCIOECONOMY&rdquo;</li> <li>&ldquo;SOCIOECONOMIE&rdquo; and &ldquo;SOCIOECONOMY&rdquo;</li>
<li>inconsistencies in <code>dc.description.sponsorship</code>:</li> </ul>
</li>
<li>inconsistencies in <code>dc.description.sponsorship</code>:
<ul>
<li>&ldquo;Belgian Corporation&rdquo; and &ldquo;Belgium Corporation&rdquo;</li> <li>&ldquo;Belgian Corporation&rdquo; and &ldquo;Belgium Corporation&rdquo;</li>
<li>inconsistencies in <code>dc.subject</code>:</li> </ul>
</li>
<li>inconsistencies in <code>dc.subject</code>:
<ul>
<li>&ldquo;AFRICAN CASSAVA MOSAIC&rdquo; and &ldquo;AFRICAN CASSAVA MOSAIC DISEASE&rdquo;</li> <li>&ldquo;AFRICAN CASSAVA MOSAIC&rdquo; and &ldquo;AFRICAN CASSAVA MOSAIC DISEASE&rdquo;</li>
<li>&ldquo;ASPERGILLU FLAVUS&rdquo; and &ldquo;ASPERGILLUS FLAVUS&rdquo;</li> <li>&ldquo;ASPERGILLU FLAVUS&rdquo; and &ldquo;ASPERGILLUS FLAVUS&rdquo;</li>
<li>&ldquo;BIOTECHNOLOGIES&rdquo; and &ldquo;BIOTECHNOLOGY&rdquo;</li> <li>&ldquo;BIOTECHNOLOGIES&rdquo; and &ldquo;BIOTECHNOLOGY&rdquo;</li>
@ -339,117 +317,90 @@ Failed to startup the DSpace Service Manager: failure starting up spring service
<li>&ldquo;LEGUMINOSAE&rdquo; and &ldquo;LEGUMINOUS&rdquo;</li> <li>&ldquo;LEGUMINOSAE&rdquo; and &ldquo;LEGUMINOUS&rdquo;</li>
<li>&ldquo;LEGUMINOUS COVER CROP&rdquo; and &ldquo;LEGUMINOUS COVER CROPS&rdquo;</li> <li>&ldquo;LEGUMINOUS COVER CROP&rdquo; and &ldquo;LEGUMINOUS COVER CROPS&rdquo;</li>
<li>&ldquo;MATÉRIEL DE PLANTATION&rdquo; and &ldquo;MATÉRIELS DE PLANTATION&rdquo;</li> <li>&ldquo;MATÉRIEL DE PLANTATION&rdquo; and &ldquo;MATÉRIELS DE PLANTATION&rdquo;</li>
<li>I noticed that some records do have encoding errors in the <code>dc.description.abstract</code> field, but only four of them so probably not from Abenet&rsquo;s handling of the XLS file</li>
<li><p>Based on manually eyeballing the text I used a custom text facet with this GREL to identify the records:</p>
<pre><code>or(
value.contains('€'),
value.contains('6g'),
value.contains('6m'),
value.contains('6d'),
value.contains('6e')
)
</code></pre></li>
<li><p>So IITA should double check the abstracts for these:</p></li>
<li><p><a href="https://dspacetest.cgiar.org/10568/96184">https://dspacetest.cgiar.org/10568/96184</a></p></li>
<li><p><a href="https://dspacetest.cgiar.org/10568/96141">https://dspacetest.cgiar.org/10568/96141</a></p></li>
<li><p><a href="https://dspacetest.cgiar.org/10568/96118">https://dspacetest.cgiar.org/10568/96118</a></p></li>
<li><p><a href="https://dspacetest.cgiar.org/10568/96113">https://dspacetest.cgiar.org/10568/96113</a></p></li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2018-06-13">2018-06-13</h2> <li>I noticed that some records do have encoding errors in the <code>dc.description.abstract</code> field, but only four of them so probably not from Abenet's handling of the XLS file</li>
<li>Based on manually eyeballing the text I used a custom text facet with this GREL to identify the records:</li>
</ul>
</li>
</ul>
<pre><code>or(
value.contains('€'),
value.contains('6g'),
value.contains('6m'),
value.contains('6d'),
value.contains('6e')
)
</code></pre><ul>
<li>So IITA should double check the abstracts for these:
<ul> <ul>
<li>Elizabeth from CIAT contacted me to ask if I could add ORCID identifiers to all of Robin Buruchara&rsquo;s items</li> <li><a href="https://dspacetest.cgiar.org/10568/96184">https://dspacetest.cgiar.org/10568/96184</a></li>
<li><a href="https://dspacetest.cgiar.org/10568/96141">https://dspacetest.cgiar.org/10568/96141</a></li>
<li><p>I used my <a href="https://gist.githubusercontent.com/alanorth/a49d85cd9c5dea89cddbe809813a7050/raw/f67b6e45a9a940732882ae4bb26897a9b245ef31/add-orcid-identifiers-csv.py">add-orcid-identifiers-csv.py</a> script:</p> <li><a href="https://dspacetest.cgiar.org/10568/96118">https://dspacetest.cgiar.org/10568/96118</a></li>
<li><a href="https://dspacetest.cgiar.org/10568/96113">https://dspacetest.cgiar.org/10568/96113</a></li>
</ul>
</li>
</ul>
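<p>The custom text facet used to find the four garbled abstracts can be approximated in Python for checking an exported CSV. This is a minimal sketch under the assumption that the same five marker strings are good enough; the names <code>SUSPECT_MARKERS</code> and <code>looks_garbled</code> are hypothetical.</p>

```python
# Substrings that were spotted by eye in the damaged abstracts; these are
# the same five values passed to value.contains() in the GREL expression.
SUSPECT_MARKERS = ("€", "6g", "6m", "6d", "6e")


def looks_garbled(abstract: str) -> bool:
    """Return True if the abstract contains any of the suspect markers."""
    return any(marker in abstract for marker in SUSPECT_MARKERS)
```

<p>Like the GREL version, this is a blunt substring test, so it can flag legitimate text that happens to contain one of the markers; the hits still need a manual look.</p>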
<h2 id="20180613">2018-06-13</h2>
<ul>
<li>Elizabeth from CIAT contacted me to ask if I could add ORCID identifiers to all of Robin Buruchara's items</li>
<li>I used my <a href="https://gist.githubusercontent.com/alanorth/a49d85cd9c5dea89cddbe809813a7050/raw/f67b6e45a9a940732882ae4bb26897a9b245ef31/add-orcid-identifiers-csv.py">add-orcid-identifiers-csv.py</a> script:</li>
</ul>
<pre><code>$ ./add-orcid-identifiers-csv.py -i 2018-06-13-Robin-Buruchara.csv -db dspace -u dspace -p 'fuuu' <pre><code>$ ./add-orcid-identifiers-csv.py -i 2018-06-13-Robin-Buruchara.csv -db dspace -u dspace -p 'fuuu'
</code></pre></li> </code></pre><ul>
<li>The contents of <code>2018-06-13-Robin-Buruchara.csv</code> were:</li>
<li><p>The contents of <code>2018-06-13-Robin-Buruchara.csv</code> were:</p> </ul>
<pre><code>dc.contributor.author,cg.creator.id <pre><code>dc.contributor.author,cg.creator.id
&quot;Buruchara, Robin&quot;,Robin Buruchara: 0000-0003-0934-1218 &quot;Buruchara, Robin&quot;,Robin Buruchara: 0000-0003-0934-1218
&quot;Buruchara, Robin A.&quot;,Robin Buruchara: 0000-0003-0934-1218 &quot;Buruchara, Robin A.&quot;,Robin Buruchara: 0000-0003-0934-1218
</code></pre></li> </code></pre><ul>
<li>On a hunch I checked to see if CGSpace's bitstream cleanup was working properly and of course it's broken:</li>
<li><p>On a hunch I checked to see if CGSpace&rsquo;s bitstream cleanup was working properly and of course it&rsquo;s broken:</p> </ul>
<pre><code>$ dspace cleanup -v <pre><code>$ dspace cleanup -v
... ...
Error: ERROR: update or delete on table &quot;bitstream&quot; violates foreign key constraint &quot;bundle_primary_bitstream_id_fkey&quot; on table &quot;bundle&quot; Error: ERROR: update or delete on table &quot;bitstream&quot; violates foreign key constraint &quot;bundle_primary_bitstream_id_fkey&quot; on table &quot;bundle&quot;
Detail: Key (bitstream_id)=(152402) is still referenced from table &quot;bundle&quot;. Detail: Key (bitstream_id)=(152402) is still referenced from table &quot;bundle&quot;.
</code></pre></li> </code></pre><ul>
<li>As always, the solution is to delete that ID manually in PostgreSQL:</li>
<li><p>As always, the solution is to delete that ID manually in PostgreSQL:</p> </ul>
<pre><code>$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (152402);' <pre><code>$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (152402);'
UPDATE 1 UPDATE 1
</code></pre></li> </code></pre><h2 id="20180614">2018-06-14</h2>
</ul>
<h2 id="2018-06-14">2018-06-14</h2>
<ul> <ul>
<li>Check through Udana&rsquo;s IWMI records from last week on DSpace Test</li> <li>Check through Udana's IWMI records from last week on DSpace Test</li>
<li>There were only some minor whitespace and one or two syntax errors, but they look very good otherwise</li> <li>There were only some minor whitespace and one or two syntax errors, but they look very good otherwise</li>
<li>I uploaded the twenty-four reports to the IWMI Reports collection: <a href="https://cgspace.cgiar.org/handle/10568/36188">https://cgspace.cgiar.org/handle/10568/36188</a></li> <li>I uploaded the twenty-four reports to the IWMI Reports collection: <a href="https://cgspace.cgiar.org/handle/10568/36188">https://cgspace.cgiar.org/handle/10568/36188</a></li>
<li>I uploaded the seventy-six book chapters to the IWMI Book Chapters collection: <a href="https://cgspace.cgiar.org/handle/10568/36178">https://cgspace.cgiar.org/handle/10568/36178</a></li> <li>I uploaded the seventy-six book chapters to the IWMI Book Chapters collection: <a href="https://cgspace.cgiar.org/handle/10568/36178">https://cgspace.cgiar.org/handle/10568/36178</a></li>
</ul> </ul>
<h2 id="20180624">2018-06-24</h2>
<h2 id="2018-06-24">2018-06-24</h2>
<ul> <ul>
<li><p>I was restoring a PostgreSQL dump on my test machine and found a way to restore the CGSpace dump as the <code>postgres</code> user, but have the owner of the schema be the <code>dspacetest</code> user:</p> <li>I was restoring a PostgreSQL dump on my test machine and found a way to restore the CGSpace dump as the <code>postgres</code> user, but have the owner of the schema be the <code>dspacetest</code> user:</li>
</ul>
<pre><code>$ dropdb -h localhost -U postgres dspacetest <pre><code>$ dropdb -h localhost -U postgres dspacetest
$ createdb -h localhost -U postgres -O dspacetest --encoding=UNICODE dspacetest $ createdb -h localhost -U postgres -O dspacetest --encoding=UNICODE dspacetest
$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest superuser;' $ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest superuser;'
$ pg_restore -h localhost -U postgres -d dspacetest -O --role=dspacetest -h localhost /tmp/cgspace_2018-06-24.backup $ pg_restore -h localhost -U postgres -d dspacetest -O --role=dspacetest -h localhost /tmp/cgspace_2018-06-24.backup
$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser;' $ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser;'
</code></pre></li> </code></pre><ul>
<li>The <code>-O</code> option to <code>pg_restore</code> makes the import process ignore ownership specified in the dump itself, and instead makes the schema owned by the user doing the restore</li>
<li><p>The <code>-O</code> option to <code>pg_restore</code> makes the import process ignore ownership specified in the dump itself, and instead makes the schema owned by the user doing the restore</p></li> <li>I always prefer to use the <code>postgres</code> user locally because it's just easier than remembering the <code>dspacetest</code> user's password, but then I couldn't figure out why the resulting schema was owned by <code>postgres</code></li>
<li>So with this you connect as the <code>postgres</code> superuser and then switch roles to <code>dspacetest</code> (also, make sure this user has <code>superuser</code> privileges before the restore)</li>
<li><p>I always prefer to use the <code>postgres</code> user locally because it&rsquo;s just easier than remembering the <code>dspacetest</code> user&rsquo;s password, but then I couldn&rsquo;t figure out why the resulting schema was owned by <code>postgres</code></p></li> <li>Last week Linode emailed me to say that our Linode 8192 instance used for DSpace Test qualified for an upgrade</li>
<li>Apparently they announced some <a href="https://blog.linode.com/2018/05/17/updated-linode-plans-new-larger-linodes/">upgrades to most of their plans in 2018-05</a></li>
<li><p>So with this you connect as the <code>postgres</code> superuser and then switch roles to <code>dspacetest</code> (also, make sure this user has <code>superuser</code> privileges before the restore)</p></li> <li>After the upgrade I see we have more disk space available in the instance's dashboard, so I shut the instance down and resized it from 98GB to 160GB</li>
<li>The resize was very quick (less than one minute) and after booting the instance back up I now have 160GB for the root filesystem!</li>
<li><p>Last week Linode emailed me to say that our Linode 8192 instance used for DSpace Test qualified for an upgrade</p></li> <li>I will move the DSpace installation directory back to the root file system and delete the extra 300GB block storage, as it was actually kinda slow when we put Solr there and now we don't actually need it anymore because running the production Solr on this instance didn't work well with 8GB of RAM</li>
<li>Also, the larger instance we're using for CGSpace will go from 24GB of RAM to 32, and will also get a storage increase from 320GB to 640GB&hellip; that means we don't need to consider using block storage right now!</li>
<li><p>Apparently they announced some <a href="https://blog.linode.com/2018/05/17/updated-linode-plans-new-larger-linodes/">upgrades to most of their plans in 2018-05</a></p></li> <li>The smaller instances get increased storage and network speed but I doubt many are actually using much of their current allocations so we probably don't need to bother with upgrading them</li>
<li>Last week Abenet asked if we could add <code>dc.language.iso</code> to the advanced search filters</li>
<li><p>After the upgrade I see we have more disk space available in the instance&rsquo;s dashboard, so I shut the instance down and resized it from 98GB to 160GB</p></li> <li>There is already a search filter for this field defined in <code>discovery.xml</code> but we aren't using it, so I quickly enabled and tested it, then merged it to the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/380">#380</a>)</li>
<li>Back to testing the DSpace 5.8 changes from Atmire, I had another issue with SQL migrations:</li>
<li><p>The resize was very quick (less than one minute) and after booting the instance back up I now have 160GB for the root filesystem!</p></li> </ul>
<li><p>I will move the DSpace installation directory back to the root file system and delete the extra 300GB block storage, as it was actually kinda slow when we put Solr there and now we don&rsquo;t actually need it anymore because running the production Solr on this instance didn&rsquo;t work well with 8GB of RAM</p></li>
<li><p>Also, the larger instance we&rsquo;re using for CGSpace will go from 24GB of RAM to 32, and will also get a storage increase from 320GB to 640GB&hellip; that means we don&rsquo;t need to consider using block storage right now!</p></li>
<li><p>The smaller instances get increased storage and network speed but I doubt many are actually using much of their current allocations so we probably don&rsquo;t need to bother with upgrading them</p></li>
<li><p>Last week Abenet asked if we could add <code>dc.language.iso</code> to the advanced search filters</p></li>
<li><p>There is already a search filter for this field defined in <code>discovery.xml</code> but we aren&rsquo;t using it, so I quickly enabled and tested it, then merged it to the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/380">#380</a>)</p></li>
<li><p>Back to testing the DSpace 5.8 changes from Atmire, I had another issue with SQL migrations:</p>
<pre><code>Caused by: org.flywaydb.core.api.FlywayException: Validate failed. Found differences between applied migrations and available migrations: Detected applied migration missing on the classpath: 5.8.2015.12.03.3 <pre><code>Caused by: org.flywaydb.core.api.FlywayException: Validate failed. Found differences between applied migrations and available migrations: Detected applied migration missing on the classpath: 5.8.2015.12.03.3
</code></pre></li> </code></pre><ul>
<li>It took me a while to figure out that this migration is for MQM, which I removed after Atmire's original advice about the migrations so we actually need to delete this migration instead of updating it</li>
<li><p>It took me a while to figure out that this migration is for MQM, which I removed after Atmire&rsquo;s original advice about the migrations so we actually need to delete this migration instead of updating it</p></li> <li>So I need to make sure to run the following during the DSpace 5.8 upgrade:</li>
</ul>
<li><p>So I need to make sure to run the following during the DSpace 5.8 upgrade:</p>
<pre><code>-- Delete existing CUA 4 migration if it exists <pre><code>-- Delete existing CUA 4 migration if it exists
delete from schema_version where version = '5.6.2015.12.03.2'; delete from schema_version where version = '5.6.2015.12.03.2';
@ -458,55 +409,41 @@ update schema_version set version = '5.6.2015.12.03.2' where version = '5.5.2015
-- Delete MQM migration since we're no longer using it -- Delete MQM migration since we're no longer using it
delete from schema_version where version = '5.5.2015.12.03.3'; delete from schema_version where version = '5.5.2015.12.03.3';
</code></pre></li> </code></pre><ul>
<li>After that you can run the migrations manually and then DSpace should work fine:</li>
<li><p>After that you can run the migrations manually and then DSpace should work fine:</p> </ul>
<pre><code>$ ~/dspace/bin/dspace database migrate ignored <pre><code>$ ~/dspace/bin/dspace database migrate ignored
... ...
Done. Done.
</code></pre></li> </code></pre><ul>
<li>Elizabeth from CIAT contacted me to ask if I could add ORCID identifiers to all of Andy Jarvis&rsquo; items on CGSpace</li>
<li><p>Elizabeth from CIAT contacted me to ask if I could add ORCID identifiers to all of Andy Jarvis&rsquo; items on CGSpace</p></li> <li>I used my <a href="https://gist.githubusercontent.com/alanorth/a49d85cd9c5dea89cddbe809813a7050/raw/f67b6e45a9a940732882ae4bb26897a9b245ef31/add-orcid-identifiers-csv.py">add-orcid-identifiers-csv.py</a> script:</li>
</ul>
<li><p>I used my <a href="https://gist.githubusercontent.com/alanorth/a49d85cd9c5dea89cddbe809813a7050/raw/f67b6e45a9a940732882ae4bb26897a9b245ef31/add-orcid-identifiers-csv.py">add-orcid-identifiers-csv.py</a> script:</p>
<pre><code>$ ./add-orcid-identifiers-csv.py -i 2018-06-24-andy-jarvis-orcid.csv -db dspacetest -u dspacetest -p 'fuuu' <pre><code>$ ./add-orcid-identifiers-csv.py -i 2018-06-24-andy-jarvis-orcid.csv -db dspacetest -u dspacetest -p 'fuuu'
</code></pre></li> </code></pre><ul>
<li>The contents of <code>2018-06-24-andy-jarvis-orcid.csv</code> were:</li>
<li><p>The contents of <code>2018-06-24-andy-jarvis-orcid.csv</code> were:</p> </ul>
<pre><code>dc.contributor.author,cg.creator.id <pre><code>dc.contributor.author,cg.creator.id
&quot;Jarvis, A.&quot;,Andy Jarvis: 0000-0001-6543-0798 &quot;Jarvis, A.&quot;,Andy Jarvis: 0000-0001-6543-0798
&quot;Jarvis, Andy&quot;,Andy Jarvis: 0000-0001-6543-0798 &quot;Jarvis, Andy&quot;,Andy Jarvis: 0000-0001-6543-0798
&quot;Jarvis, Andrew&quot;,Andy Jarvis: 0000-0001-6543-0798 &quot;Jarvis, Andrew&quot;,Andy Jarvis: 0000-0001-6543-0798
</code></pre></li> </code></pre><h2 id="20180626">2018-06-26</h2>
</ul>
<h2 id="2018-06-26">2018-06-26</h2>
<ul> <ul>
<li>Atmire got back to me to say that we can remove the <code>itemCollectionPlugin</code> and <code>HasBitstreamsSSIPlugin</code> beans from DSpace&rsquo;s <code>discovery.xml</code> file, as they are used by the Metadata Quality Module (MQM) that we are not using anymore</li> <li>Atmire got back to me to say that we can remove the <code>itemCollectionPlugin</code> and <code>HasBitstreamsSSIPlugin</code> beans from DSpace's <code>discovery.xml</code> file, as they are used by the Metadata Quality Module (MQM) that we are not using anymore</li>
<li>I removed both those beans and did some simple tests to check item submission, media-filter of PDFs, REST API, but got an error &ldquo;No matches for the query&rdquo; when listing records in OAI</li> <li>I removed both those beans and did some simple tests to check item submission, media-filter of PDFs, REST API, but got an error &ldquo;No matches for the query&rdquo; when listing records in OAI</li>
<li>This warning appears in the DSpace log:</li>
<li><p>This warning appears in the DSpace log:</p>
<pre><code>2018-06-26 16:58:12,052 WARN org.dspace.xoai.services.impl.xoai.DSpaceRepositoryConfiguration @ { OAI 2.0 :: DSpace } Not able to retrieve the dspace.oai.url property from oai.cfg. Falling back to request address
</code></pre></li>
<li><p>It&rsquo;s actually only a warning and it also appears in the logs on DSpace Test (which is currently running DSpace 5.5), so I need to keep troubleshooting</p></li>
<li><p>Ah, I think I just need to run <code>dspace oai import</code></p></li>
</ul> </ul>
<pre><code>2018-06-26 16:58:12,052 WARN org.dspace.xoai.services.impl.xoai.DSpaceRepositoryConfiguration @ { OAI 2.0 :: DSpace } Not able to retrieve the dspace.oai.url property from oai.cfg. Falling back to request address
<h2 id="2018-06-27">2018-06-27</h2> </code></pre><ul>
<li>It's actually only a warning and it also appears in the logs on DSpace Test (which is currently running DSpace 5.5), so I need to keep troubleshooting</li>
<li>Ah, I think I just need to run <code>dspace oai import</code></li>
</ul>
<h2 id="20180627">2018-06-27</h2>
<ul> <ul>
<li>Vika from CIFOR sent back his annotations on the duplicates for the &ldquo;CIFOR_May_9&rdquo; archive import that I sent him last week</li> <li>Vika from CIFOR sent back his annotations on the duplicates for the &ldquo;CIFOR_May_9&rdquo; archive import that I sent him last week</li>
<li>I&rsquo;ll have to figure out how to separate those we&rsquo;re keeping, deleting, and mapping into CIFOR&rsquo;s archive collection</li> <li>I'll have to figure out how to separate those we're keeping, deleting, and mapping into CIFOR's archive collection</li>
<li>First, get the 62 deletes from Vika's file and remove them from the collection:</li>
<li><p>First, get the 62 deletes from Vika&rsquo;s file and remove them from the collection:</p> </ul>
<pre><code>$ grep delete 2018-06-22-cifor-duplicates.txt | grep -o -E '[0-9]{5}\/[0-9]{5}' &gt; cifor-handle-to-delete.txt <pre><code>$ grep delete 2018-06-22-cifor-duplicates.txt | grep -o -E '[0-9]{5}\/[0-9]{5}' &gt; cifor-handle-to-delete.txt
$ wc -l cifor-handle-to-delete.txt $ wc -l cifor-handle-to-delete.txt
62 cifor-handle-to-delete.txt 62 cifor-handle-to-delete.txt
@ -515,56 +452,40 @@ $ wc -l 10568-92904.csv
$ while read line; do sed -i &quot;\#$line#d&quot; 10568-92904.csv; done &lt; cifor-handle-to-delete.txt $ while read line; do sed -i &quot;\#$line#d&quot; 10568-92904.csv; done &lt; cifor-handle-to-delete.txt
$ wc -l 10568-92904.csv $ wc -l 10568-92904.csv
2399 10568-92904.csv 2399 10568-92904.csv
</code></pre></li> </code></pre><ul>
<li>This iterates over the handles for deletion and uses <code>sed</code> with an alternative pattern delimiter of &lsquo;#&rsquo; (which must be escaped), because the pattern itself contains a &lsquo;/&rsquo;</li>
<li><p>This iterates over the handles for deletion and uses <code>sed</code> with an alternative pattern delimiter of &lsquo;#&rsquo; (which must be escaped), because the pattern itself contains a &lsquo;/&rsquo;</p></li> <li>The mapped ones will be difficult because we need their internal IDs in order to map them, and there are 50 of them:</li>
</ul>
<li><p>The mapped ones will be difficult because we need their internal IDs in order to map them, and there are 50 of them:</p>
<pre><code>$ grep map 2018-06-22-cifor-duplicates.txt | grep -o -E '[0-9]{5}\/[0-9]{5}' &gt; cifor-handle-to-map.txt <pre><code>$ grep map 2018-06-22-cifor-duplicates.txt | grep -o -E '[0-9]{5}\/[0-9]{5}' &gt; cifor-handle-to-map.txt
$ wc -l cifor-handle-to-map.txt $ wc -l cifor-handle-to-map.txt
50 cifor-handle-to-map.txt 50 cifor-handle-to-map.txt
</code></pre></li> </code></pre><ul>
<li>I can either get them from the database, or programmatically export the metadata using <code>dspace metadata-export -i 10568/xxxxx</code>&hellip;</li>
<li><p>I can either get them from the database, or programmatically export the metadata using <code>dspace metadata-export -i 10568/xxxxx</code>&hellip;</p></li> <li>Oooh, I can export the items one by one, concatenate them together, remove the headers, and extract the <code>id</code> and <code>collection</code> columns using <a href="https://csvkit.readthedocs.io/">csvkit</a>:</li>
</ul>
<li><p>Oooh, I can export the items one by one, concatenate them together, remove the headers, and extract the <code>id</code> and <code>collection</code> columns using <a href="https://csvkit.readthedocs.io/">csvkit</a>:</p>
<pre><code>$ while read line; do filename=${line/\//-}.csv; dspace metadata-export -i $line -f $filename; done &lt; /tmp/cifor-handle-to-map.txt <pre><code>$ while read line; do filename=${line/\//-}.csv; dspace metadata-export -i $line -f $filename; done &lt; /tmp/cifor-handle-to-map.txt
$ sed '/^id/d' 10568-*.csv | csvcut -c 1,2 &gt; map-to-cifor-archive.csv $ sed '/^id/d' 10568-*.csv | csvcut -c 1,2 &gt; map-to-cifor-archive.csv
</code></pre></li> </code></pre><ul>
<li>Then I can use Open Refine to add the &ldquo;CIFOR Archive&rdquo; collection to the mappings</li>
<li><p>Then I can use Open Refine to add the &ldquo;CIFOR Archive&rdquo; collection to the mappings</p></li> <li>Importing the 2398 items via <code>dspace metadata-import</code> ends up with a Java garbage collection error, so I think I need to do it in batches of 1,000</li>
<li>After deleting the 62 duplicates, mapping the 50 items from elsewhere in CGSpace, and uploading 2,398 unique items, there are a total of 2,448 items added in this batch</li>
<li><p>Importing the 2398 items via <code>dspace metadata-import</code> ends up with a Java garbage collection error, so I think I need to do it in batches of 1,000</p></li> <li>I'll let Abenet take one last look and then move them to CGSpace</li>
<li><p>After deleting the 62 duplicates, mapping the 50 items from elsewhere in CGSpace, and uploading 2,398 unique items, there are a total of 2,448 items added in this batch</p></li>
<li><p>I&rsquo;ll let Abenet take one last look and then move them to CGSpace</p></li>
</ul> </ul>
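<p>A minimal sketch of an alternative to the <code>sed</code> loop used above for dropping the deleted handles from the CSV. It assumes GNU <code>grep</code>; the function name <code>remove_handles</code> is hypothetical and is not part of the original workflow. Because <code>-F</code> treats each handle as a fixed string, the &lsquo;/&rsquo; needs no alternative delimiter or escaping. Like the <code>sed</code> version, this is a substring match, so a handle such as 10568/1234 would also remove a line containing 10568/12345.</p>

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original notes): print a CSV without
# the lines that contain any handle listed, one per line, in a second file.
remove_handles() {
  local handles_file=$1 csv_file=$2
  # -F: patterns are fixed strings, so '/' is not special
  # -f: read the patterns (handles) from a file
  # -v: keep only the lines that do NOT match any pattern
  grep -v -F -f "$handles_file" "$csv_file"
}
```

<p>With the files named in this entry, usage might look like <code>remove_handles cifor-handle-to-delete.txt 10568-92904.csv &gt; 10568-92904-clean.csv</code>, where the output file name is an assumption.</p>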
<h2 id="20180628">2018-06-28</h2>
<h2 id="2018-06-28">2018-06-28</h2>
<ul> <ul>
<li>DSpace Test appears to have crashed last night</li> <li>DSpace Test appears to have crashed last night</li>
<li>There is nothing in the Tomcat or DSpace logs, but I see the following in <code>dmesg -T</code>:</li>
<li><p>There is nothing in the Tomcat or DSpace logs, but I see the following in <code>dmesg -T</code>:</p> </ul>
<pre><code>[Thu Jun 28 00:00:30 2018] Out of memory: Kill process 14501 (java) score 701 or sacrifice child <pre><code>[Thu Jun 28 00:00:30 2018] Out of memory: Kill process 14501 (java) score 701 or sacrifice child
[Thu Jun 28 00:00:30 2018] Killed process 14501 (java) total-vm:14926704kB, anon-rss:5693608kB, file-rss:0kB, shmem-rss:0kB [Thu Jun 28 00:00:30 2018] Killed process 14501 (java) total-vm:14926704kB, anon-rss:5693608kB, file-rss:0kB, shmem-rss:0kB
[Thu Jun 28 00:00:30 2018] oom_reaper: reaped process 14501 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [Thu Jun 28 00:00:30 2018] oom_reaper: reaped process 14501 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
</code></pre></li> </code></pre><ul>
<li>Look over IITA's <a href="https://dspacetest.cgiar.org/handle/10568/96071">IITA_Jan_9_II_Ab</a> collection from earlier this month on DSpace Test</li>
<li><p>Look over IITA&rsquo;s <a href="https://dspacetest.cgiar.org/handle/10568/96071">IITA_Jan_9_II_Ab</a> collection from earlier this month on DSpace Test</p></li> <li>Bosede fixed a few things (and seems to have removed many French IITA subjects like <code>AMÉLIORATION DES PLANTES</code> and <code>SANTÉ DES PLANTES</code>)</li>
<li>I still see at least one issue with author affiliations, and I didn't bother to check the AGROVOC subjects because it's such a mess anyway</li>
<li><p>Bosede fixed a few things (and seems to have removed many French IITA subjects like <code>AMÉLIORATION DES PLANTES</code> and <code>SANTÉ DES PLANTES</code>)</p></li> <li>I suggested that IITA provide an updated list of subjects to us so we can include their controlled vocabulary in CGSpace, which would also make it easier to do automated validation</li>
<li><p>I still see at least one issue with author affiliations, and I didn&rsquo;t bother to check the AGROVOC subjects because it&rsquo;s such a mess anyway</p></li>
<li><p>I suggested that IITA provide an updated list of subjects to us so we can include their controlled vocabulary in CGSpace, which would also make it easier to do automated validation</p></li>
</ul> </ul>
<!-- raw HTML omitted -->
<!-- vim: set sw=2 ts=2: -->

@ -8,16 +8,13 @@
<meta property="og:title" content="July, 2018" /> <meta property="og:title" content="July, 2018" />
<meta property="og:description" content="2018-07-01 <meta property="og:description" content="2018-07-01
I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case: I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:
$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace $ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
During the mvn package stage on the 5.8 branch I kept getting issues with java running out of memory: During the mvn package stage on the 5.8 branch I kept getting issues with java running out of memory:
There is insufficient memory for the Java Runtime Environment to continue. There is insufficient memory for the Java Runtime Environment to continue.
" /> " />
<meta property="og:type" content="article" /> <meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2018-07/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2018-07/" />
@ -28,18 +25,15 @@ There is insufficient memory for the Java Runtime Environment to continue.
<meta name="twitter:title" content="July, 2018"/> <meta name="twitter:title" content="July, 2018"/>
<meta name="twitter:description" content="2018-07-01 <meta name="twitter:description" content="2018-07-01
I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case: I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:
$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace $ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
During the mvn package stage on the 5.8 branch I kept getting issues with java running out of memory: During the mvn package stage on the 5.8 branch I kept getting issues with java running out of memory:
There is insufficient memory for the Java Runtime Environment to continue. There is insufficient memory for the Java Runtime Environment to continue.
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -120,29 +114,23 @@ There is insufficient memory for the Java Runtime Environment to continue.
</p> </p>
</header> </header>
<h2 id="2018-07-01">2018-07-01</h2> <h2 id="20180701">2018-07-01</h2>
<ul> <ul>
<li><p>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</p> <li>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</li>
</ul>
<pre><code>$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace <pre><code>$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
</code></pre></li> </code></pre><ul>
<li>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</li>
<li><p>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</p> </ul>
<pre><code>There is insufficient memory for the Java Runtime Environment to continue.
<pre><code>There is insufficient memory for the Java Runtime Environment to continue. </code></pre><ul>
</code></pre></li> <li>As the machine only has 8GB of RAM, I reduced the Tomcat memory heap from 5120m to 4096m so I could try to allocate more to the build process:</li>
</ul> </ul>
<ul>
<li><p>As the machine only has 8GB of RAM, I reduced the Tomcat memory heap from 5120m to 4096m so I could try to allocate more to the build process:</p>
<pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx1024m&quot; <pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx1024m&quot;
$ mvn -U -Dmirage2.on=true -Dmirage2.deps.included=false -Denv=dspacetest.cgiar.org -P \!dspace-lni,\!dspace-rdf,\!dspace-sword,\!dspace-swordv2 clean package $ mvn -U -Dmirage2.on=true -Dmirage2.deps.included=false -Denv=dspacetest.cgiar.org -P \!dspace-lni,\!dspace-rdf,\!dspace-sword,\!dspace-swordv2 clean package
</code></pre></li> </code></pre><ul>
<li>Then I stopped the Tomcat 7 service, ran the ant update, and manually ran the old and ignored SQL migrations:</li>
<li><p>Then I stopped the Tomcat 7 service, ran the ant update, and manually ran the old and ignored SQL migrations:</p> </ul>
<pre><code>$ sudo su - postgres <pre><code>$ sudo su - postgres
$ psql dspace $ psql dspace
... ...
@ -156,50 +144,42 @@ dspace=# commit
dspace=# \q dspace=# \q
$ exit $ exit
$ dspace database migrate ignored $ dspace database migrate ignored
</code></pre></li> </code></pre><ul>
<li>After that I started Tomcat 7 and DSpace seems to be working, now I need to tell our colleagues to try stuff and report issues they have</li>
<li><p>After that I started Tomcat 7 and DSpace seems to be working, now I need to tell our colleagues to try stuff and report issues they have</p></li>
</ul> </ul>
<h2 id="20180702">2018-07-02</h2>
<h2 id="2018-07-02">2018-07-02</h2>
<ul> <ul>
<li>Discuss AgriKnowledge including our Handle identifier on their harvested items from CGSpace</li> <li>Discuss AgriKnowledge including our Handle identifier on their harvested items from CGSpace</li>
<li>They seem to be only interested in Gates-funded outputs, for example: <a href="https://www.agriknowledge.org/files/tm70mv21t">https://www.agriknowledge.org/files/tm70mv21t</a></li> <li>They seem to be only interested in Gates-funded outputs, for example: <a href="https://www.agriknowledge.org/files/tm70mv21t">https://www.agriknowledge.org/files/tm70mv21t</a></li>
</ul> </ul>
<h2 id="20180703">2018-07-03</h2>
<h2 id="2018-07-03">2018-07-03</h2>
<ul> <ul>
<li><p>Finally finish with the CIFOR Archive records (a total of 2448):</p> <li>Finally finish with the CIFOR Archive records (a total of 2448):
<ul> <ul>
<li>I mapped the 50 items that were duplicates from elsewhere in CGSpace into <a href="https://cgspace.cgiar.org/handle/10568/16702">CIFOR Archive</a></li> <li>I mapped the 50 items that were duplicates from elsewhere in CGSpace into <a href="https://cgspace.cgiar.org/handle/10568/16702">CIFOR Archive</a></li>
<li>I did one last check of the remaining 2398 items and found eight that have a <code>cg.identifier.doi</code> that links to some URL other than a DOI so I moved those to <code>cg.identifier.url</code> and <code>cg.identifier.googleurl</code> as appropriate</li> <li>I did one last check of the remaining 2398 items and found eight that have a <code>cg.identifier.doi</code> that links to some URL other than a DOI so I moved those to <code>cg.identifier.url</code> and <code>cg.identifier.googleurl</code> as appropriate</li>
<li>Also, thirteen items had a DOI in their citation, but did not have a <code>cg.identifier.doi</code> field, so I added those</li> <li>Also, thirteen items had a DOI in their citation, but did not have a <code>cg.identifier.doi</code> field, so I added those</li>
<li>Then I imported those 2398 items in two batches (to deal with memory issues):</li>
<li><p>Then I imported those 2398 items in two batches (to deal with memory issues):</p> </ul>
</li>
</ul>
<pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx1024m&quot; <pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx1024m&quot;
$ dspace metadata-import -e aorth@mjanja.ch -f /tmp/2018-06-27-New-CIFOR-Archive.csv $ dspace metadata-import -e aorth@mjanja.ch -f /tmp/2018-06-27-New-CIFOR-Archive.csv
$ dspace metadata-import -e aorth@mjanja.ch -f /tmp/2018-06-27-New-CIFOR-Archive2.csv $ dspace metadata-import -e aorth@mjanja.ch -f /tmp/2018-06-27-New-CIFOR-Archive2.csv
</code></pre></li> </code></pre><ul>
</ul></li> <li>I noticed there are many items that use HTTP instead of HTTPS for their Google Books URL, and some missing HTTP entirely:</li>
</ul>
<li><p>I noticed there are many items that use HTTP instead of HTTPS for their Google Books URL, and some missing HTTP entirely:</p>
<pre><code>dspace=# select count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=222 and text_value like 'http://books.google.%'; <pre><code>dspace=# select count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=222 and text_value like 'http://books.google.%';
count count
------- -------
785 785
dspace=# select count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=222 and text_value ~ '^books\.google\..*'; dspace=# select count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=222 and text_value ~ '^books\.google\..*';
count count
------- -------
4 4
</code></pre></li> </code></pre><ul>
<li>I think I should fix that as well as some other garbage values like &ldquo;test&rdquo; and &ldquo;dspace.ilri.org&rdquo; etc:</li>
<li><p>I think I should fix that as well as some other garbage values like &ldquo;test&rdquo; and &ldquo;dspace.ilri.org&rdquo; etc:</p> </ul>
<pre><code>dspace=# begin; <pre><code>dspace=# begin;
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://books.google', 'https://books.google') where resource_type_id=2 and metadata_field_id=222 and text_value like 'http://books.google.%'; dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://books.google', 'https://books.google') where resource_type_id=2 and metadata_field_id=222 and text_value like 'http://books.google.%';
UPDATE 785 UPDATE 785
@ -210,12 +190,11 @@ UPDATE 1
dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id=222 and metadata_value_id in (2299312, 10684, 10700, 996403); dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id=222 and metadata_value_id in (2299312, 10684, 10700, 996403);
DELETE 4 DELETE 4
dspace=# commit; dspace=# commit;
</code></pre></li> </code></pre><ul>
<li>Testing DSpace 5.8 with PostgreSQL 9.6 and Tomcat 8.5.32 (instead of my usual 7.0.88) and for some reason I get autowire errors on Catalina startup with 8.5.32:</li>
<li><p>Testing DSpace 5.8 with PostgreSQL 9.6 and Tomcat 8.5.32 (instead of my usual 7.0.88) and for some reason I get autowire errors on Catalina startup with 8.5.32:</p> </ul>
<pre><code>03-Jul-2018 19:51:37.272 SEVERE [localhost-startStop-1] org.apache.catalina.core.StandardContext.listenerStart Exception sending context initialized event to listener instance of class [org.dspace.servicemanager.servlet.DSpaceKernelServletContextListener] <pre><code>03-Jul-2018 19:51:37.272 SEVERE [localhost-startStop-1] org.apache.catalina.core.StandardContext.listenerStart Exception sending context initialized event to listener instance of class [org.dspace.servicemanager.servlet.DSpaceKernelServletContextListener]
java.lang.RuntimeException: Failure during filter init: Failed to startup the DSpace Service Manager: failure starting up spring service manager: Error creating bean with name 'conversionService' defined in file [/home/aorth/dspace/config/spring/xmlui/spring-dspace-addon-cua-services.xml]: Cannot create inner bean 'com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter#3f6c3e6a' of type [com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter] while setting bean property 'converters' with key [1]; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter#3f6c3e6a': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire field: private com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$FilterConverter com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter.filterConverter; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No matching bean of type [com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$FilterConverter] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. 
Dependency annotations: {@org.springframework.beans.factory.annotation.Autowired(required=true)} java.lang.RuntimeException: Failure during filter init: Failed to startup the DSpace Service Manager: failure starting up spring service manager: Error creating bean with name 'conversionService' defined in file [/home/aorth/dspace/config/spring/xmlui/spring-dspace-addon-cua-services.xml]: Cannot create inner bean 'com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter#3f6c3e6a' of type [com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter] while setting bean property 'converters' with key [1]; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter#3f6c3e6a': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire field: private com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$FilterConverter com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter.filterConverter; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No matching bean of type [com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$FilterConverter] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {@org.springframework.beans.factory.annotation.Autowired(required=true)}
at org.dspace.servicemanager.servlet.DSpaceKernelServletContextListener.contextInitialized(DSpaceKernelServletContextListener.java:92) at org.dspace.servicemanager.servlet.DSpaceKernelServletContextListener.contextInitialized(DSpaceKernelServletContextListener.java:92)
at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4792) at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4792)
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5256) at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5256)
@ -231,269 +210,208 @@ java.lang.RuntimeException: Failure during filter init: Failed to startup the DS
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748) at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Failed to startup the DSpace Service Manager: failure starting up spring service manager: Error creating bean with name 'conversionService' defined in file [/home/aorth/dspace/config/spring/xmlui/spring-dspace-addon-cua-services.xml]: Cannot create inner bean 'com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter#3f6c3e6a' of type [com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter] while setting bean property 'converters' with key [1]; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter#3f6c3e6a': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire field: private com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$FilterConverter com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter.filterConverter; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No matching bean of type [com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$FilterConverter] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. 
Dependency annotations: {@org.springframework.beans.factory.annotation.Autowired(required=true)} Caused by: java.lang.RuntimeException: Failed to startup the DSpace Service Manager: failure starting up spring service manager: Error creating bean with name 'conversionService' defined in file [/home/aorth/dspace/config/spring/xmlui/spring-dspace-addon-cua-services.xml]: Cannot create inner bean 'com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter#3f6c3e6a' of type [com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter] while setting bean property 'converters' with key [1]; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter#3f6c3e6a': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire field: private com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$FilterConverter com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter.filterConverter; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No matching bean of type [com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$FilterConverter] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {@org.springframework.beans.factory.annotation.Autowired(required=true)}
</code></pre></li> </code></pre><ul>
<li>Gotta check that out later&hellip;</li>
<li><p>Gotta check that out later&hellip;</p></li>
</ul> </ul>
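<p>The CIFOR Archive import earlier in this entry was done in two batches to stay within Java's memory limits. The notes don't say how the CSV was divided, so the following is only a sketch of one way to do it, assuming GNU coreutils. The function name <code>split_csv</code> and the output naming are hypothetical. It also assumes that no quoted field contains an embedded newline, which may not hold for real DSpace metadata exports; in that case a CSV-aware tool would be needed instead.</p>

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original notes): split a CSV into
# batches of at most N data rows, repeating the header row in each batch.
# Assumes one record per line (no newlines inside quoted fields).
split_csv() {
  local csv_file=$1 batch_size=$2 prefix=$3
  local header chunk
  header=$(head -n 1 "$csv_file")
  # Split everything after the header into numbered chunks: prefix00, prefix01, ...
  tail -n +2 "$csv_file" | split -l "$batch_size" -d - "$prefix"
  # Prepend the header to each chunk and give it a .csv extension
  for chunk in "$prefix"*; do
    { printf '%s\n' "$header"; cat "$chunk"; } > "$chunk.csv"
    rm "$chunk"
  done
}
```

<p>Each resulting file could then be passed to <code>dspace metadata-import</code> in turn, as was done with the two CSV files above.</p>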
<h2 id="20180704">2018-07-04</h2>
<h2 id="2018-07-04">2018-07-04</h2>
<ul> <ul>
<li>I verified that the autowire error indeed only occurs on Tomcat 8.5, but the application works fine on Tomcat 7</li> <li>I verified that the autowire error indeed only occurs on Tomcat 8.5, but the application works fine on Tomcat 7</li>
<li>I have raised this in the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">DSpace 5.8 compatibility ticket on Atmire&rsquo;s tracker</a></li> <li>I have raised this in the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">DSpace 5.8 compatibility ticket on Atmire's tracker</a></li>
<li>Abenet wants me to add &ldquo;United Kingdom government&rdquo; to the sponsors on CGSpace so I created a ticket to track it (<a href="https://github.com/ilri/DSpace/issues/381">#381</a>)</li> <li>Abenet wants me to add &ldquo;United Kingdom government&rdquo; to the sponsors on CGSpace so I created a ticket to track it (<a href="https://github.com/ilri/DSpace/issues/381">#381</a>)</li>
<li>Also, Udana wants me to add &ldquo;Enhancing Sustainability Across Agricultural Systems&rdquo; to the WLE Phase II research themes so I created a ticket to track that (<a href="https://github.com/ilri/DSpace/issues/382">#382</a>)</li> <li>Also, Udana wants me to add &ldquo;Enhancing Sustainability Across Agricultural Systems&rdquo; to the WLE Phase II research themes so I created a ticket to track that (<a href="https://github.com/ilri/DSpace/issues/382">#382</a>)</li>
<li>I need to try to finish this DSpace 5.8 business first because I have too many branches with cherry-picks going on right now!</li> <li>I need to try to finish this DSpace 5.8 business first because I have too many branches with cherry-picks going on right now!</li>
</ul> </ul>
<h2 id="20180706">2018-07-06</h2>
<h2 id="2018-07-06">2018-07-06</h2>
<ul> <ul>
<li>CCAFS want me to add &ldquo;PII-FP2_MSCCCAFS&rdquo; to their Phase II project tags on CGSpace (<a href="https://github.com/ilri/DSpace/issues/383">#383</a>)</li> <li>CCAFS want me to add &ldquo;PII-FP2_MSCCCAFS&rdquo; to their Phase II project tags on CGSpace (<a href="https://github.com/ilri/DSpace/issues/383">#383</a>)</li>
<li>I&rsquo;ll do it in a batch with all the other metadata updates next week</li> <li>I'll do it in a batch with all the other metadata updates next week</li>
</ul> </ul>
<h2 id="20180708">2018-07-08</h2>
<h2 id="2018-07-08">2018-07-08</h2>
<ul> <ul>
<li>I was tempted to do the Linode instance upgrade on CGSpace (linode18), but after looking closely at the system backups I noticed that Solr isn&rsquo;t being backed up to S3</li> <li>I was tempted to do the Linode instance upgrade on CGSpace (linode18), but after looking closely at the system backups I noticed that Solr isn't being backed up to S3</li>
<li>I apparently noticed this—and fixed it!—in <a href="/cgspace-notes/2016-07/">2016-07</a>, but it doesn&rsquo;t look like the backup has been updated since then!</li> <li>I apparently noticed this—and fixed it!—in <a href="/cgspace-notes/2016-07/">2016-07</a>, but it doesn't look like the backup has been updated since then!</li>
<li>It looks like I added Solr to the <code>backup_to_s3.sh</code> script, but that script is not even being used (<code>s3cmd</code> is run directly from root&rsquo;s crontab)</li> <li>It looks like I added Solr to the <code>backup_to_s3.sh</code> script, but that script is not even being used (<code>s3cmd</code> is run directly from root's crontab)</li>
<li>For now I have just initiated a manual S3 backup of the Solr data:</li>
<li><p>For now I have just initiated a manual S3 backup of the Solr data:</p> </ul>
<pre><code># s3cmd sync --delete-removed /home/backup/solr/ s3://cgspace.cgiar.org/solr/ <pre><code># s3cmd sync --delete-removed /home/backup/solr/ s3://cgspace.cgiar.org/solr/
</code></pre></li> </code></pre><ul>
<li>But I need to add this to cron!</li>
<li><p>But I need to add this to cron!</p></li> <li>I wonder if I should convert some of the cron jobs to systemd services / timers&hellip;</li>
<li>I sent a note to all our users on Yammer to ask them about possible maintenance on Sunday, July 14th</li>
<li><p>I wonder if I should convert some of the cron jobs to systemd services / timers&hellip;</p></li> <li>Abenet wants to be able to search by journal title (dc.source) in the advanced Discovery search so I opened an issue for it (<a href="https://github.com/ilri/DSpace/issues/384">#384</a>)</li>
<li>I regenerated the list of names for all our ORCID iDs using my <a href="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script:</li>
<li><p>I sent a note to all our users on Yammer to ask them about possible maintenance on Sunday, July 14th</p></li> </ul>
<li><p>Abenet wants to be able to search by journal title (dc.source) in the advanced Discovery search so I opened an issue for it (<a href="https://github.com/ilri/DSpace/issues/384">#384</a>)</p></li>
<li><p>I regenerated the list of names for all our ORCID iDs using my <a href="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script:</p>
<pre><code>$ grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-id.xml | sort | uniq &gt; /tmp/2018-07-08-orcids.txt <pre><code>$ grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-id.xml | sort | uniq &gt; /tmp/2018-07-08-orcids.txt
$ ./resolve-orcids.py -i /tmp/2018-07-08-orcids.txt -o /tmp/2018-07-08-names.txt -d $ ./resolve-orcids.py -i /tmp/2018-07-08-orcids.txt -o /tmp/2018-07-08-names.txt -d
</code></pre></li> </code></pre><ul>
<li>But after comparing to the existing list of names I didn't see much change, so I just ignored it</li>
<li><p>But after comparing to the existing list of names I didn&rsquo;t see much change, so I just ignored it</p></li>
</ul> </ul>
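The grep, sort and uniq pipeline used above to collect ORCID iDs could be sketched in Python roughly as follows. The regular expression is the one from the notes; the function name `extract_orcids` and the sample XML are made up for illustration and are not part of `resolve-orcids.py`.

```python
import re

# Same pattern as the grep -oE call in the notes: four groups of four
# uppercase alphanumeric characters separated by hyphens.
ORCID_PATTERN = re.compile(r"[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}")


def extract_orcids(text):
    """Return the sorted, de-duplicated ORCID iDs found in text (hypothetical helper)."""
    return sorted(set(ORCID_PATTERN.findall(text)))


# Made-up sample that only resembles a controlled vocabulary file.
sample = """
<node id="Person A: 0000-0000-0000-0002" label="Person A: 0000-0000-0000-0002"/>
<node id="Person B: 0000-0000-0000-0001" label="Person B: 0000-0000-0000-0001"/>
"""

orcids = extract_orcids(sample)
print("\n".join(orcids))
```

Each iD appears twice in the sample (once in `id`, once in `label`), so the set step plays the role of `sort | uniq`.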
<h2 id="20180709">2018-07-09</h2>
<h2 id="2018-07-09">2018-07-09</h2>
<ul> <ul>
<li>Uptime Robot said that CGSpace was down for two minutes early this morning but I don&rsquo;t see anything in Tomcat logs or dmesg</li> <li>Uptime Robot said that CGSpace was down for two minutes early this morning but I don't see anything in Tomcat logs or dmesg</li>
<li>Uptime Robot said that CGSpace was down for two minutes again later in the day, and this time I saw a memory error in Tomcat's <code>catalina.out</code>:</li>
<li><p>Uptime Robot said that CGSpace was down for two minutes again later in the day, and this time I saw a memory error in Tomcat&rsquo;s <code>catalina.out</code>:</p> </ul>
<pre><code>Exception in thread &quot;http-bio-127.0.0.1-8081-exec-557&quot; java.lang.OutOfMemoryError: Java heap space <pre><code>Exception in thread &quot;http-bio-127.0.0.1-8081-exec-557&quot; java.lang.OutOfMemoryError: Java heap space
</code></pre></li> </code></pre><ul>
<li>I'm not sure if it's the same error, but I see this in DSpace's <code>solr.log</code>:</li>
<li><p>I&rsquo;m not sure if it&rsquo;s the same error, but I see this in DSpace&rsquo;s <code>solr.log</code>:</p> </ul>
<pre><code>2018-07-09 06:25:09,913 ERROR org.apache.solr.servlet.SolrDispatchFilter @ null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space <pre><code>2018-07-09 06:25:09,913 ERROR org.apache.solr.servlet.SolrDispatchFilter @ null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
</code></pre></li> </code></pre><ul>
<li>I see a strange error around that time in <code>dspace.log.2018-07-08</code>:</li>
<li><p>I see a strange error around that time in <code>dspace.log.2018-07-08</code>:</p> </ul>
<pre><code>2018-07-09 06:23:43,510 ERROR com.atmire.statistics.SolrLogThread @ IOException occured when talking to server at: http://localhost:8081/solr/statistics <pre><code>2018-07-09 06:23:43,510 ERROR com.atmire.statistics.SolrLogThread @ IOException occured when talking to server at: http://localhost:8081/solr/statistics
org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://localhost:8081/solr/statistics org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://localhost:8081/solr/statistics
</code></pre></li> </code></pre><ul>
<li>But not sure what caused that&hellip;</li>
<li><p>But not sure what caused that&hellip;</p></li> <li>I got a message from Linode tonight that CPU usage was high on CGSpace for the past few hours around 8PM GMT</li>
<li>Looking in the nginx logs I see the top ten IP addresses active today:</li>
<li><p>I got a message from Linode tonight that CPU usage was high on CGSpace for the past few hours around 8PM GMT</p></li> </ul>
<li><p>Looking in the nginx logs I see the top ten IP addresses active today:</p>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;09/Jul/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 <pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;09/Jul/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1691 40.77.167.84 1691 40.77.167.84
1701 40.77.167.69 1701 40.77.167.69
1718 50.116.102.77 1718 50.116.102.77
1872 137.108.70.6 1872 137.108.70.6
2172 157.55.39.234 2172 157.55.39.234
2190 207.46.13.47 2190 207.46.13.47
2848 178.154.200.38 2848 178.154.200.38
4367 35.227.26.162 4367 35.227.26.162
4387 70.32.83.92 4387 70.32.83.92
4738 95.108.181.88 4738 95.108.181.88
</code></pre></li> </code></pre><ul>
<li>Of those, <em>all</em> except <code>70.32.83.92</code> and <code>50.116.102.77</code> are <em>NOT</em> re-using their Tomcat sessions, for example from the XMLUI logs:</li>
<li><p>Of those, <em>all</em> except <code>70.32.83.92</code> and <code>50.116.102.77</code> are <em>NOT</em> re-using their Tomcat sessions, for example from the XMLUI logs:</p> </ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=95.108.181.88' dspace.log.2018-07-09 <pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=95.108.181.88' dspace.log.2018-07-09
4435 4435
</code></pre></li> </code></pre><ul>
<li><code>95.108.181.88</code> appears to be Yandex, so I dunno why it's creating so many sessions, as its user agent should match Tomcat's Crawler Session Manager Valve</li>
<li><p><code>95.108.181.88</code> appears to be Yandex, so I dunno why it&rsquo;s creating so many sessions, as its user agent should match Tomcat&rsquo;s Crawler Session Manager Valve</p></li> <li><code>70.32.83.92</code> is on MediaTemple but I'm not sure who it is. They are mostly hitting REST so I guess that's fine</li>
<li><code>35.227.26.162</code> doesn't declare a user agent and is on Google Cloud, so I should probably mark them as a bot in nginx</li>
<li><p><code>70.32.83.92</code> is on MediaTemple but I&rsquo;m not sure who it is. They are mostly hitting REST so I guess that&rsquo;s fine</p></li> <li><code>178.154.200.38</code> is Yandex again</li>
<li><code>207.46.13.47</code> is Bing</li>
<li><p><code>35.227.26.162</code> doesn&rsquo;t declare a user agent and is on Google Cloud, so I should probably mark them as a bot in nginx</p></li> <li><code>157.55.39.234</code> is Bing</li>
<li><code>137.108.70.6</code> is our old friend CORE bot</li>
<li><p><code>178.154.200.38</code> is Yandex again</p></li> <li><code>50.116.102.77</code> doesn't declare a user agent and lives on HostGator, but mostly just hits the REST API so I guess that's fine</li>
<li><code>40.77.167.84</code> is Bing again</li>
<li><p><code>207.46.13.47</code> is Bing</p></li> <li>Interestingly, the first time that I see <code>35.227.26.162</code> was on 2018-06-08</li>
<li>I've added <code>35.227.26.162</code> to the bot tagging logic in the nginx vhost</li>
<li><p><code>157.55.39.234</code> is Bing</p></li>
<li><p><code>137.108.70.6</code> is our old friend CORE bot</p></li>
<li><p><code>50.116.102.77</code> doesn&rsquo;t declare a user agent and lives on HostGator, but mostly just hits the REST API so I guess that&rsquo;s fine</p></li>
<li><p><code>40.77.167.84</code> is Bing again</p></li>
<li><p>Interestingly, the first time that I see <code>35.227.26.162</code> was on 2018-06-08</p></li>
<li><p>I&rsquo;ve added <code>35.227.26.162</code> to the bot tagging logic in the nginx vhost</p></li>
</ul> </ul>
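The "top ten IP addresses" pipeline used in this entry (grep for a date, take the first field, count, sort) can be approximated in Python. This is a minimal sketch assuming the client address is the first whitespace-separated field of each log line, as in the nginx lines quoted in these notes; `top_clients` and the sample log lines are hypothetical.

```python
from collections import Counter


def top_clients(lines, date_fragment, n=10):
    """Count requests per client for lines containing date_fragment (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        if date_fragment in line:
            fields = line.split()
            if fields:
                # Like awk '{print $1}': the first field is the client address.
                counts[fields[0]] += 1
    # Most active client first, unlike the ascending order of `sort -n | tail`.
    return counts.most_common(n)


# Made-up log lines using documentation-only IP addresses.
sample_log = [
    '203.0.113.5 - - [09/Jul/2018:06:25:09 +0000] "GET /discover HTTP/1.1" 200 512 "-" "-"',
    '203.0.113.5 - - [09/Jul/2018:06:25:10 +0000] "GET /browse HTTP/1.1" 200 512 "-" "-"',
    '198.51.100.7 - - [09/Jul/2018:07:00:00 +0000] "GET /rest/items HTTP/1.1" 200 512 "-" "-"',
    '198.51.100.7 - - [10/Jul/2018:07:00:00 +0000] "GET /rest/items HTTP/1.1" 200 512 "-" "-"',
]

for client, hits in top_clients(sample_log, "09/Jul/2018"):
    print(hits, client)
```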
<h2 id="20180710">2018-07-10</h2>
<h2 id="2018-07-10">2018-07-10</h2>
<ul> <ul>
<li>Add &ldquo;United Kingdom government&rdquo; to sponsors (<a href="https://github.com/ilri/DSpace/issues/381">#381</a>)</li> <li>Add &ldquo;United Kingdom government&rdquo; to sponsors (<a href="https://github.com/ilri/DSpace/issues/381">#381</a>)</li>
<li>Add &ldquo;Enhancing Sustainability Across Agricultural Systems&rdquo; to WLE Phase II Research Themes (<a href="https://github.com/ilri/DSpace/issues/382">#382</a>)</li> <li>Add &ldquo;Enhancing Sustainability Across Agricultural Systems&rdquo; to WLE Phase II Research Themes (<a href="https://github.com/ilri/DSpace/issues/382">#382</a>)</li>
<li>Add &ldquo;PII-FP2_MSCCCAFS&rdquo; to CCAFS Phase II Project Tags (<a href="https://github.com/ilri/DSpace/issues/383">#383</a>)</li> <li>Add &ldquo;PII-FP2_MSCCCAFS&rdquo; to CCAFS Phase II Project Tags (<a href="https://github.com/ilri/DSpace/issues/383">#383</a>)</li>
<li>Add journal title (dc.source) to Discovery search filters (<a href="https://github.com/ilri/DSpace/issues/384">#384</a>)</li> <li>Add journal title (dc.source) to Discovery search filters (<a href="https://github.com/ilri/DSpace/issues/384">#384</a>)</li>
<li>All were tested and merged to the <code>5_x-prod</code> branch and will be deployed on CGSpace this coming weekend when I do the Linode server upgrade</li> <li>All were tested and merged to the <code>5_x-prod</code> branch and will be deployed on CGSpace this coming weekend when I do the Linode server upgrade</li>
<li>I need to get them onto the 5.8 testing branch too, either via cherry-picking or by rebasing after we finish testing Atmire&rsquo;s 5.8 pull request (<a href="https://github.com/ilri/DSpace/pull/378">#378</a>)</li> <li>I need to get them onto the 5.8 testing branch too, either via cherry-picking or by rebasing after we finish testing Atmire's 5.8 pull request (<a href="https://github.com/ilri/DSpace/pull/378">#378</a>)</li>
<li>Linode sent an alert about CPU usage on CGSpace again, about 13:00UTC</li> <li>Linode sent an alert about CPU usage on CGSpace again, about 13:00UTC</li>
<li>These are the top ten users in the last two hours:</li>
<li><p>These are the top ten users in the last two hours:</p>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;10/Jul/2018:(11|12|13)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
81 193.95.22.113
82 50.116.102.77
112 40.77.167.90
117 196.190.95.98
120 178.154.200.38
215 40.77.167.96
243 41.204.190.40
415 95.108.181.88
695 35.227.26.162
697 213.139.52.250
</code></pre></li>
<li><p>Looks like <code>213.139.52.250</code> is Moayad testing his new CGSpace visualization thing:</p>
<pre><code>213.139.52.250 - - [10/Jul/2018:13:39:41 +0000] &quot;GET /bitstream/handle/10568/75668/dryad.png HTTP/2.0&quot; 200 53750 &quot;http://localhost:4200/&quot; &quot;Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36&quot;
</code></pre></li>
<li><p>He said there was a bug that caused his app to request a bunch of invalid URLs</p></li>
<li><p>I&rsquo;ll have to keep an eye on this and see how their platform evolves</p></li>
</ul> </ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;10/Jul/2018:(11|12|13)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
<h2 id="2018-07-11">2018-07-11</h2> 81 193.95.22.113
82 50.116.102.77
112 40.77.167.90
117 196.190.95.98
120 178.154.200.38
215 40.77.167.96
243 41.204.190.40
415 95.108.181.88
695 35.227.26.162
697 213.139.52.250
</code></pre><ul>
<li>Looks like <code>213.139.52.250</code> is Moayad testing his new CGSpace visualization thing:</li>
</ul>
<pre><code>213.139.52.250 - - [10/Jul/2018:13:39:41 +0000] &quot;GET /bitstream/handle/10568/75668/dryad.png HTTP/2.0&quot; 200 53750 &quot;http://localhost:4200/&quot; &quot;Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36&quot;
</code></pre><ul>
<li>He said there was a bug that caused his app to request a bunch of invalid URLs</li>
<li>I'll have to keep an eye on this and see how their platform evolves</li>
</ul>
<h2 id="20180711">2018-07-11</h2>
<ul> <ul>
<li>Skype meeting with Peter and Addis CGSpace team <li>Skype meeting with Peter and Addis CGSpace team
<ul> <ul>
<li>We need to look at doing the <code>dc.rights</code> stuff again, which we last worked on in 2018-01 and 2018-02</li> <li>We need to look at doing the <code>dc.rights</code> stuff again, which we last worked on in 2018-01 and 2018-02</li>
<li>Abenet suggested that we do a controlled vocabulary for the authors, perhaps with the top 1,500 or so on CGSpace?</li> <li>Abenet suggested that we do a controlled vocabulary for the authors, perhaps with the top 1,500 or so on CGSpace?</li>
<li>Peter told Sisay to test this controlled vocabulary</li> <li>Peter told Sisay to test this controlled vocabulary</li>
<li>Discuss meeting in Nairobi in October</li> <li>Discuss meeting in Nairobi in October</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2018-07-12">2018-07-12</h2> </ul>
<h2 id="20180712">2018-07-12</h2>
<ul> <ul>
<li>Uptime Robot said that CGSpace went down a few times last night, around 10:45 PM and 12:30 AM</li> <li>Uptime Robot said that CGSpace went down a few times last night, around 10:45 PM and 12:30 AM</li>
<li>Here are the top ten IPs from last night and this morning:</li>
<li><p>Here are the top ten IPs from last night and this morning:</p> </ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;11/Jul/2018:22&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 <pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;11/Jul/2018:22&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
48 66.249.64.91 48 66.249.64.91
50 35.227.26.162 50 35.227.26.162
57 157.55.39.234 57 157.55.39.234
59 157.55.39.71 59 157.55.39.71
62 147.99.27.190 62 147.99.27.190
82 95.108.181.88 82 95.108.181.88
92 40.77.167.90 92 40.77.167.90
97 183.128.40.185 97 183.128.40.185
97 240e:f0:44:fa53:745a:8afe:d221:1232 97 240e:f0:44:fa53:745a:8afe:d221:1232
3634 208.110.72.10 3634 208.110.72.10
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;12/Jul/2018:00&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;12/Jul/2018:00&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
25 216.244.66.198 25 216.244.66.198
38 40.77.167.185 38 40.77.167.185
46 66.249.64.93 46 66.249.64.93
56 157.55.39.71 56 157.55.39.71
60 35.227.26.162 60 35.227.26.162
65 157.55.39.234 65 157.55.39.234
83 95.108.181.88 83 95.108.181.88
87 66.249.64.91 87 66.249.64.91
96 40.77.167.90 96 40.77.167.90
7075 208.110.72.10 7075 208.110.72.10
</code></pre></li> </code></pre><ul>
<li>We have never seen <code>208.110.72.10</code> before&hellip; so that's interesting!</li>
<li><p>We have never seen <code>208.110.72.10</code> before&hellip; so that&rsquo;s interesting!</p></li> <li>The user agent for these requests is: Pcore-HTTP/v0.44.0</li>
<li>A brief Google search doesn't turn up any information about what this bot is, but lots of users complaining about it</li>
<li><p>The user agent for these requests is: Pcore-HTTP/v0.44.0</p></li> <li>This bot does make a lot of requests all through the day, although it seems to re-use its Tomcat session:</li>
</ul>
<li><p>A brief Google search doesn&rsquo;t turn up any information about what this bot is, but lots of users complaining about it</p></li>
<li><p>This bot does make a lot of requests all through the day, although it seems to re-use its Tomcat session:</p>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;Pcore-HTTP&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 <pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;Pcore-HTTP&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
17098 208.110.72.10 17098 208.110.72.10
# grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=208.110.72.10' dspace.log.2018-07-11 # grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=208.110.72.10' dspace.log.2018-07-11
1161 1161
# grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=208.110.72.10' dspace.log.2018-07-12 # grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=208.110.72.10' dspace.log.2018-07-12
1885 1885
</code></pre></li> </code></pre><ul>
<li>I think the problem is that, despite the bot requesting <code>robots.txt</code>, it almost exclusively requests dynamic pages from <code>/discover</code>:</li>
<li><p>I think the problem is that, despite the bot requesting <code>robots.txt</code>, it almost exclusively requests dynamic pages from <code>/discover</code>:</p> </ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;Pcore-HTTP&quot; | grep -o -E &quot;GET /(browse|discover|search-filter)&quot; | sort -n | uniq -c | sort -rn <pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;Pcore-HTTP&quot; | grep -o -E &quot;GET /(browse|discover|search-filter)&quot; | sort -n | uniq -c | sort -rn
13364 GET /discover 13364 GET /discover
993 GET /search-filter 993 GET /search-filter
804 GET /browse 804 GET /browse
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;Pcore-HTTP&quot; | grep robots # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;Pcore-HTTP&quot; | grep robots
208.110.72.10 - - [12/Jul/2018:00:22:28 +0000] &quot;GET /robots.txt HTTP/1.1&quot; 200 1301 &quot;https://cgspace.cgiar.org/robots.txt&quot; &quot;Pcore-HTTP/v0.44.0&quot; 208.110.72.10 - - [12/Jul/2018:00:22:28 +0000] &quot;GET /robots.txt HTTP/1.1&quot; 200 1301 &quot;https://cgspace.cgiar.org/robots.txt&quot; &quot;Pcore-HTTP/v0.44.0&quot;
</code></pre></li> </code></pre><ul>
<li>So this bot is just like Baiduspider, and I need to add it to the nginx rate limiting</li>
<li><p>So this bot is just like Baiduspider, and I need to add it to the nginx rate limiting</p></li> <li>I'll also add it to Tomcat's Crawler Session Manager Valve to force the re-use of a common Tomcat session for all crawlers just in case</li>
<li>Generate a list of all affiliations in CGSpace to send to Mohamed Salem to compare with the list on MEL (sorting the list by most occurrences):</li>
<li><p>I&rsquo;ll also add it to Tomcat&rsquo;s Crawler Session Manager Valve to force the re-use of a common Tomcat session for all crawlers just in case</p></li> </ul>
<li><p>Generate a list of all affiliations in CGSpace to send to Mohamed Salem to compare with the list on MEL (sorting the list by most occurrences):</p>
<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where resource_type_id=2 and metadata_field_id=211 group by text_value order by count desc) to /tmp/affiliations.csv with csv header <pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where resource_type_id=2 and metadata_field_id=211 group by text_value order by count desc) to /tmp/affiliations.csv with csv header
COPY 4518 COPY 4518
dspace=# \q dspace=# \q
$ csvcut -c 1 &lt; /tmp/affiliations.csv &gt; /tmp/affiliations-1.csv $ csvcut -c 1 &lt; /tmp/affiliations.csv &gt; /tmp/affiliations-1.csv
</code></pre></li> </code></pre><ul>
<li>We also need to discuss standardizing our countries and comparing our ORCID iDs</li>
<li><p>We also need to discuss standardizing our countries and comparing our ORCID iDs</p></li>
</ul> </ul>
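Note that `grep -c`, as used in these entries, counts matching log lines rather than distinct session IDs. To judge whether a client re-uses its Tomcat session, counting unique IDs may be more telling. The following is a sketch under the assumption that each relevant log line contains `session_id=<32 characters>:ip_addr=<address>`, which is the pattern the grep commands in the notes search for; `session_stats` and the sample lines are hypothetical.

```python
import re


def session_stats(lines, ip):
    """Return (matching line count, distinct session count) for one client (hypothetical helper)."""
    # The lookahead stops 1.2.3.4 from also matching 1.2.3.40.
    pattern = re.compile(
        r"session_id=([A-Z0-9]{32}):ip_addr=" + re.escape(ip) + r"(?![0-9])"
    )
    matching_lines = 0
    sessions = set()
    for line in lines:
        match = pattern.search(line)
        if match:
            matching_lines += 1
            sessions.add(match.group(1))
    return matching_lines, len(sessions)


# Simplified, made-up lines; real dspace.log lines carry more fields.
sample_lines = [
    "session_id=" + "A" * 32 + ":ip_addr=203.0.113.5:view_item",
    "session_id=" + "A" * 32 + ":ip_addr=203.0.113.5:view_item",
    "session_id=" + "B" * 32 + ":ip_addr=203.0.113.5:view_item",
    "session_id=" + "C" * 32 + ":ip_addr=198.51.100.7:view_item",
]

matching, distinct = session_stats(sample_lines, "203.0.113.5")
print(matching, "lines,", distinct, "distinct sessions")
```

A client whose distinct count stays close to its line count is creating a new session for nearly every request; a client with many lines but few distinct IDs is re-using sessions.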
<h2 id="20180713">2018-07-13</h2>
<h2 id="2018-07-13">2018-07-13</h2>
<ul> <ul>
<li><p>Generate a list of affiliations for Peter and Abenet to go over so we can batch correct them before we deploy the new data visualization dashboard:</p> <li>Generate a list of affiliations for Peter and Abenet to go over so we can batch correct them before we deploy the new data visualization dashboard:</li>
</ul>
<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'affiliation') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/affiliations.csv with csv header; <pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'affiliation') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/affiliations.csv with csv header;
COPY 4518 COPY 4518
</code></pre></li> </code></pre><h2 id="20180715">2018-07-15</h2>
</ul>
<h2 id="2018-07-15">2018-07-15</h2>
<ul> <ul>
<li>Run all system updates on CGSpace, add latest metadata changes from last week, and start the Linode instance upgrade</li> <li>Run all system updates on CGSpace, add latest metadata changes from last week, and start the Linode instance upgrade</li>
<li>After the upgrade I see we have more disk space available in the instance&rsquo;s dashboard, so I shut the instance down and resized it from 392GB to 650GB</li> <li>After the upgrade I see we have more disk space available in the instance's dashboard, so I shut the instance down and resized it from 392GB to 650GB</li>
<li>The resize was very quick (less than one minute) and after booting the instance back up I now have 631GB for the root filesystem (with 267GB available)!</li> <li>The resize was very quick (less than one minute) and after booting the instance back up I now have 631GB for the root filesystem (with 267GB available)!</li>
<li>Peter had asked a question about how mapped items are displayed in the Altmetric dashboard</li> <li>Peter had asked a question about how mapped items are displayed in the Altmetric dashboard</li>
<li>For example, <a href="10568/82810"><sup>10568</sup>&frasl;<sub>82810</sub></a> is mapped to four collections, but only shows up in one &ldquo;department&rdquo; in their dashboard</li> <li>For example, <a href="10568/82810">10568/82810</a> is mapped to four collections, but only shows up in one &ldquo;department&rdquo; in their dashboard</li>
<li>Altmetric help said that <a href="https://cgspace.cgiar.org/oai/request?verb=GetRecord&amp;metadataPrefix=oai_dc&amp;identifier=oai:cgspace.cgiar.org:10568/82810">according to OAI that item is only in one department</a></li> <li>Altmetric help said that <a href="https://cgspace.cgiar.org/oai/request?verb=GetRecord&amp;metadataPrefix=oai_dc&amp;identifier=oai:cgspace.cgiar.org:10568/82810">according to OAI that item is only in one department</a></li>
<li>I noticed that indeed there was only one collection listed, so I forced an OAI re-import on CGSpace:</li>
<li><p>I noticed that indeed there was only one collection listed, so I forced an OAI re-import on CGSpace:</p> </ul>
<pre><code>$ dspace oai import -c <pre><code>$ dspace oai import -c
OAI 2.0 manager action started OAI 2.0 manager action started
Clearing index Clearing index
@ -507,60 +425,47 @@ Full import
Total: 73925 items Total: 73925 items
Purging cached OAI responses. Purging cached OAI responses.
OAI 2.0 manager action ended. It took 697 seconds. OAI 2.0 manager action ended. It took 697 seconds.
</code></pre></li> </code></pre><ul>
<li>Now I see four collections in OAI for that item!</li>
<li><p>Now I see four collections in OAI for that item!</p></li> <li>I need to ask the dspace-tech mailing list if the nightly OAI import catches the case of old items that have had metadata or mappings change</li>
<li>ICARDA sent me a list of the ORCID iDs they have in the MEL system and it looks like almost 150 are new and unique to us!</li>
<li><p>I need to ask the dspace-tech mailing list if the nightly OAI import catches the case of old items that have had metadata or mappings change</p></li> </ul>
<li><p>ICARDA sent me a list of the ORCID iDs they have in the MEL system and it looks like almost 150 are new and unique to us!</p>
<pre><code>$ cat dspace/config/controlled-vocabularies/cg-creator-id.xml | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq | wc -l <pre><code>$ cat dspace/config/controlled-vocabularies/cg-creator-id.xml | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq | wc -l
1020 1020
$ cat dspace/config/controlled-vocabularies/cg-creator-id.xml MEL\ ORCID.json | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq | wc -l $ cat dspace/config/controlled-vocabularies/cg-creator-id.xml MEL\ ORCID.json | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq | wc -l
1158 1158
</code></pre></li> </code></pre><ul>
<li>I combined the two lists and regenerated the names for all of the ORCID iDs using my <a href="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script:</li>
<li><p>I combined the two lists and regenerated the names for all of the ORCID iDs using my <a href="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script:</p> </ul>
<pre><code>$ cat dspace/config/controlled-vocabularies/cg-creator-id.xml MEL\ ORCID.json | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq &gt; /tmp/2018-07-15-orcid-ids.txt <pre><code>$ cat dspace/config/controlled-vocabularies/cg-creator-id.xml MEL\ ORCID.json | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq &gt; /tmp/2018-07-15-orcid-ids.txt
$ ./resolve-orcids.py -i /tmp/2018-07-15-orcid-ids.txt -o /tmp/2018-07-15-resolved-orcids.txt -d $ ./resolve-orcids.py -i /tmp/2018-07-15-orcid-ids.txt -o /tmp/2018-07-15-resolved-orcids.txt -d
</code></pre></li> </code></pre><ul>
<li>Then I added the XML formatting for controlled vocabularies, sorted the list with GNU sort in vim via <code>% !sort</code> and then checked the formatting with tidy:</li>
<li><p>Then I added the XML formatting for controlled vocabularies, sorted the list with GNU sort in vim via <code>% !sort</code> and then checked the formatting with tidy:</p>
<pre><code>$ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-id.xml
</code></pre></li>
<li><p>I will check with the CGSpace team to see if they want me to add these to CGSpace</p></li>
<li><p>Help Udana from WLE understand some Altmetrics concepts</p></li>
</ul> </ul>
<pre><code>$ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-id.xml
<h2 id="2018-07-18">2018-07-18</h2> </code></pre><ul>
<li>I will check with the CGSpace team to see if they want me to add these to CGSpace</li>
<li>Help Udana from WLE understand some Altmetrics concepts</li>
</ul>
<h2 id="20180718">2018-07-18</h2>
<ul> <ul>
<li>ICARDA sent me another refined list of ORCID iDs so I sorted and formatted them into our controlled vocabulary again</li> <li>ICARDA sent me another refined list of ORCID iDs so I sorted and formatted them into our controlled vocabulary again</li>
<li>Participate in call with IWMI and WLE to discuss Altmetric, CGSpace, and social media</li> <li>Participate in call with IWMI and WLE to discuss Altmetric, CGSpace, and social media</li>
<li>I told them that they should try to include the Handle link on their social media shares because that&rsquo;s the only way to get Altmetric to notice them and associate them with their DOIs</li> <li>I told them that they should try to include the Handle link on their social media shares because that's the only way to get Altmetric to notice them and associate them with their DOIs</li>
<li>I suggested that we should have a wider meeting about this, and that I would post that on Yammer</li> <li>I suggested that we should have a wider meeting about this, and that I would post that on Yammer</li>
<li>I was curious about how and when Altmetric harvests the OAI, so I looked in nginx&rsquo;s OAI log</li> <li>I was curious about how and when Altmetric harvests the OAI, so I looked in nginx's OAI log</li>
<li>For every day in the past week I only see about 50 to 100 requests per day, but then about nine days ago I see 1500 requests</li> <li>For every day in the past week I only see about 50 to 100 requests per day, but then about nine days ago I see 1500 requests</li>
<li>In there I see two bots making about 750 requests each, and this one is probably Altmetric:</li>
<li><p>In there I see two bots making about 750 requests each, and this one is probably Altmetric:</p> </ul>
<pre><code>178.33.237.157 - - [09/Jul/2018:17:00:46 +0000] &quot;GET /oai/request?verb=ListRecords&amp;resumptionToken=oai_dc////100 HTTP/1.1&quot; 200 58653 &quot;-&quot; &quot;Apache-HttpClient/4.5.2 (Java/1.8.0_121)&quot; <pre><code>178.33.237.157 - - [09/Jul/2018:17:00:46 +0000] &quot;GET /oai/request?verb=ListRecords&amp;resumptionToken=oai_dc////100 HTTP/1.1&quot; 200 58653 &quot;-&quot; &quot;Apache-HttpClient/4.5.2 (Java/1.8.0_121)&quot;
178.33.237.157 - - [09/Jul/2018:17:01:11 +0000] &quot;GET /oai/request?verb=ListRecords&amp;resumptionToken=oai_dc////200 HTTP/1.1&quot; 200 67950 &quot;-&quot; &quot;Apache-HttpClient/4.5.2 (Java/1.8.0_121)&quot; 178.33.237.157 - - [09/Jul/2018:17:01:11 +0000] &quot;GET /oai/request?verb=ListRecords&amp;resumptionToken=oai_dc////200 HTTP/1.1&quot; 200 67950 &quot;-&quot; &quot;Apache-HttpClient/4.5.2 (Java/1.8.0_121)&quot;
... ...
178.33.237.157 - - [09/Jul/2018:22:10:39 +0000] &quot;GET /oai/request?verb=ListRecords&amp;resumptionToken=oai_dc////73900 HTTP/1.1&quot; 20 0 25049 &quot;-&quot; &quot;Apache-HttpClient/4.5.2 (Java/1.8.0_121)&quot; 178.33.237.157 - - [09/Jul/2018:22:10:39 +0000] &quot;GET /oai/request?verb=ListRecords&amp;resumptionToken=oai_dc////73900 HTTP/1.1&quot; 20 0 25049 &quot;-&quot; &quot;Apache-HttpClient/4.5.2 (Java/1.8.0_121)&quot;
</code></pre></li> </code></pre><ul>
<li>So if they are getting 100 records per OAI request it would take them 739 requests</li>
<li><p>So if they are getting 100 records per OAI request it would take them 739 requests</p></li> <li>I wonder if I should add this user agent to the Tomcat Crawler Session Manager valve&hellip; does OAI use Tomcat sessions?</li>
<li>Appears not:</li>
<li><p>I wonder if I should add this user agent to the Tomcat Crawler Session Manager valve&hellip; does OAI use Tomcat sessions?</p></li> </ul>
<li><p>Appears not:</p>
<pre><code>$ http --print Hh 'https://cgspace.cgiar.org/oai/request?verb=ListRecords&amp;resumptionToken=oai_dc////100' <pre><code>$ http --print Hh 'https://cgspace.cgiar.org/oai/request?verb=ListRecords&amp;resumptionToken=oai_dc////100'
GET /oai/request?verb=ListRecords&amp;resumptionToken=oai_dc////100 HTTP/1.1 GET /oai/request?verb=ListRecords&amp;resumptionToken=oai_dc////100 HTTP/1.1
Accept: */* Accept: */*
@ -581,81 +486,61 @@ Vary: Accept-Encoding
X-Content-Type-Options: nosniff X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block X-XSS-Protection: 1; mode=block
</code></pre></li> </code></pre><h2 id="20180719">2018-07-19</h2>
</ul>
<h2 id="2018-07-19">2018-07-19</h2>
<ul> <ul>
<li>I tested a submission via SAF bundle to DSpace 5.8 and it worked fine</li> <li>I tested a submission via SAF bundle to DSpace 5.8 and it worked fine</li>
<li>In addition to testing DSpace 5.8, I specifically wanted to see whether specifying collections in metadata instead of on the command line would work (<a href="https://jira.duraspace.org/browse/DS-3583">DS-3583</a>)</li> <li>In addition to testing DSpace 5.8, I specifically wanted to see whether specifying collections in metadata instead of on the command line would work (<a href="https://jira.duraspace.org/browse/DS-3583">DS-3583</a>)</li>
<li>Post a note on Yammer about Altmetric and Handle best practices</li> <li>Post a note on Yammer about Altmetric and Handle best practices</li>
<li>Update PostgreSQL JDBC jar from 42.2.2 to 42.2.4 in the <a href="https://github.com/ilri/rmg-ansible-public">RMG Ansible playbooks</a></li> <li>Update PostgreSQL JDBC jar from 42.2.2 to 42.2.4 in the <a href="https://github.com/ilri/rmg-ansible-public">RMG Ansible playbooks</a></li>
<li>IWMI asked why all the dates in their <a href="https://cgspace.cgiar.org/open-search/discover?query=dateIssued:2018&amp;scope=10568/16814&amp;sort_by=2&amp;order=DESC&amp;rpp=100&amp;format=rss">OpenSearch RSS feed</a> show up as January 01, 2018</li> <li>IWMI asked why all the dates in their <a href="https://cgspace.cgiar.org/open-search/discover?query=dateIssued:2018&amp;scope=10568/16814&amp;sort_by=2&amp;order=DESC&amp;rpp=100&amp;format=rss">OpenSearch RSS feed</a> show up as January 01, 2018</li>
<li>On closer inspection I notice that many of their items use &ldquo;2018&rdquo; as their <code>dc.date.issued</code>, which is a valid ISO 8601 date but it&rsquo;s not very specific so DSpace assumes it is January 01, 2018 00:00:00&hellip;</li> <li>On closer inspection I notice that many of their items use &ldquo;2018&rdquo; as their <code>dc.date.issued</code>, which is a valid ISO 8601 date but it's not very specific so DSpace assumes it is January 01, 2018 00:00:00&hellip;</li>
<li>I told her that they need to start using more accurate dates for their issue dates</li> <li>I told her that they need to start using more accurate dates for their issue dates</li>
<li>In the example item I looked at the DOI has a publish date of 2018-03-16, so they should really try to capture that</li> <li>In the example item I looked at the DOI has a publish date of 2018-03-16, so they should really try to capture that</li>
</ul> </ul>
<h2 id="20180722">2018-07-22</h2>
<h2 id="2018-07-22">2018-07-22</h2>
<ul> <ul>
<li>I told the IWMI people that they can use <code>sort_by=3</code> in their OpenSearch query to sort the results by <code>dc.date.accessioned</code> instead of <code>dc.date.issued</code></li> <li>I told the IWMI people that they can use <code>sort_by=3</code> in their OpenSearch query to sort the results by <code>dc.date.accessioned</code> instead of <code>dc.date.issued</code></li>
<li>They say that it is a burden for them to capture the issue dates, so I cautioned them that this is for their own benefit and for posterity, and that everyone else on CGSpace manages to capture the issue dates!</li> <li>They say that it is a burden for them to capture the issue dates, so I cautioned them that this is for their own benefit and for posterity, and that everyone else on CGSpace manages to capture the issue dates!</li>
<li>For future reference, as I had previously noted in <a href="/cgspace-notes/2018-04/">2018-04</a>, sort options are configured in <code>dspace.cfg</code>, for example:</li>
<li><p>For future reference, as I had previously noted in <a href="/cgspace-notes/2018-04/">2018-04</a>, sort options are configured in <code>dspace.cfg</code>, for example:</p>
<pre><code>webui.itemlist.sort-option.3 = dateaccessioned:dc.date.accessioned:date
</code></pre></li>
<li><p>Just because I was curious I made sure that these options are working as expected in DSpace 5.8 on DSpace Test (they are)</p></li>
<li><p>I tested the Atmire Listings and Reports (L&amp;R) module one last time on my local test environment with a new snapshot of CGSpace&rsquo;s database and re-generated Discovery index and it worked fine</p></li>
<li><p>I finally informed Atmire that we&rsquo;re ready to proceed with deploying this to CGSpace and that they should advise whether we should wait about the SNAPSHOT versions in <code>pom.xml</code></p></li>
<li><p>There is no word on the issue I reported with Tomcat 8.5.32 yet, though&hellip;</p></li>
</ul> </ul>
<pre><code>webui.itemlist.sort-option.3 = dateaccessioned:dc.date.accessioned:date
<h2 id="2018-07-23">2018-07-23</h2> </code></pre><ul>
<li>Just because I was curious I made sure that these options are working as expected in DSpace 5.8 on DSpace Test (they are)</li>
<li>I tested the Atmire Listings and Reports (L&amp;R) module one last time on my local test environment with a new snapshot of CGSpace's database and re-generated Discovery index and it worked fine</li>
<li>I finally informed Atmire that we're ready to proceed with deploying this to CGSpace and that they should advise whether we should wait about the SNAPSHOT versions in <code>pom.xml</code></li>
<li>There is no word on the issue I reported with Tomcat 8.5.32 yet, though&hellip;</li>
</ul>
<h2 id="20180723">2018-07-23</h2>
<ul> <ul>
<li>Still discussing dates with IWMI</li> <li>Still discussing dates with IWMI</li>
<li>I looked in the database to see the breakdown of date formats used in <code>dc.date.issued</code>, ie YYYY, YYYY-MM, or YYYY-MM-DD:</li>
<li><p>I looked in the database to see the breakdown of date formats used in <code>dc.date.issued</code>, ie YYYY, YYYY-MM, or YYYY-MM-DD:</p> </ul>
<pre><code>dspace=# select count(text_value) from metadatavalue where resource_type_id=2 and metadata_field_id=15 and text_value ~ '^[0-9]{4}$'; <pre><code>dspace=# select count(text_value) from metadatavalue where resource_type_id=2 and metadata_field_id=15 and text_value ~ '^[0-9]{4}$';
count count
------- -------
53292 53292
(1 row) (1 row)
dspace=# select count(text_value) from metadatavalue where resource_type_id=2 and metadata_field_id=15 and text_value ~ '^[0-9]{4}-[0-9]{2}$'; dspace=# select count(text_value) from metadatavalue where resource_type_id=2 and metadata_field_id=15 and text_value ~ '^[0-9]{4}-[0-9]{2}$';
count count
------- -------
3818 3818
(1 row) (1 row)
dspace=# select count(text_value) from metadatavalue where resource_type_id=2 and metadata_field_id=15 and text_value ~ '^[0-9]{4}-[0-9]{2}-[0-9]{2}$'; dspace=# select count(text_value) from metadatavalue where resource_type_id=2 and metadata_field_id=15 and text_value ~ '^[0-9]{4}-[0-9]{2}-[0-9]{2}$';
count count
------- -------
17357 17357
</code></pre></li> </code></pre><ul>
<li>So it looks like YYYY is the most numerous, followed by YYYY-MM-DD, then YYYY-MM</li>
<li><p>So it looks like YYYY is the most numerous, followed by YYYY-MM-DD, then YYYY-MM</p></li>
</ul> </ul>
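The same breakdown can be approximated outside the database. This is a minimal Python sketch, assuming the `dc.date.issued` values have already been exported to a list of strings; the names `classify_issue_date` and `count_date_formats` are hypothetical and not part of DSpace:

```python
import re
from collections import Counter

# Same patterns as the PostgreSQL regular expressions in the queries above.
# re.fullmatch() is used instead of ^...$ so a trailing newline cannot match.
PATTERNS = {
    "YYYY": re.compile(r"[0-9]{4}"),
    "YYYY-MM": re.compile(r"[0-9]{4}-[0-9]{2}"),
    "YYYY-MM-DD": re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}"),
}


def classify_issue_date(value):
    """Return the label of the first date format that matches, else 'other'."""
    for label, pattern in PATTERNS.items():
        if pattern.fullmatch(value):
            return label
    return "other"


def count_date_formats(values):
    """Count how many values fall into each date format."""
    return Counter(classify_issue_date(value) for value in values)
```

Running `count_date_formats()` on an export of the field should give a per-format tally comparable to the three `count` queries, plus an "other" bucket for values that match none of the formats.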
<h2 id="20180726">2018-07-26</h2>
<h2 id="2018-07-26">2018-07-26</h2>
<ul> <ul>
<li>Run system updates on DSpace Test (linode19) and reboot the server</li> <li>Run system updates on DSpace Test (linode19) and reboot the server</li>
</ul> </ul>
<h2 id="20180727">2018-07-27</h2>
<h2 id="2018-07-27">2018-07-27</h2>
<ul> <ul>
<li>Follow up with Atmire again about the SNAPSHOT versions in our <code>pom.xml</code> because I want to finalize the DSpace 5.8 upgrade soon and I haven&rsquo;t heard from them in a month (<a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">ticket 560</a>)</li> <li>Follow up with Atmire again about the SNAPSHOT versions in our <code>pom.xml</code> because I want to finalize the DSpace 5.8 upgrade soon and I haven't heard from them in a month (<a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">ticket 560</a>)</li>
</ul> </ul>
<!-- raw HTML omitted -->
<!-- vim: set sw=2 ts=2: -->


@ -8,24 +8,17 @@
<meta property="og:title" content="August, 2018" /> <meta property="og:title" content="August, 2018" />
<meta property="og:description" content="2018-08-01 <meta property="og:description" content="2018-08-01
DSpace Test had crashed at some point yesterday morning and I see the following in dmesg: DSpace Test had crashed at some point yesterday morning and I see the following in dmesg:
[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child [Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child
[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB [Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB
[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight
From the DSpace log I see that eventually Solr stopped responding, so I guess the java process that was OOM killed above was Tomcat&#39;s
From the DSpace log I see that eventually Solr stopped responding, so I guess the java process that was OOM killed above was Tomcat&rsquo;s I&#39;m not sure why Tomcat didn&#39;t crash with an OutOfMemoryError&hellip;
I&rsquo;m not sure why Tomcat didn&rsquo;t crash with an OutOfMemoryError&hellip;
Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core
The server only has 8GB of RAM so we&#39;ll eventually need to upgrade to a larger one because we&#39;ll start starving the OS, PostgreSQL, and command line batch processes
The server only has 8GB of RAM so we&rsquo;ll eventually need to upgrade to a larger one because we&rsquo;ll start starving the OS, PostgreSQL, and command line batch processes
I ran all system updates on DSpace Test and rebooted it I ran all system updates on DSpace Test and rebooted it
" /> " />
<meta property="og:type" content="article" /> <meta property="og:type" content="article" />
@ -37,27 +30,20 @@ I ran all system updates on DSpace Test and rebooted it
<meta name="twitter:title" content="August, 2018"/> <meta name="twitter:title" content="August, 2018"/>
<meta name="twitter:description" content="2018-08-01 <meta name="twitter:description" content="2018-08-01
DSpace Test had crashed at some point yesterday morning and I see the following in dmesg: DSpace Test had crashed at some point yesterday morning and I see the following in dmesg:
[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child [Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child
[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB [Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB
[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight
From the DSpace log I see that eventually Solr stopped responding, so I guess the java process that was OOM killed above was Tomcat&#39;s
From the DSpace log I see that eventually Solr stopped responding, so I guess the java process that was OOM killed above was Tomcat&rsquo;s I&#39;m not sure why Tomcat didn&#39;t crash with an OutOfMemoryError&hellip;
I&rsquo;m not sure why Tomcat didn&rsquo;t crash with an OutOfMemoryError&hellip;
Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core
The server only has 8GB of RAM so we&#39;ll eventually need to upgrade to a larger one because we&#39;ll start starving the OS, PostgreSQL, and command line batch processes
The server only has 8GB of RAM so we&rsquo;ll eventually need to upgrade to a larger one because we&rsquo;ll start starving the OS, PostgreSQL, and command line batch processes
I ran all system updates on DSpace Test and rebooted it I ran all system updates on DSpace Test and rebooted it
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -138,101 +124,69 @@ I ran all system updates on DSpace Test and rebooted it
</p> </p>
</header> </header>
<h2 id="2018-08-01">2018-08-01</h2> <h2 id="20180801">2018-08-01</h2>
<ul> <ul>
<li><p>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</p> <li>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</li>
</ul>
<pre><code>[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child <pre><code>[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child
[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB [Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB
[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
</code></pre></li> </code></pre><ul>
<li>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</li>
<li><p>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</p></li> <li>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat's</li>
<li>I'm not sure why Tomcat didn't crash with an OutOfMemoryError&hellip;</li>
<li><p>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat&rsquo;s</p></li> <li>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</li>
<li>The server only has 8GB of RAM so we'll eventually need to upgrade to a larger one because we'll start starving the OS, PostgreSQL, and command line batch processes</li>
<li><p>I&rsquo;m not sure why Tomcat didn&rsquo;t crash with an OutOfMemoryError&hellip;</p></li> <li>I ran all system updates on DSpace Test and rebooted it</li>
<li><p>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</p></li>
<li><p>The server only has 8GB of RAM so we&rsquo;ll eventually need to upgrade to a larger one because we&rsquo;ll start starving the OS, PostgreSQL, and command line batch processes</p></li>
<li><p>I ran all system updates on DSpace Test and rebooted it</p></li>
</ul> </ul>
<ul> <ul>
<li>I started looking over the latest round of IITA batch records from Sisay on DSpace Test: <a href="https://dspacetest.cgiar.org/handle/10568/103250">IITA July_30</a> <li>I started looking over the latest round of IITA batch records from Sisay on DSpace Test: <a href="https://dspacetest.cgiar.org/handle/10568/103250">IITA July_30</a>
<ul> <ul>
<li>incorrect authorship types</li> <li>incorrect authorship types</li>
<li>dozens of inconsistencies, spelling mistakes, and white space in author affiliations</li> <li>dozens of inconsistencies, spelling mistakes, and white space in author affiliations</li>
<li>minor issues in countries (California is not a country)</li> <li>minor issues in countries (California is not a country)</li>
<li>minor issues in IITA subjects, ISBNs, languages, and AGROVOC subjects</li> <li>minor issues in IITA subjects, ISBNs, languages, and AGROVOC subjects</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2018-08-02">2018-08-02</h2> </ul>
<h2 id="20180802">2018-08-02</h2>
<ul> <ul>
<li><p>DSpace Test crashed again and the only error I see is this in <code>dmesg</code>:</p> <li>DSpace Test crashed again and the only error I see is this in <code>dmesg</code>:</li>
</ul>
<pre><code>[Thu Aug 2 00:00:12 2018] Out of memory: Kill process 1407 (java) score 787 or sacrifice child <pre><code>[Thu Aug 2 00:00:12 2018] Out of memory: Kill process 1407 (java) score 787 or sacrifice child
[Thu Aug 2 00:00:12 2018] Killed process 1407 (java) total-vm:18876328kB, anon-rss:6323836kB, file-rss:0kB, shmem-rss:0kB [Thu Aug 2 00:00:12 2018] Killed process 1407 (java) total-vm:18876328kB, anon-rss:6323836kB, file-rss:0kB, shmem-rss:0kB
</code></pre></li> </code></pre><ul>
<li>I am still assuming that this is the Tomcat process that is dying, so maybe actually we need to reduce its memory instead of increasing it?</li>
<li><p>I am still assuming that this is the Tomcat process that is dying, so maybe actually we need to reduce its memory instead of increasing it?</p></li> <li>The risk we run there is that we'll start getting OutOfMemory errors from Tomcat</li>
<li>So basically we need a new test server with more RAM very soon&hellip;</li>
<li><p>The risk we run there is that we&rsquo;ll start getting OutOfMemory errors from Tomcat</p></li> <li>Abenet asked about the workflow statistics in the Atmire CUA module again</li>
<li>Last year Atmire told me that it's disabled by default but you can enable it with <code>workflow.stats.enabled = true</code> in the CUA configuration file</li>
<li><p>So basically we need a new test server with more RAM very soon&hellip;</p></li> <li>There was a bug with adding users so they sent a patch, but I didn't merge it because it was <a href="https://github.com/ilri/DSpace/pull/319">very dirty</a> and I wasn't sure it actually fixed the problem</li>
<li>I just tried to enable the stats again on DSpace Test now that we're on DSpace 5.8 with updated Atmire modules, but every user I search for shows &ldquo;No data available&rdquo;</li>
<li><p>Abenet asked about the workflow statistics in the Atmire CUA module again</p></li> <li>As a test I submitted a new item and I was able to see it in the workflow statistics &ldquo;data&rdquo; tab, but not in the graph</li>
<li><p>Last year Atmire told me that it&rsquo;s disabled by default but you can enable it with <code>workflow.stats.enabled = true</code> in the CUA configuration file</p></li>
<li><p>There was a bug with adding users so they sent a patch, but I didn&rsquo;t merge it because it was <a href="https://github.com/ilri/DSpace/pull/319">very dirty</a> and I wasn&rsquo;t sure it actually fixed the problem</p></li>
<li><p>I just tried to enable the stats again on DSpace Test now that we&rsquo;re on DSpace 5.8 with updated Atmire modules, but every user I search for shows &ldquo;No data available&rdquo;</p></li>
<li><p>As a test I submitted a new item and I was able to see it in the workflow statistics &ldquo;data&rdquo; tab, but not in the graph</p></li>
</ul> </ul>
<h2 id="20180815">2018-08-15</h2>
<h2 id="2018-08-15">2018-08-15</h2>
<ul> <ul>
<li>Run through Peter&rsquo;s list of author affiliations from earlier this month</li> <li>Run through Peter's list of author affiliations from earlier this month</li>
<li>I did some quick sanity checks and small cleanups in Open Refine, checking for spaces, weird accents, and encoding errors</li> <li>I did some quick sanity checks and small cleanups in Open Refine, checking for spaces, weird accents, and encoding errors</li>
<li>Finally I did a test run with the <a href="https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897"><code>fix-metadata-values.py</code></a> script:</li>
<li><p>Finally I did a test run with the <a href="https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897"><code>fix-metadata-values.py</code></a> script:</p> </ul>
<pre><code>$ ./fix-metadata-values.py -i 2018-08-15-Correct-1083-Affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -t correct -m 211 <pre><code>$ ./fix-metadata-values.py -i 2018-08-15-Correct-1083-Affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -t correct -m 211
$ ./delete-metadata-values.py -i 2018-08-15-Remove-11-Affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -m 211 $ ./delete-metadata-values.py -i 2018-08-15-Remove-11-Affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -m 211
</code></pre></li> </code></pre><h2 id="20180816">2018-08-16</h2>
</ul>
<h2 id="2018-08-16">2018-08-16</h2>
<ul> <ul>
<li><p>Generate a list of the top 1,500 authors on CGSpace for Sisay so he can create the controlled vocabulary:</p> <li>Generate a list of the top 1,500 authors on CGSpace for Sisay so he can create the controlled vocabulary:</li>
</ul>
<pre><code>dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc limit 1500) to /tmp/2018-08-16-top-1500-authors.csv with csv; <pre><code>dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc limit 1500) to /tmp/2018-08-16-top-1500-authors.csv with csv;
</code></pre></li> </code></pre><ul>
<li>Start working on adding the ORCID metadata to a handful of CIAT authors as requested by Elizabeth earlier this month</li>
<li><p>Start working on adding the ORCID metadata to a handful of CIAT authors as requested by Elizabeth earlier this month</p></li> <li>I might need to overhaul the <a href="https://gist.github.com/alanorth/a49d85cd9c5dea89cddbe809813a7050">add-orcid-identifiers-csv.py</a> script to be a little more robust about author order and ORCID metadata that might have been altered manually by editors after submission, as this script was written without that consideration</li>
<li>After checking a few examples I see that checking only the <code>text_value</code> and <code>place</code> when adding ORCID fields is not enough anymore</li>
<li><p>I might need to overhaul the <a href="https://gist.github.com/alanorth/a49d85cd9c5dea89cddbe809813a7050">add-orcid-identifiers-csv.py</a> script to be a little more robust about author order and ORCID metadata that might have been altered manually by editors after submission, as this script was written without that consideration</p></li> <li>It was a sane assumption when I was initially migrating ORCID records from Solr to regular metadata, but now it seems that some authors might have been added or changed after item submission</li>
<li>Now it is better to check if there is <em>any</em> existing ORCID identifier for a given author for the item&hellip;</li>
<li><p>After checking a few examples I see that checking only the <code>text_value</code> and <code>place</code> when adding ORCID fields is not enough anymore</p></li> <li>I will have to update my script to extract the ORCID identifier and search for that</li>
<li>Re-create my local DSpace database using the latest PostgreSQL 9.6 Docker image and re-import the latest CGSpace dump:</li>
<li><p>It was a sane assumption when I was initially migrating ORCID records from Solr to regular metadata, but now it seems that some authors might have been added or changed after item submission</p></li> </ul>
<li><p>Now it is better to check if there is <em>any</em> existing ORCID identifier for a given author for the item&hellip;</p></li>
<li><p>I will have to update my script to extract the ORCID identifier and search for that</p></li>
<li><p>Re-create my local DSpace database using the latest PostgreSQL 9.6 Docker image and re-import the latest CGSpace dump:</p>
<pre><code>$ sudo docker run --name dspacedb -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.6-alpine <pre><code>$ sudo docker run --name dspacedb -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.6-alpine
$ createuser -h localhost -U postgres --pwprompt dspacetest $ createuser -h localhost -U postgres --pwprompt dspacetest
$ createdb -h localhost -U postgres -O dspacetest --encoding=UNICODE dspacetest $ createdb -h localhost -U postgres -O dspacetest --encoding=UNICODE dspacetest
@ -240,18 +194,13 @@ $ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest superuser;'
$ pg_restore -h localhost -U postgres -d dspacetest -O --role=dspacetest ~/Downloads/cgspace_2018-08-16.backup $ pg_restore -h localhost -U postgres -d dspacetest -O --role=dspacetest ~/Downloads/cgspace_2018-08-16.backup
$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser;' $ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser;'
$ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest $ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest
</code></pre></li> </code></pre><h2 id="20180819">2018-08-19</h2>
</ul>
<h2 id="2018-08-19">2018-08-19</h2>
<ul> <ul>
<li>Keep working on the CIAT ORCID identifiers from Elizabeth</li> <li>Keep working on the CIAT ORCID identifiers from Elizabeth</li>
<li>In the spreadsheet she sent me there are some names with other versions in the database, so when it is obviously the same one (ie &ldquo;Schultze-Kraft, Rainer&rdquo; and &ldquo;Schultze-Kraft, R.&rdquo;) I will just tag them with ORCID identifiers too</li> <li>In the spreadsheet she sent me there are some names with other versions in the database, so when it is obviously the same one (ie &ldquo;Schultze-Kraft, Rainer&rdquo; and &ldquo;Schultze-Kraft, R.&quot;) I will just tag them with ORCID identifiers too</li>
<li>This is less obvious and more error prone with names like &ldquo;Peters&rdquo; where there are many more authors</li> <li>This is less obvious and more error prone with names like &ldquo;Peters&rdquo; where there are many more authors</li>
<li>I see some errors in the variations of names as well, for example:</li>
<li><p>I see some errors in the variations of names as well, for example:</p> </ul>
<pre><code>Verchot, Louis <pre><code>Verchot, Louis
Verchot, L Verchot, L
Verchot, L. V. Verchot, L. V.
@ -259,12 +208,10 @@ Verchot, L.V
Verchot, L.V. Verchot, L.V.
Verchot, LV Verchot, LV
Verchot, Louis V. Verchot, Louis V.
</code></pre></li> </code></pre><ul>
<li>I'll just tag them all with Louis Verchot's ORCID identifier&hellip;</li>
<li><p>I&rsquo;ll just tag them all with Louis Verchot&rsquo;s ORCID identifier&hellip;</p></li> <li>In the end, I'll run the following CSV with my <a href="https://gist.github.com/alanorth/a49d85cd9c5dea89cddbe809813a7050">add-orcid-identifiers-csv.py</a> script:</li>
</ul>
<li><p>In the end, I&rsquo;ll run the following CSV with my <a href="https://gist.github.com/alanorth/a49d85cd9c5dea89cddbe809813a7050">add-orcid-identifiers-csv.py</a> script:</p>
<pre><code>dc.contributor.author,cg.creator.id <pre><code>dc.contributor.author,cg.creator.id
&quot;Campbell, Bruce&quot;,Bruce M Campbell: 0000-0002-0123-4859 &quot;Campbell, Bruce&quot;,Bruce M Campbell: 0000-0002-0123-4859
&quot;Campbell, Bruce M.&quot;,Bruce M Campbell: 0000-0002-0123-4859 &quot;Campbell, Bruce M.&quot;,Bruce M Campbell: 0000-0002-0123-4859
@ -293,81 +240,66 @@ Verchot, Louis V.
&quot;Chirinda, Ngonidzashe&quot;,Ngonidzashe Chirinda: 0000-0002-4213-6294 &quot;Chirinda, Ngonidzashe&quot;,Ngonidzashe Chirinda: 0000-0002-4213-6294
&quot;Chirinda, Ngoni&quot;,Ngonidzashe Chirinda: 0000-0002-4213-6294 &quot;Chirinda, Ngoni&quot;,Ngonidzashe Chirinda: 0000-0002-4213-6294
&quot;Ngonidzashe, Chirinda&quot;,Ngonidzashe Chirinda: 0000-0002-4213-6294 &quot;Ngonidzashe, Chirinda&quot;,Ngonidzashe Chirinda: 0000-0002-4213-6294
</code></pre></li> </code></pre><ul>
<li>The invocation would be:</li>
<li><p>The invocation would be:</p> </ul>
<pre><code>$ ./add-orcid-identifiers-csv.py -i 2018-08-16-ciat-orcid.csv -db dspace -u dspace -p 'fuuu' <pre><code>$ ./add-orcid-identifiers-csv.py -i 2018-08-16-ciat-orcid.csv -db dspace -u dspace -p 'fuuu'
</code></pre></li> </code></pre><ul>
<li>I ran the script on DSpace Test and CGSpace and tagged a total of 986 ORCID identifiers</li>
<li><p>I ran the script on DSpace Test and CGSpace and tagged a total of 986 ORCID identifiers</p></li> <li>Looking at the list of author affiliations from Peter one last time</li>
<li>I notice that I should add the Unicode character 0x00b4 (´) to my list of invalid characters to look for in Open Refine, which makes the latest version of the GREL expression:</li>
<li><p>Looking at the list of author affiliations from Peter one last time</p></li> </ul>
<li><p>I notice that I should add the Unicode character 0x00b4 (´) to my list of invalid characters to look for in Open Refine, which makes the latest version of the GREL expression:</p>
<pre><code>or( <pre><code>or(
isNotNull(value.match(/.*\uFFFD.*/)), isNotNull(value.match(/.*\uFFFD.*/)),
isNotNull(value.match(/.*\u00A0.*/)), isNotNull(value.match(/.*\u00A0.*/)),
isNotNull(value.match(/.*\u200A.*/)), isNotNull(value.match(/.*\u200A.*/)),
isNotNull(value.match(/.*\u2019.*/)), isNotNull(value.match(/.*\u2019.*/)),
isNotNull(value.match(/.*\u00b4.*/)) isNotNull(value.match(/.*\u00b4.*/))
) )
</code></pre></li> </code></pre><ul>
<li>This character all by itself is indicative of encoding issues in French, Italian, and Spanish names, for example: De´veloppement and Investigacio´n</li>
<li><p>This character all by itself is indicative of encoding issues in French, Italian, and Spanish names, for example: De´veloppement and Investigacio´n</p></li> <li>I will run the following on DSpace Test and CGSpace:</li>
</ul>
<li><p>I will run the following on DSpace Test and CGSpace:</p>
<pre><code>$ ./fix-metadata-values.py -i /tmp/2018-08-15-Correct-1083-Affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -t correct -m 211 <pre><code>$ ./fix-metadata-values.py -i /tmp/2018-08-15-Correct-1083-Affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -t correct -m 211
$ ./delete-metadata-values.py -i /tmp/2018-08-15-Remove-11-Affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -m 211 $ ./delete-metadata-values.py -i /tmp/2018-08-15-Remove-11-Affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -m 211
</code></pre></li> </code></pre><ul>
<li>Then force an update of the Discovery index on DSpace Test:</li>
<li><p>Then force an update of the Discovery index on DSpace Test:</p> </ul>
<pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx512m&quot; <pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx512m&quot;
$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b $ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
real 72m12.570s real 72m12.570s
user 6m45.305s user 6m45.305s
sys 2m2.461s sys 2m2.461s
</code></pre></li> </code></pre><ul>
<li>And then on CGSpace:</li>
<li><p>And then on CGSpace:</p> </ul>
<pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx1024m&quot; <pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx1024m&quot;
$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b $ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
real 79m44.392s real 79m44.392s
user 8m50.730s user 8m50.730s
sys 2m20.248s sys 2m20.248s
</code></pre></li> </code></pre><ul>
<li>Run system updates on DSpace Test and reboot the server</li>
<li><p>Run system updates on DSpace Test and reboot the server</p></li> <li>In unrelated news, I see some newish Russian bot making a few thousand requests per day and not re-using its XMLUI session:</li>
</ul>
<li><p>In unrelated news, I see some newish Russian bot making a few thousand requests per day and not re-using its XMLUI session:</p>
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep '19/Aug/2018' | grep -c 5.9.6.51 <pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep '19/Aug/2018' | grep -c 5.9.6.51
1553 1553
# grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=5.9.6.51' dspace.log.2018-08-19 # grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=5.9.6.51' dspace.log.2018-08-19
1724 1724
</code></pre></li> </code></pre><ul>
<li>I don't even know how it's possible for the bot to use MORE sessions than total requests&hellip;</li>
<li><p>I don&rsquo;t even know how it&rsquo;s possible for the bot to use MORE sessions than total requests&hellip;</p></li> <li>The user agent is:</li>
<li><p>The user agent is:</p>
<pre><code>Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)
</code></pre></li>
<li><p>So I&rsquo;m thinking we should add &ldquo;crawl&rdquo; to the Tomcat Crawler Session Manager valve, as we already have &ldquo;bot&rdquo; that catches Googlebot, Bingbot, etc.</p></li>
</ul> </ul>
<pre><code>Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)
<h2 id="2018-08-20">2018-08-20</h2> </code></pre><ul>
<li>So I'm thinking we should add &ldquo;crawl&rdquo; to the Tomcat Crawler Session Manager valve, as we already have &ldquo;bot&rdquo; that catches Googlebot, Bingbot, etc.</li>
</ul>
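To sanity check the idea before touching Tomcat, the effect of adding "crawl" can be sketched in Python. Both patterns below are assumptions made for illustration and are not copied from our valve configuration; the helper `is_crawler` is hypothetical:

```python
import re

# Hypothetical patterns for illustration only, not our real configuration.
CURRENT_PATTERN = r".*[bB]ot.*"
PROPOSED_PATTERN = r".*[bB]ot.*|.*crawl.*"


def is_crawler(user_agent, pattern):
    """Return True if the whole user agent string matches the pattern."""
    return re.fullmatch(pattern, user_agent) is not None


megaindex = "Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)"
googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# MegaIndex has no "bot" in its user agent, so only the proposed pattern catches it
print(is_crawler(megaindex, CURRENT_PATTERN))
print(is_crawler(megaindex, PROPOSED_PATTERN))
print(is_crawler(googlebot, CURRENT_PATTERN))
```

The whole-string match is a deliberate choice here, which is why each alternative is wrapped in `.*`; whether the valve applies its pattern the same way should be confirmed against the Tomcat documentation.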
<h2 id="20180820">2018-08-20</h2>
<ul> <ul>
<li>Help Sisay with some UTF-8 encoding issues in a file Peter sent him</li> <li>Help Sisay with some UTF-8 encoding issues in a file Peter sent him</li>
<li>Finish up reconciling Atmire&rsquo;s pull request for DSpace 5.8 changes with the latest status of our <code>5_x-prod</code> branch</li> <li>Finish up reconciling Atmire's pull request for DSpace 5.8 changes with the latest status of our <code>5_x-prod</code> branch</li>
<li>I had to do some <code>git rev-list --reverse --no-merges oldestcommit..newestcommit</code> and <code>git cherry-pick -S</code> hackery to get everything all in order</li> <li>I had to do some <code>git rev-list --reverse --no-merges oldestcommit..newestcommit</code> and <code>git cherry-pick -S</code> hackery to get everything all in order</li>
<li>After building I ran the Atmire schema migrations and forced old migrations, then did the <code>ant update</code></li> <li>After building I ran the Atmire schema migrations and forced old migrations, then did the <code>ant update</code></li>
<li>I tried to build it on DSpace Test, but it seems to still need more RAM to complete (like I experienced last month), so I stopped Tomcat and set <code>JAVA_OPTS</code> to 1024m and tried the <code>mvn package</code> again</li> <li>I tried to build it on DSpace Test, but it seems to still need more RAM to complete (like I experienced last month), so I stopped Tomcat and set <code>JAVA_OPTS</code> to 1024m and tried the <code>mvn package</code> again</li>
@ -375,83 +307,59 @@ sys 2m20.248s
<li>I will try to reduce Tomcat memory from 4608m to 4096m and then retry the <code>mvn package</code> with 1024m of <code>JAVA_OPTS</code> again</li> <li>I will try to reduce Tomcat memory from 4608m to 4096m and then retry the <code>mvn package</code> with 1024m of <code>JAVA_OPTS</code> again</li>
<li>After running the <code>mvn package</code> for the third time and waiting an hour, I attached <code>strace</code> to the Java process and saw that it was indeed reading XMLUI theme data&hellip; so I guess I just need to wait more</li> <li>After running the <code>mvn package</code> for the third time and waiting an hour, I attached <code>strace</code> to the Java process and saw that it was indeed reading XMLUI theme data&hellip; so I guess I just need to wait more</li>
<li>After waiting two hours the maven process completed and installation was successful</li> <li>After waiting two hours the maven process completed and installation was successful</li>
<li>I restarted Tomcat and it seems everything is working well, so I&rsquo;ll merge the pull request and try to schedule the CGSpace upgrade for this coming Sunday, August 26th</li> <li>I restarted Tomcat and it seems everything is working well, so I'll merge the pull request and try to schedule the CGSpace upgrade for this coming Sunday, August 26th</li>
<li>I merged <a href="https://github.com/ilri/DSpace/pull/378">Atmire&rsquo;s pull request</a> into our <code>5_x-dspace-5.8</code> temporary branch and then cherry-picked all the changes from <code>5_x-prod</code> since April, 2018 when that temporary branch was created</li> <li>I merged <a href="https://github.com/ilri/DSpace/pull/378">Atmire's pull request</a> into our <code>5_x-dspace-5.8</code> temporary branch and then cherry-picked all the changes from <code>5_x-prod</code> since April, 2018 when that temporary branch was created</li>
<li>As the branch histories are very different I cannot merge the new 5.8 branch into the current <code>5_x-prod</code> branch</li> <li>As the branch histories are very different I cannot merge the new 5.8 branch into the current <code>5_x-prod</code> branch</li>
<li>Instead, I will archive the current <code>5_x-prod</code> DSpace 5.5 branch as <code>5_x-prod-dspace-5.5</code> and then hard reset <code>5_x-prod</code> based on <code>5_x-dspace-5.8</code></li> <li>Instead, I will archive the current <code>5_x-prod</code> DSpace 5.5 branch as <code>5_x-prod-dspace-5.5</code> and then hard reset <code>5_x-prod</code> based on <code>5_x-dspace-5.8</code></li>
<li>Unfortunately this will mess up the references in pull requests and issues on GitHub</li> <li>Unfortunately this will mess up the references in pull requests and issues on GitHub</li>
</ul> </ul>
<h2 id="20180821">2018-08-21</h2>
<h2 id="2018-08-21">2018-08-21</h2>
<ul> <ul>
<li><p>Something must have happened, as the <code>mvn package</code> <em>always</em> takes about two hours now, stopping for a very long time near the end at this step:</p> <li>Something must have happened, as the <code>mvn package</code> <em>always</em> takes about two hours now, stopping for a very long time near the end at this step:</li>
<pre><code>[INFO] Processing overlay [ id org.dspace.modules:xmlui-mirage2]
</code></pre></li>
<li><p>It&rsquo;s the same on DSpace Test, my local laptop, and CGSpace&hellip;</p></li>
<li><p>It wasn&rsquo;t this way before when I was constantly building the previous 5.8 branch with Atmire patches&hellip;</p></li>
<li><p>I will restore the previous <code>5_x-dspace-5.8</code> and <code>atmire-module-upgrades-5.8</code> branches to see if the build time is different there</p></li>
<li><p>&hellip; it seems that the <code>atmire-module-upgrades-5.8</code> branch still takes 1 hour and 23 minutes on my local machine&hellip;</p></li>
<li><p>Let me try to build the old <code>5_x-prod-dspace-5.5</code> branch on my local machine and see how long it takes</p></li>
<li><p>That one only took 13 minutes! So there is definitely something wrong with our 5.8 branch, now I should try vanilla DSpace 5.8</p></li>
<li><p>I notice that the step this pauses at is:</p>
<pre><code>[INFO] --- maven-war-plugin:2.4:war (default-war) @ xmlui ---
</code></pre></li>
<li><p>And I notice that Atmire changed something in the XMLUI module&rsquo;s <code>pom.xml</code> as part of the DSpace 5.8 changes, specifically to remove the exclude for <code>node_modules</code> in the <code>maven-war-plugin</code> step</p></li>
<li><p>This exclude is <em>present</em> in vanilla DSpace, and if I add it back the build time goes from 1 hour 23 minutes to 12 minutes!</p></li>
<li><p>It makes sense that it would take longer to complete this step because the <code>node_modules</code> folder has tens of thousands of files, and we have 27 themes!</p></li>
<li><p>I need to test to see if this has any side effects when deployed&hellip;</p></li>
<li><p>In other news, I see there was a pull request in DSpace 5.9 that fixes the issue with not being able to have blank lines in CSVs when importing via command line or webui (<a href="https://jira.duraspace.org/browse/DS-3245">DS-3245</a>)</p></li>
</ul> </ul>
<pre><code>[INFO] Processing overlay [ id org.dspace.modules:xmlui-mirage2]
<h2 id="2018-08-23">2018-08-23</h2> </code></pre><ul>
<li>It's the same on DSpace Test, my local laptop, and CGSpace&hellip;</li>
<li>It wasn't this way before when I was constantly building the previous 5.8 branch with Atmire patches&hellip;</li>
<li>I will restore the previous <code>5_x-dspace-5.8</code> and <code>atmire-module-upgrades-5.8</code> branches to see if the build time is different there</li>
<li>&hellip; it seems that the <code>atmire-module-upgrades-5.8</code> branch still takes 1 hour and 23 minutes on my local machine&hellip;</li>
<li>Let me try to build the old <code>5_x-prod-dspace-5.5</code> branch on my local machine and see how long it takes</li>
<li>That one only took 13 minutes! So there is definitely something wrong with our 5.8 branch, now I should try vanilla DSpace 5.8</li>
<li>I notice that the step this pauses at is:</li>
</ul>
<pre><code>[INFO] --- maven-war-plugin:2.4:war (default-war) @ xmlui ---
</code></pre><ul>
<li>And I notice that Atmire changed something in the XMLUI module's <code>pom.xml</code> as part of the DSpace 5.8 changes, specifically to remove the exclude for <code>node_modules</code> in the <code>maven-war-plugin</code> step</li>
<li>This exclude is <em>present</em> in vanilla DSpace, and if I add it back the build time goes from 1 hour 23 minutes to 12 minutes!</li>
<li>It makes sense that it would take longer to complete this step because the <code>node_modules</code> folder has tens of thousands of files, and we have 27 themes!</li>
<li>I need to test to see if this has any side effects when deployed&hellip;</li>
<li>In other news, I see there was a pull request in DSpace 5.9 that fixes the issue with not being able to have blank lines in CSVs when importing via command line or webui (<a href="https://jira.duraspace.org/browse/DS-3245">DS-3245</a>)</li>
</ul>
<h2 id="20180823">2018-08-23</h2>
<ul> <ul>
<li>Skype meeting with CKM people to meet new web dev guy Tariku</li> <li>Skype meeting with CKM people to meet new web dev guy Tariku</li>
<li>They say they want to start working on the ContentDM harvester middleware again</li> <li>They say they want to start working on the ContentDM harvester middleware again</li>
<li>I sent a list of the top 1500 author affiliations on CGSpace to CodeObia so we can compare ours with the ones on MELSpace</li> <li>I sent a list of the top 1500 author affiliations on CGSpace to CodeObia so we can compare ours with the ones on MELSpace</li>
<li>Discuss CTA items with Sisay, he was trying to figure out how to do the collection mapping in combination with SAFBuilder</li> <li>Discuss CTA items with Sisay, he was trying to figure out how to do the collection mapping in combination with SAFBuilder</li>
<li>It appears that the web UI&rsquo;s upload interface <em>requires</em> you to specify the collection, whereas the CLI interface allows you to omit the collection command line flag and defer to the <code>collections</code> file inside each item in the bundle</li> <li>It appears that the web UI's upload interface <em>requires</em> you to specify the collection, whereas the CLI interface allows you to omit the collection command line flag and defer to the <code>collections</code> file inside each item in the bundle</li>
<li>I imported the CTA items on CGSpace for Sisay:</li>
<li><p>I imported the CTA items on CGSpace for Sisay:</p>
<pre><code>$ dspace import -a -e s.webshet@cgiar.org -s /home/swebshet/ictupdates_uploads_August_21 -m /tmp/2018-08-23-cta-ictupdates.map
</code></pre></li>
</ul> </ul>
<pre><code>$ dspace import -a -e s.webshet@cgiar.org -s /home/swebshet/ictupdates_uploads_August_21 -m /tmp/2018-08-23-cta-ictupdates.map
<h2 id="2018-08-26">2018-08-26</h2> </code></pre><h2 id="20180826">2018-08-26</h2>
<ul> <ul>
<li>Doing the DSpace 5.8 upgrade on CGSpace (linode18)</li> <li>Doing the DSpace 5.8 upgrade on CGSpace (linode18)</li>
<li>I already finished the Maven build, now I'll take a backup of the PostgreSQL database and do a database cleanup just in case:</li>
<li><p>I already finished the Maven build, now I&rsquo;ll take a backup of the PostgreSQL database and do a database cleanup just in case:</p> </ul>
<pre><code>$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-08-26-before-dspace-58.backup dspace <pre><code>$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-08-26-before-dspace-58.backup dspace
$ dspace cleanup -v $ dspace cleanup -v
</code></pre></li> </code></pre><ul>
<li>Now I can stop Tomcat and do the install:</li>
<li><p>Now I can stop Tomcat and do the install:</p> </ul>
<pre><code>$ cd dspace/target/dspace-installer <pre><code>$ cd dspace/target/dspace-installer
$ ant update clean_backups update_geolite $ ant update clean_backups update_geolite
</code></pre></li> </code></pre><ul>
<li>After the successful Ant update I can run the database migrations:</li>
<li><p>After the successful Ant update I can run the database migrations:</p> </ul>
<pre><code>$ psql dspace dspace <pre><code>$ psql dspace dspace
dspace=&gt; \i /tmp/Atmire-DSpace-5.8-Schema-Migration.sql dspace=&gt; \i /tmp/Atmire-DSpace-5.8-Schema-Migration.sql
@ -461,74 +369,51 @@ DELETE 1
dspace=&gt; \q dspace=&gt; \q
$ dspace database migrate ignored $ dspace database migrate ignored
</code></pre></li> </code></pre><ul>
<li>Then I'll run all system updates and reboot the server:</li>
<li><p>Then I&rsquo;ll run all system updates and reboot the server:</p> </ul>
<pre><code>$ sudo su - <pre><code>$ sudo su -
# apt update &amp;&amp; apt full-upgrade # apt update &amp;&amp; apt full-upgrade
# apt clean &amp;&amp; apt autoclean &amp;&amp; apt autoremove # apt clean &amp;&amp; apt autoclean &amp;&amp; apt autoremove
# reboot # reboot
</code></pre></li> </code></pre><ul>
<li>After reboot I logged in and cleared all the XMLUI caches and everything looked to be working fine</li>
<li><p>After reboot I logged in and cleared all the XMLUI caches and everything looked to be working fine</p></li> <li>Adam from WLE had asked a few weeks ago about getting the metadata for a bunch of items related to gender from 2013 until now</li>
<li>They want a CSV with <em>all</em> metadata, which the Atmire Listings and Reports module can't do</li>
<li><p>Adam from WLE had asked a few weeks ago about getting the metadata for a bunch of items related to gender from 2013 until now</p></li> <li>I exported a list of items from Listings and Reports with the following criteria: from year 2013 until now, have WLE subject <code>GENDER</code> or <code>GENDER POVERTY AND INSTITUTIONS</code>, and CRP <code>Water, Land and Ecosystems</code></li>
<li>Then I extracted the Handle links from the report so I could export each item's metadata as CSV</li>
<li><p>They want a CSV with <em>all</em> metadata, which the Atmire Listings and Reports module can&rsquo;t do</p></li>
<li><p>I exported a list of items from Listings and Reports with the following criteria: from year 2013 until now, have WLE subject <code>GENDER</code> or <code>GENDER POVERTY AND INSTITUTIONS</code>, and CRP <code>Water, Land and Ecosystems</code></p></li>
<li><p>Then I extracted the Handle links from the report so I could export each item&rsquo;s metadata as CSV</p>
<pre><code>$ grep -o -E &quot;[0-9]{5}/[0-9]{0,5}&quot; listings-export.txt &gt; /tmp/iwmi-gender-items.txt
</code></pre></li>
<li><p>Then on the DSpace server I exported the metadata for each item one by one:</p>
<pre><code>$ while read -r line; do dspace metadata-export -f &quot;/tmp/${line/\//-}.csv&quot; -i $line; sleep 2; done &lt; /tmp/iwmi-gender-items.txt
</code></pre></li>
<li><p>But from here I realized that each of the fifty-nine items will have different columns in their CSVs, making it difficult to combine them</p></li>
<li><p>I&rsquo;m not sure how to proceed without writing some script to parse and join the CSVs, and I don&rsquo;t think it&rsquo;s worth my time</p></li>
<li><p>I tested DSpace 5.8 in Tomcat 8.5.32 and it seems to work now, so I&rsquo;m not sure why I got those errors last time I tried</p></li>
<li><p>It could have been a configuration issue, though, as I also reconciled the <code>server.xml</code> with the one in <a href="https://github.com/ilri/rmg-ansible-public">our Ansible infrastructure scripts</a></p></li>
<li><p>But now I can start testing and preparing to move DSpace Test to Ubuntu 18.04 + Tomcat 8.5 + OpenJDK + PostgreSQL 9.6&hellip;</p></li>
<li><p>Actually, upon closer inspection, it seems that when you try to go to Listings and Reports under Tomcat 8.5.33 you are taken to the JSPUI login page despite having already logged in in XMLUI</p></li>
<li><p>If I type my username and password again it <em>does</em> take me to Listings and Reports, though&hellip;</p></li>
<li><p>I don&rsquo;t see anything interesting in the Catalina or DSpace logs, so I might have to file a bug with Atmire</p></li>
<li><p>For what it&rsquo;s worth, the Content and Usage (CUA) module does load, though I can&rsquo;t seem to get any results in the graph</p></li>
<li><p>I just checked to see if the Listings and Reports issue with using the CGSpace citation field was fixed as planned alongside the DSpace 5.8 upgrades (<a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=589">#589</a>)</p></li>
<li><p>I was able to create a new layout containing only the citation field, so I closed the ticket</p></li>
</ul> </ul>
<pre><code>$ grep -o -E &quot;[0-9]{5}/[0-9]{0,5}&quot; listings-export.txt &gt; /tmp/iwmi-gender-items.txt
<h2 id="2018-08-29">2018-08-29</h2> </code></pre><ul>
<li>Then on the DSpace server I exported the metadata for each item one by one:</li>
</ul>
<pre><code>$ while read -r line; do dspace metadata-export -f &quot;/tmp/${line/\//-}.csv&quot; -i $line; sleep 2; done &lt; /tmp/iwmi-gender-items.txt
</code></pre><ul>
<li>But from here I realized that each of the fifty-nine items will have different columns in their CSVs, making it difficult to combine them</li>
<li>I'm not sure how to proceed without writing some script to parse and join the CSVs, and I don't think it's worth my time</li>
<li>I tested DSpace 5.8 in Tomcat 8.5.32 and it seems to work now, so I'm not sure why I got those errors last time I tried</li>
<li>It could have been a configuration issue, though, as I also reconciled the <code>server.xml</code> with the one in <a href="https://github.com/ilri/rmg-ansible-public">our Ansible infrastructure scripts</a></li>
<li>But now I can start testing and preparing to move DSpace Test to Ubuntu 18.04 + Tomcat 8.5 + OpenJDK + PostgreSQL 9.6&hellip;</li>
<li>Actually, upon closer inspection, it seems that when you try to go to Listings and Reports under Tomcat 8.5.33 you are taken to the JSPUI login page despite having already logged in in XMLUI</li>
<li>If I type my username and password again it <em>does</em> take me to Listings and Reports, though&hellip;</li>
<li>I don't see anything interesting in the Catalina or DSpace logs, so I might have to file a bug with Atmire</li>
<li>For what it's worth, the Content and Usage (CUA) module does load, though I can't seem to get any results in the graph</li>
<li>I just checked to see if the Listings and Reports issue with using the CGSpace citation field was fixed as planned alongside the DSpace 5.8 upgrades (<a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=589">#589</a>)</li>
<li>I was able to create a new layout containing only the citation field, so I closed the ticket</li>
</ul>
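For the problem of combining per-item CSV exports that have different columns, a rough Python sketch follows. It is only an illustration under assumptions: the function name and the sample CSV texts are hypothetical, and it has not been run against real `dspace metadata-export` output. The idea is to take the union of all column names and leave missing cells empty:

```python
import csv
import io


def merge_csv_texts(csv_texts):
    """Merge CSV documents with differing headers into one CSV string.

    Columns are kept in first-seen order; cells missing from a given
    input are written as empty strings.
    """
    fieldnames = []
    rows = []
    for text in csv_texts:
        reader = csv.DictReader(io.StringIO(text))
        # Reading fieldnames consumes the header row of this input
        for name in reader.fieldnames or []:
            if name not in fieldnames:
                fieldnames.append(name)
        rows.extend(reader)

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical exports with different columns
first = "id,dc.title\n1,First item\n"
second = "id,dc.subject\n2,GENDER\n"

print(merge_csv_texts([first, second]))
```

To use it on the exported files, the contents of each `/tmp/*.csv` file could be read and passed in as a list of strings.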
<h2 id="20180829">2018-08-29</h2>
<ul> <ul>
<li>Discuss <a href="https://copo-project.org/copo/">COPO</a> with Martin Mueller</li> <li>Discuss <a href="https://copo-project.org/copo/">COPO</a> with Martin Mueller</li>
<li>He and the consortium&rsquo;s idea is to use this for metadata annotation (submission?) to all repositories</li> <li>He and the consortium's idea is to use this for metadata annotation (submission?) to all repositories</li>
<li>It is somehow related to adding events as items in the repository, and then linking related papers, presentations, etc to the event item using <code>dc.relation</code>, etc.</li> <li>It is somehow related to adding events as items in the repository, and then linking related papers, presentations, etc to the event item using <code>dc.relation</code>, etc.</li>
<li>Discuss Linode server charges with Abenet, apparently we want to start charging these to Big Data</li> <li>Discuss Linode server charges with Abenet, apparently we want to start charging these to Big Data</li>
</ul> </ul>
<h2 id="20180830">2018-08-30</h2>
<h2 id="2018-08-30">2018-08-30</h2>
<ul> <ul>
<li>I fixed the graphical glitch in the cookieconsent popup (the dismiss bug is still there) by pinning the last known good version (3.0.6) in <code>bower.json</code> of each XMLUI theme</li> <li>I fixed the graphical glitch in the cookieconsent popup (the dismiss bug is still there) by pinning the last known good version (3.0.6) in <code>bower.json</code> of each XMLUI theme</li>
<li>I guess cookieconsent got updated without me realizing it and the previous expression <code>^3.0.6</code> made bower install version 3.1.0</li> <li>I guess cookieconsent got updated without me realizing it and the previous expression <code>^3.0.6</code> made bower install version 3.1.0</li>
</ul> </ul>
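To illustrate why `^3.0.6` let version 3.1.0 through, here is a small Python sketch of the usual semver caret rule for a non-zero major version (`^X.Y.Z` means `>=X.Y.Z` and `<(X+1).0.0`). It assumes bower resolves caret ranges this way; the helper names are hypothetical and pre-release tags and `0.x` versions are not handled:

```python
def parse_version(version):
    """Split a plain 'X.Y.Z' string into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def caret_allows(base, candidate):
    """Return True if `candidate` satisfies the caret range ^base.

    Only covers major versions >= 1, where the range is
    base <= candidate < next major version.
    """
    base_v = parse_version(base)
    cand_v = parse_version(candidate)
    if base_v[0] == 0:
        raise ValueError("caret ranges for 0.x versions are not handled here")
    return base_v <= cand_v < (base_v[0] + 1, 0, 0)


for candidate in ("3.0.6", "3.1.0", "4.0.0"):
    print(candidate, caret_allows("3.0.6", candidate))
```

Pinning the exact version `3.0.6` in `bower.json`, as described above, avoids the range entirely.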
<!-- raw HTML omitted -->
<!-- vim: set sw=2 ts=2: -->


@ -8,9 +8,8 @@
<meta property="og:title" content="October, 2018" /> <meta property="og:title" content="October, 2018" />
<meta property="og:description" content="2018-10-01 <meta property="og:description" content="2018-10-01
Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items
I created a GitHub issue to track this #389, because I&rsquo;m super busy in Nairobi right now I created a GitHub issue to track this #389, because I&#39;m super busy in Nairobi right now
" /> " />
<meta property="og:type" content="article" /> <meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2018-10/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2018-10/" />
@ -21,11 +20,10 @@ I created a GitHub issue to track this #389, because I&rsquo;m super busy in Nai
<meta name="twitter:title" content="October, 2018"/> <meta name="twitter:title" content="October, 2018"/>
<meta name="twitter:description" content="2018-10-01 <meta name="twitter:description" content="2018-10-01
Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items
I created a GitHub issue to track this #389, because I&rsquo;m super busy in Nairobi right now I created a GitHub issue to track this #389, because I&#39;m super busy in Nairobi right now
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -106,104 +104,86 @@ I created a GitHub issue to track this #389, because I&rsquo;m super busy in Nai
</p> </p>
</header> </header>
<h2 id="2018-10-01">2018-10-01</h2> <h2 id="20181001">2018-10-01</h2>
<ul> <ul>
<li>Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items</li> <li>Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items</li>
<li>I created a GitHub issue to track this <a href="https://github.com/ilri/DSpace/issues/389">#389</a>, because I&rsquo;m super busy in Nairobi right now</li> <li>I created a GitHub issue to track this <a href="https://github.com/ilri/DSpace/issues/389">#389</a>, because I'm super busy in Nairobi right now</li>
</ul> </ul>
<h2 id="20181003">2018-10-03</h2>
<h2 id="2018-10-03">2018-10-03</h2>
<ul> <ul>
<li><p>I see Moayad was busy collecting item views and downloads from CGSpace yesterday:</p> <li>I see Moayad was busy collecting item views and downloads from CGSpace yesterday:</li>
</ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;02/Oct/2018&quot; | awk '{print $1} <pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;02/Oct/2018&quot; | awk '{print $1}
' | sort | uniq -c | sort -n | tail -n 10 ' | sort | uniq -c | sort -n | tail -n 10
933 40.77.167.90 933 40.77.167.90
971 95.108.181.88 971 95.108.181.88
1043 41.204.190.40 1043 41.204.190.40
1454 157.55.39.54 1454 157.55.39.54
1538 207.46.13.69 1538 207.46.13.69
1719 66.249.64.61 1719 66.249.64.61
2048 50.116.102.77 2048 50.116.102.77
4639 66.249.64.59 4639 66.249.64.59
4736 35.237.175.180 4736 35.237.175.180
150362 34.218.226.147 150362 34.218.226.147
</code></pre></li> </code></pre><ul>
<li>Of those, about 20% were HTTP 500 responses (!):</li>
<li><p>Of those, about 20% were HTTP 500 responses (!):</p> </ul>
<pre><code>$ zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;02/Oct/2018&quot; | grep 34.218.226.147 | awk '{print $9}' | sort -n | uniq -c <pre><code>$ zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;02/Oct/2018&quot; | grep 34.218.226.147 | awk '{print $9}' | sort -n | uniq -c
118927 200 118927 200
31435 500 31435 500
</code></pre></li> </code></pre><ul>
<li>I added Phil Thornton and Sonal Henson's ORCID identifiers to the controlled vocabulary for <code>cg.creator.orcid</code> and then re-generated the names using my <a href="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script:</li>
<li><p>I added Phil Thornton and Sonal Henson&rsquo;s ORCID identifiers to the controlled vocabulary for <code>cg.creator.orcid</code> and then re-generated the names using my <a href="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script:</p> </ul>
<pre><code>$ grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-id.xml | sort | uniq &gt; 2018-10-03-orcids.txt <pre><code>$ grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-id.xml | sort | uniq &gt; 2018-10-03-orcids.txt
$ ./resolve-orcids.py -i 2018-10-03-orcids.txt -o 2018-10-03-names.txt -d $ ./resolve-orcids.py -i 2018-10-03-orcids.txt -o 2018-10-03-names.txt -d
</code></pre></li> </code></pre><ul>
<li>I found a new corner case error that I need to check, given <em>and</em> family names deactivated:</li>
<li><p>I found a new corner case error that I need to check, given <em>and</em> family names deactivated:</p> </ul>
<pre><code>Looking up the names associated with ORCID iD: 0000-0001-7930-5752 <pre><code>Looking up the names associated with ORCID iD: 0000-0001-7930-5752
Given Names Deactivated Family Name Deactivated: 0000-0001-7930-5752 Given Names Deactivated Family Name Deactivated: 0000-0001-7930-5752
</code></pre></li> </code></pre><ul>
<li>It appears to be Jim Lorenzen&hellip; I need to check that later!</li>
<li><p>It appears to be Jim Lorenzen&hellip; I need to check that later!</p></li> <li>I merged the changes to the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/390">#390</a>)</li>
<li>Linode sent another alert about CPU usage on CGSpace (linode18) this evening</li>
<li><p>I merged the changes to the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/390">#390</a>)</p></li> <li>It seems that Moayad is making quite a lot of requests today:</li>
</ul>
<li><p>Linode sent another alert about CPU usage on CGSpace (linode18) this evening</p></li>
<li><p>It seems that Moayad is making quite a lot of requests today:</p>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Oct/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 <pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Oct/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1594 157.55.39.160 1594 157.55.39.160
1627 157.55.39.173 1627 157.55.39.173
1774 136.243.6.84 1774 136.243.6.84
4228 35.237.175.180 4228 35.237.175.180
4497 70.32.83.92 4497 70.32.83.92
4856 66.249.64.59 4856 66.249.64.59
7120 50.116.102.77 7120 50.116.102.77
12518 138.201.49.199 12518 138.201.49.199
87646 34.218.226.147 87646 34.218.226.147
111729 213.139.53.62 111729 213.139.53.62
</code></pre></li> </code></pre><ul>
<li>But in super positive news, he says they are using my new <a href="https://github.com/alanorth/dspace-statistics-api">dspace-statistics-api</a> and it's MUCH faster than using Atmire CUA's internal &ldquo;restlet&rdquo; API</li>
<li><p>But in super positive news, he says they are using my new <a href="https://github.com/alanorth/dspace-statistics-api">dspace-statistics-api</a> and it&rsquo;s MUCH faster than using Atmire CUA&rsquo;s internal &ldquo;restlet&rdquo; API</p></li> <li>I don't recognize the <code>138.201.49.199</code> IP, but it is in Germany (Hetzner) and appears to be paginating over some browse pages and downloading bitstreams:</li>
</ul>
<li><p>I don&rsquo;t recognize the <code>138.201.49.199</code> IP, but it is in Germany (Hetzner) and appears to be paginating over some browse pages and downloading bitstreams:</p>
<pre><code># grep 138.201.49.199 /var/log/nginx/access.log | grep -o -E 'GET /[a-z]+' | sort | uniq -c <pre><code># grep 138.201.49.199 /var/log/nginx/access.log | grep -o -E 'GET /[a-z]+' | sort | uniq -c
8324 GET /bitstream 8324 GET /bitstream
4193 GET /handle 4193 GET /handle
</code></pre></li> </code></pre><ul>
<li>Suspiciously, it's only grabbing the CGIAR System Office community (handle prefix 10947):</li>
<li><p>Suspiciously, it&rsquo;s only grabbing the CGIAR System Office community (handle prefix 10947):</p> </ul>
<pre><code># grep 138.201.49.199 /var/log/nginx/access.log | grep -o -E 'GET /handle/[0-9]{5}' | sort | uniq -c <pre><code># grep 138.201.49.199 /var/log/nginx/access.log | grep -o -E 'GET /handle/[0-9]{5}' | sort | uniq -c
7 GET /handle/10568 7 GET /handle/10568
4186 GET /handle/10947 4186 GET /handle/10947
</code></pre></li> </code></pre><ul>
<li>The user agent is suspicious too:</li>
<li><p>The user agent is suspicious too:</p> </ul>
<pre><code>Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36 <pre><code>Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36
</code></pre></li> </code></pre><ul>
<li>It's clearly a bot and it's not re-using its Tomcat session, so I will add its IP to the nginx bad bot list</li>
<li><p>It&rsquo;s clearly a bot and it&rsquo;s not re-using its Tomcat session, so I will add its IP to the nginx bad bot list</p></li> <li>I looked in Solr's statistics core and these hits were actually all counted as <code>isBot:false</code> (of course)&hellip; hmmm</li>
<li>I tagged all of Sonal and Phil's items with their ORCID identifiers on CGSpace using my <a href="https://gist.github.com/alanorth/a49d85cd9c5dea89cddbe809813a7050">add-orcid-identifiers.py</a> script:</li>
<li><p>I looked in Solr&rsquo;s statistics core and these hits were actually all counted as <code>isBot:false</code> (of course)&hellip; hmmm</p></li> </ul>
<li><p>I tagged all of Sonal and Phil&rsquo;s items with their ORCID identifiers on CGSpace using my <a href="https://gist.github.com/alanorth/a49d85cd9c5dea89cddbe809813a7050">add-orcid-identifiers.py</a> script:</p>
<pre><code>$ ./add-orcid-identifiers-csv.py -i 2018-10-03-add-orcids.csv -db dspace -u dspace -p 'fuuu' <pre><code>$ ./add-orcid-identifiers-csv.py -i 2018-10-03-add-orcids.csv -db dspace -u dspace -p 'fuuu'
</code></pre></li> </code></pre><ul>
<li>Where <code>2018-10-03-add-orcids.csv</code> contained:</li>
<li><p>Where <code>2018-10-03-add-orcids.csv</code> contained:</p> </ul>
<pre><code>dc.contributor.author,cg.creator.id <pre><code>dc.contributor.author,cg.creator.id
&quot;Henson, Sonal P.&quot;,Sonal Henson: 0000-0002-2002-5462 &quot;Henson, Sonal P.&quot;,Sonal Henson: 0000-0002-2002-5462
&quot;Henson, S.&quot;,Sonal Henson: 0000-0002-2002-5462 &quot;Henson, S.&quot;,Sonal Henson: 0000-0002-2002-5462
@ -213,105 +193,75 @@ Given Names Deactivated Family Name Deactivated: 0000-0001-7930-5752
&quot;Thornton, Philip K.&quot;,Philip Thornton: 0000-0002-1854-0182 &quot;Thornton, Philip K.&quot;,Philip Thornton: 0000-0002-1854-0182
&quot;Thornton, Phillip&quot;,Philip Thornton: 0000-0002-1854-0182 &quot;Thornton, Phillip&quot;,Philip Thornton: 0000-0002-1854-0182
&quot;Thornton, Phillip K.&quot;,Philip Thornton: 0000-0002-1854-0182 &quot;Thornton, Phillip K.&quot;,Philip Thornton: 0000-0002-1854-0182
</code></pre></li> </code></pre><h2 id="20181004">2018-10-04</h2>
</ul>
<h2 id="2018-10-04">2018-10-04</h2>
<ul> <ul>
<li>Salem raised an issue that the dspace-statistics-api reports downloads for some items that have no bitstreams (like many limited access items)</li> <li>Salem raised an issue that the dspace-statistics-api reports downloads for some items that have no bitstreams (like many limited access items)</li>
<li>Every item has at least a <code>LICENSE</code> bundle, and some have a <code>THUMBNAIL</code> bundle, but the indexing code is specifically checking for downloads from the <code>ORIGINAL</code> bundle <li>Every item has at least a <code>LICENSE</code> bundle, and some have a <code>THUMBNAIL</code> bundle, but the indexing code is specifically checking for downloads from the <code>ORIGINAL</code> bundle
<ul> <ul>
<li><a href="https://cgspace.cgiar.org/handle/10568/97460"><sup>10568</sup>&frasl;<sub>97460</sub></a> (100550): has a thumbnail bitstream</li> <li><a href="https://cgspace.cgiar.org/handle/10568/97460">10568/97460</a> (100550): has a thumbnail bitstream</li>
<li><a href="https://cgspace.cgiar.org/handle/10568/96112"><sup>10568</sup>&frasl;<sub>96112</sub></a> (96736): has only a LICENSE bitstream</li> <li><a href="https://cgspace.cgiar.org/handle/10568/96112">10568/96112</a> (96736): has only a LICENSE bitstream</li>
</ul></li> </ul>
</li>
<li>I see there are other bundles we might need to pay attention to: <code>TEXT</code>, <code>@_LOGO-COLLECTION_@</code>, <code>@_LOGO-COMMUNITY_@</code>, etc&hellip;</li> <li>I see there are other bundles we might need to pay attention to: <code>TEXT</code>, <code>@_LOGO-COLLECTION_@</code>, <code>@_LOGO-COMMUNITY_@</code>, etc&hellip;</li>
<li>On a hunch I dropped the statistics table and re-indexed and now those two items above have no downloads</li> <li>On a hunch I dropped the statistics table and re-indexed and now those two items above have no downloads</li>
<li>So it&rsquo;s fixed, but I&rsquo;m not sure why!</li> <li>So it's fixed, but I'm not sure why!</li>
<li>Peter wants to know the number of API requests per month, which was about 250,000 in September (excluding statlet requests):</li>
<li><p>Peter wants to know the number of API requests per month, which was about 250,000 in September (excluding statlet requests):</p> </ul>
<pre><code># zcat --force /var/log/nginx/{oai,rest}.log* | grep -E 'Sep/2018' | grep -c -v 'statlets' <pre><code># zcat --force /var/log/nginx/{oai,rest}.log* | grep -E 'Sep/2018' | grep -c -v 'statlets'
251226 251226
</code></pre></li> </code></pre><ul>
<li>I found a logic error in the dspace-statistics-api <code>indexer.py</code> script that was causing item views to be inserted into downloads</li>
<li><p>I found a logic error in the dspace-statistics-api <code>indexer.py</code> script that was causing item views to be inserted into downloads</p></li> <li>I tagged version 0.4.2 of the tool and redeployed it on CGSpace</li>
<li><p>I tagged version 0.4.2 of the tool and redeployed it on CGSpace</p></li>
</ul> </ul>
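The monthly request count above can also be sketched in Python. This is only an illustration of the same filter as the `zcat | grep | grep -c -v` pipeline, not the command that was actually run; the function names `count_requests` and `open_log` are hypothetical.

```python
import gzip
from typing import Iterable, TextIO


def count_requests(lines: Iterable[str], month: str = "Sep/2018", exclude: str = "statlets") -> int:
    """Count log lines that mention `month` and do not mention `exclude`."""
    return sum(1 for line in lines if month in line and exclude not in line)


def open_log(path: str) -> TextIO:
    """Open a plain or gzip-compressed log as text, similar to `zcat --force`."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")
```

To cover rotated logs, the counts from `count_requests(open_log(path))` would be summed over every matching file, for example the paths returned by `glob.glob`.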
<h2 id="20181005">2018-10-05</h2>
<h2 id="2018-10-05">2018-10-05</h2>
<ul> <ul>
<li>Meet with Peter, Abenet, and Sisay to discuss CGSpace meeting in Nairobi and Sisay&rsquo;s work plan</li> <li>Meet with Peter, Abenet, and Sisay to discuss CGSpace meeting in Nairobi and Sisay's work plan</li>
<li>We agreed that he would do monthly updates of the controlled vocabularies and generate a new one for the top 1,000 AGROVOC terms</li> <li>We agreed that he would do monthly updates of the controlled vocabularies and generate a new one for the top 1,000 AGROVOC terms</li>
<li>Add a link to <a href="https://cgspace.cgiar.org/explorer/">AReS explorer</a> to the CGSpace homepage introduction text</li> <li>Add a link to <a href="https://cgspace.cgiar.org/explorer/">AReS explorer</a> to the CGSpace homepage introduction text</li>
</ul> </ul>
<h2 id="20181006">2018-10-06</h2>
<h2 id="2018-10-06">2018-10-06</h2>
<ul> <ul>
<li>Follow up with AgriKnowledge about including Handle links (<code>dc.identifier.uri</code>) on their item pages</li> <li>Follow up with AgriKnowledge about including Handle links (<code>dc.identifier.uri</code>) on their item pages</li>
<li>In July, 2018 they had said their programmers would include the field in the next update of their website software</li> <li>In July, 2018 they had said their programmers would include the field in the next update of their website software</li>
<li><a href="https://repository.cimmyt.org/">CIMMYT&rsquo;s DSpace repository</a> is now running DSpace 5.x!</li> <li><a href="https://repository.cimmyt.org/">CIMMYT's DSpace repository</a> is now running DSpace 5.x!</li>
<li>It&rsquo;s running OAI, but not REST, so I need to talk to Richard about that!</li> <li>It's running OAI, but not REST, so I need to talk to Richard about that!</li>
</ul> </ul>
<h2 id="20181008">2018-10-08</h2>
<h2 id="2018-10-08">2018-10-08</h2>
<ul> <ul>
<li>AgriKnowledge says they&rsquo;re going to add the <code>dc.identifier.uri</code> to their item view in November when they update their website software</li> <li>AgriKnowledge says they're going to add the <code>dc.identifier.uri</code> to their item view in November when they update their website software</li>
</ul> </ul>
<h2 id="20181010">2018-10-10</h2>
<h2 id="2018-10-10">2018-10-10</h2>
<ul> <ul>
<li>Peter noticed that some recently added PDFs don&rsquo;t have thumbnails</li> <li>Peter noticed that some recently added PDFs don't have thumbnails</li>
<li>When I tried to force them to be generated I got an error that I've never seen before:</li>
<li><p>When I tried to force them to be generated I got an error that I&rsquo;ve never seen before:</p> </ul>
<pre><code>$ dspace filter-media -v -f -i 10568/97613 <pre><code>$ dspace filter-media -v -f -i 10568/97613
org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: not authorized `/tmp/impdfthumb5039464037201498062.pdf' @ error/constitute.c/ReadImage/412. org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: not authorized `/tmp/impdfthumb5039464037201498062.pdf' @ error/constitute.c/ReadImage/412.
</code></pre></li> </code></pre><ul>
<li>I see there was an update to Ubuntu's ImageMagick on 2018-10-05, so maybe something changed or broke?</li>
<li><p>I see there was an update to Ubuntu&rsquo;s ImageMagick on 2018-10-05, so maybe something changed or broke?</p></li> <li>I get the same error when forcing <code>filter-media</code> to run on DSpace Test too, so it's gotta be an ImageMagick bug</li>
<li>The ImageMagick version is currently 8:6.8.9.9-7ubuntu5.13, and there is an <a href="https://usn.ubuntu.com/3785-1/">Ubuntu Security Notice from 2018-10-04</a></li>
<li><p>I get the same error when forcing <code>filter-media</code> to run on DSpace Test too, so it&rsquo;s gotta be an ImageMagick bug</p></li> <li>Wow, someone on <a href="https://twitter.com/rosscampbell/status/1048268966819319808">Twitter posted about this breaking his web application</a> (and it was retweeted by the ImageMagick account!)</li>
<li>I commented out the line that disables PDF thumbnails in <code>/etc/ImageMagick-6/policy.xml</code>:</li>
<li><p>The ImageMagick version is currently 8:6.8.9.9-7ubuntu5.13, and there is an <a href="https://usn.ubuntu.com/3785-1/">Ubuntu Security Notice from 2018-10-04</a></p></li>
<li><p>Wow, someone on <a href="https://twitter.com/rosscampbell/status/1048268966819319808">Twitter posted about this breaking his web application</a> (and it was retweeted by the ImageMagick account!)</p></li>
<li><p>I commented out the line that disables PDF thumbnails in <code>/etc/ImageMagick-6/policy.xml</code>:</p>
<pre><code>&lt;!--&lt;policy domain=&quot;coder&quot; rights=&quot;none&quot; pattern=&quot;PDF&quot; /&gt;--&gt;
</code></pre></li>
<li><p>This works, but I&rsquo;m not sure what ImageMagick&rsquo;s long-term plan is if they are going to disable ALL image formats&hellip;</p></li>
<li><p>I suppose I need to enable a workaround for this in Ansible?</p></li>
</ul> </ul>
<pre><code> &lt;!--&lt;policy domain=&quot;coder&quot; rights=&quot;none&quot; pattern=&quot;PDF&quot; /&gt;--&gt;
<h2 id="2018-10-11">2018-10-11</h2> </code></pre><ul>
<li>This works, but I'm not sure what ImageMagick's long-term plan is if they are going to disable ALL image formats&hellip;</li>
<li>I suppose I need to enable a workaround for this in Ansible?</li>
</ul>
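Before wiring a workaround into Ansible it may help to detect whether a given `policy.xml` still blocks the PDF coder. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the function name `pdf_coder_blocked` is hypothetical, and the check only looks for the exact policy line quoted above.

```python
import xml.etree.ElementTree as ET


def pdf_coder_blocked(policy_xml: str) -> bool:
    """Return True if an active <policy> element denies all rights to the PDF coder."""
    root = ET.fromstring(policy_xml)
    # The default parser drops XML comments, so a commented-out policy is not seen here.
    for policy in root.iter("policy"):
        if (
            policy.get("domain") == "coder"
            and policy.get("rights") == "none"
            and policy.get("pattern") == "PDF"
        ):
            return True
    return False
```

A provisioning step could read `/etc/ImageMagick-6/policy.xml`, pass its contents to this function, and only edit the file when it returns `True`.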
<h2 id="20181011">2018-10-11</h2>
<ul> <ul>
<li>I emailed DuraSpace to update <a href="https://duraspace.org/registry/entry/4188/?gvid=178">our entry in their DSpace registry</a> (the data was still on DSpace 3, JSPUI, etc)</li> <li>I emailed DuraSpace to update <a href="https://duraspace.org/registry/entry/4188/?gvid=178">our entry in their DSpace registry</a> (the data was still on DSpace 3, JSPUI, etc)</li>
<li>Generate a list of the top 1500 values for <code>dc.subject</code> so Sisay can start making a controlled vocabulary for it:</li>
<li><p>Generate a list of the top 1500 values for <code>dc.subject</code> so Sisay can start making a controlled vocabulary for it:</p> </ul>
<pre><code>dspace=# \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE metadata_field_id = 57 AND resource_type_id = 2 GROUP BY text_value ORDER BY count DESC LIMIT 1500) to /tmp/2018-10-11-top-1500-subject.csv WITH CSV HEADER; <pre><code>dspace=# \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE metadata_field_id = 57 AND resource_type_id = 2 GROUP BY text_value ORDER BY count DESC LIMIT 1500) to /tmp/2018-10-11-top-1500-subject.csv WITH CSV HEADER;
COPY 1500 COPY 1500
</code></pre></li> </code></pre><ul>
<li>Give WorldFish advice about Handles because they are talking to some company called KnowledgeArc who recommends they do not use Handles!</li>
<li><p>Give WorldFish advice about Handles because they are talking to some company called KnowledgeArc who recommends they do not use Handles!</p></li> <li>Last week I emailed Altmetric to ask if their software would notice mentions of our Handle in the format &ldquo;handle:10568/80775&rdquo; because I noticed that the <a href="https://landportal.org/library/resources/handle1056880775/unlocking-farming-potential-bangladesh%E2%80%99-polders">Land Portal does this</a></li>
<li>Altmetric support responded to say no, but the reason is that Land Portal is doing even more strange stuff by not using <code>&lt;meta&gt;</code> tags in their page header, and using &ldquo;dct:identifier&rdquo; property instead of &ldquo;dc:identifier&rdquo;</li>
<li><p>Last week I emailed Altmetric to ask if their software would notice mentions of our Handle in the format &ldquo;handle:<sup>10568</sup>&frasl;<sub>80775</sub>&rdquo; because I noticed that the <a href="https://landportal.org/library/resources/handle1056880775/unlocking-farming-potential-bangladesh%E2%80%99-polders">Land Portal does this</a></p></li> <li>I re-created my local DSpace database container using <a href="https://github.com/containers/libpod">podman</a> instead of Docker:</li>
</ul>
<li><p>Altmetric support responded to say no, but the reason is that Land Portal is doing even more strange stuff by not using <code>&lt;meta&gt;</code> tags in their page header, and using &ldquo;dct:identifier&rdquo; property instead of &ldquo;dc:identifier&rdquo;</p></li>
<li><p>I re-created my local DSpace database container using <a href="https://github.com/containers/libpod">podman</a> instead of Docker:</p>
<pre><code>$ mkdir -p ~/.local/lib/containers/volumes/dspacedb_data <pre><code>$ mkdir -p ~/.local/lib/containers/volumes/dspacedb_data
$ sudo podman create --name dspacedb -v /home/aorth/.local/lib/containers/volumes/dspacedb_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.6-alpine $ sudo podman create --name dspacedb -v /home/aorth/.local/lib/containers/volumes/dspacedb_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.6-alpine
$ sudo podman start dspacedb $ sudo podman start dspacedb
@ -321,106 +271,80 @@ $ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest superuser;'
$ pg_restore -h localhost -U postgres -d dspacetest -O --role=dspacetest -h localhost ~/Downloads/cgspace_2018-10-11.backup $ pg_restore -h localhost -U postgres -d dspacetest -O --role=dspacetest -h localhost ~/Downloads/cgspace_2018-10-11.backup
$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser;' $ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser;'
$ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest $ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest
</code></pre></li> </code></pre><ul>
<li>I tried to make an Artifactory in podman, but it seems to have problems because Artifactory is distributed on the Bintray repository</li>
<li><p>I tried to make an Artifactory in podman, but it seems to have problems because Artifactory is distributed on the Bintray repository</p></li> <li>I can pull the <code>docker.bintray.io/jfrog/artifactory-oss:latest</code> image, but not start it</li>
<li>I decided to use a Sonatype Nexus repository instead:</li>
<li><p>I can pull the <code>docker.bintray.io/jfrog/artifactory-oss:latest</code> image, but not start it</p></li> </ul>
<li><p>I decided to use a Sonatype Nexus repository instead:</p>
<pre><code>$ mkdir -p ~/.local/lib/containers/volumes/nexus_data <pre><code>$ mkdir -p ~/.local/lib/containers/volumes/nexus_data
$ sudo podman run --name nexus -d -v /home/aorth/.local/lib/containers/volumes/nexus_data:/nexus_data -p 8081:8081 sonatype/nexus3 $ sudo podman run --name nexus -d -v /home/aorth/.local/lib/containers/volumes/nexus_data:/nexus_data -p 8081:8081 sonatype/nexus3
</code></pre></li> </code></pre><ul>
<li>With a few changes to my local Maven <code>settings.xml</code> it is working well</li>
<li><p>With a few changes to my local Maven <code>settings.xml</code> it is working well</p></li> <li>Generate a list of the top 10,000 authors for Peter Ballantyne to look through:</li>
</ul>
<li><p>Generate a list of the top 10,000 authors for Peter Ballantyne to look through:</p>
<pre><code>dspace=# \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE metadata_field_id = 3 AND resource_type_id = 2 GROUP BY text_value ORDER BY count DESC LIMIT 10000) to /tmp/2018-10-11-top-10000-authors.csv WITH CSV HEADER; <pre><code>dspace=# \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE metadata_field_id = 3 AND resource_type_id = 2 GROUP BY text_value ORDER BY count DESC LIMIT 10000) to /tmp/2018-10-11-top-10000-authors.csv WITH CSV HEADER;
COPY 10000 COPY 10000
</code></pre></li> </code></pre><ul>
<li>CTA uploaded some infographics that are very tall and their thumbnails disrupt the item lists on the front page and in their communities and collections</li>
<li><p>CTA uploaded some infographics that are very tall and their thumbnails disrupt the item lists on the front page and in their communities and collections</p></li> <li>I decided to constrain the max height of these to 200px using CSS (<a href="https://github.com/ilri/DSpace/pull/392">#392</a>)</li>
<li><p>I decided to constrain the max height of these to 200px using CSS (<a href="https://github.com/ilri/DSpace/pull/392">#392</a>)</p></li>
</ul> </ul>
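The two "top N values" exports above both follow the same pattern: group identical values, count them, sort by count descending, and keep the first N. As a rough illustration of that logic outside of PostgreSQL, here is a small Python sketch; `top_values` is a hypothetical helper, not part of any DSpace tooling.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_values(values: Iterable[str], limit: int) -> List[Tuple[str, int]]:
    """Return the `limit` most frequent values with their counts, most frequent first."""
    return Counter(values).most_common(limit)
```

Since grouping by `text_value` already yields one row per value, the `DISTINCT` in the SQL is most likely redundant, though harmless.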
<h2 id="20181013">2018-10-13</h2>
<h2 id="2018-10-13">2018-10-13</h2>
<ul> <ul>
<li>Run all system updates on DSpace Test (linode19) and reboot it</li> <li>Run all system updates on DSpace Test (linode19) and reboot it</li>
<li>Look through Peter&rsquo;s list of 746 author corrections in OpenRefine</li> <li>Look through Peter's list of 746 author corrections in OpenRefine</li>
<li>I first facet by blank, trim whitespace, and then check for weird characters that might be indicative of encoding issues with this GREL:</li>
<li><p>I first facet by blank, trim whitespace, and then check for weird characters that might be indicative of encoding issues with this GREL:</p>
<pre><code>or(
isNotNull(value.match(/.*\uFFFD.*/)),
isNotNull(value.match(/.*\u00A0.*/)),
isNotNull(value.match(/.*\u200A.*/)),
isNotNull(value.match(/.*\u2019.*/)),
isNotNull(value.match(/.*\u00b4.*/))
)
</code></pre></li>
<li><p>Then I exported and applied them on my local test server:</p>
<pre><code>$ ./fix-metadata-values.py -i 2018-10-11-top-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t CORRECT -m 3
</code></pre></li>
<li><p>I will apply these on CGSpace when I do the other updates tomorrow, as well as double check the high scoring ones to see if they are correct in Sisay&rsquo;s author controlled vocabulary</p></li>
</ul> </ul>
<pre><code>or(
<h2 id="2018-10-14">2018-10-14</h2> isNotNull(value.match(/.*\uFFFD.*/)),
isNotNull(value.match(/.*\u00A0.*/)),
isNotNull(value.match(/.*\u200A.*/)),
isNotNull(value.match(/.*\u2019.*/)),
isNotNull(value.match(/.*\u00b4.*/))
)
</code></pre><ul>
<li>Then I exported and applied them on my local test server:</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i 2018-10-11-top-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t CORRECT -m 3
</code></pre><ul>
<li>I will apply these on CGSpace when I do the other updates tomorrow, as well as double check the high scoring ones to see if they are correct in Sisay's author controlled vocabulary</li>
</ul>
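The same character check can be sketched outside of OpenRefine, for instance to screen a CSV before importing it. This is an assumed Python equivalent of the GREL expression above, testing for the same five code points; `has_suspicious_characters` is a hypothetical name.

```python
import re

# Same code points as the GREL expression: replacement character, no-break space,
# hair space, right single quotation mark, and acute accent.
SUSPICIOUS = re.compile("[\uFFFD\u00A0\u200A\u2019\u00B4]")


def has_suspicious_characters(value: str) -> bool:
    """Return True if the value contains a character that often signals an encoding problem."""
    return SUSPICIOUS.search(value) is not None
```

A match is only a hint: a right single quotation mark, for example, can be perfectly legitimate in a name, so flagged values still need a manual look.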
<h2 id="20181014">2018-10-14</h2>
<ul> <ul>
<li>Merge the authors controlled vocabulary (<a href="https://github.com/ilri/DSpace/pull/393">#393</a>), usage rights (<a href="https://github.com/ilri/DSpace/pull/394">#394</a>), and the upstream DSpace 5.x cherry-picks (<a href="https://github.com/ilri/DSpace/pull/395">#395</a>) into our <code>5_x-prod</code> branch</li> <li>Merge the authors controlled vocabulary (<a href="https://github.com/ilri/DSpace/pull/393">#393</a>), usage rights (<a href="https://github.com/ilri/DSpace/pull/394">#394</a>), and the upstream DSpace 5.x cherry-picks (<a href="https://github.com/ilri/DSpace/pull/395">#395</a>) into our <code>5_x-prod</code> branch</li>
<li>Switch to new CGIAR LDAP server on CGSpace, as it&rsquo;s been running (at least for authentication) on DSpace Test for the last few weeks, and I think the old one will be deprecated soon (today?)</li> <li>Switch to new CGIAR LDAP server on CGSpace, as it's been running (at least for authentication) on DSpace Test for the last few weeks, and I think the old one will be deprecated soon (today?)</li>
<li>Apply Peter's 746 author corrections on CGSpace and DSpace Test using my <a href="https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897">fix-metadata-values.py</a> script:</li>
<li><p>Apply Peter&rsquo;s 746 author corrections on CGSpace and DSpace Test using my <a href="https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897">fix-metadata-values.py</a> script:</p> </ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/2018-10-11-top-authors.csv -f dc.contributor.author -t CORRECT -m 3 -db dspace -u dspace -p 'fuuu' <pre><code>$ ./fix-metadata-values.py -i /tmp/2018-10-11-top-authors.csv -f dc.contributor.author -t CORRECT -m 3 -db dspace -u dspace -p 'fuuu'
</code></pre></li> </code></pre><ul>
<li>Run all system updates on CGSpace (linode19) and reboot the server</li>
<li><p>Run all system updates on CGSpace (linode19) and reboot the server</p></li> <li>After rebooting the server I noticed that Handles are not resolving, and the <code>dspace-handle-server</code> systemd service is not running (or rather, it exited with success)</li>
<li>Restarting the service with systemd works for a few seconds, then the java process quits</li>
<li><p>After rebooting the server I noticed that Handles are not resolving, and the <code>dspace-handle-server</code> systemd service is not running (or rather, it exited with success)</p></li> <li>I suspect that the systemd service type needs to be <code>forking</code> rather than <code>simple</code>, because the service calls the default DSpace <code>start-handle-server</code> shell script, which uses <code>nohup</code> and <code>&amp;</code> to background the java process</li>
<li>It would be nice if there was a cleaner way to start the service and then just log to the systemd journal rather than all this hiding and log redirecting</li>
<li><p>Restarting the service with systemd works for a few seconds, then the java process quits</p></li> <li>Email the Landportal.org people to ask if they would consider Dublin Core metadata tags in their page's header, rather than the HTML properties they are using in their body</li>
<li>Peter pointed out that some thumbnails were still not getting generated
<li><p>I suspect that the systemd service type needs to be <code>forking</code> rather than <code>simple</code>, because the service calls the default DSpace <code>start-handle-server</code> shell script, which uses <code>nohup</code> and <code>&amp;</code> to background the java process</p></li>
<li><p>It would be nice if there was a cleaner way to start the service and then just log to the systemd journal rather than all this hiding and log redirecting</p></li>
<li><p>Email the Landportal.org people to ask if they would consider Dublin Core metadata tags in their page&rsquo;s header, rather than the HTML properties they are using in their body</p></li>
<li><p>Peter pointed out that some thumbnails were still not getting generated</p>
<ul> <ul>
<li>When I tried to generate them manually I noticed that the path to the CMYK profile had changed because Ubuntu upgraded Ghostscript from 9.18 to 9.25 last week&hellip; WTF?</li> <li>When I tried to generate them manually I noticed that the path to the CMYK profile had changed because Ubuntu upgraded Ghostscript from 9.18 to 9.25 last week&hellip; WTF?</li>
<li>Looks like I can use <code>/usr/share/ghostscript/current</code> instead of <code>/usr/share/ghostscript/9.25</code>&hellip;</li> <li>Looks like I can use <code>/usr/share/ghostscript/current</code> instead of <code>/usr/share/ghostscript/9.25</code>&hellip;</li>
</ul></li>
<li><p>I limited the tall thumbnails even further to 170px because Peter said CTA&rsquo;s were still too tall at 200px (<a href="https://github.com/ilri/DSpace/pull/396">#396</a>)</p></li>
</ul> </ul>
</li>
<h2 id="2018-10-15">2018-10-15</h2> <li>I limited the tall thumbnails even further to 170px because Peter said CTA's were still too tall at 200px (<a href="https://github.com/ilri/DSpace/pull/396">#396</a>)</li>
</ul>
<h2 id="20181015">2018-10-15</h2>
<ul> <ul>
<li>Tomcat on DSpace Test (linode19) has somehow stopped running all the DSpace applications</li> <li>Tomcat on DSpace Test (linode19) has somehow stopped running all the DSpace applications</li>
<li>I don&rsquo;t see anything in the Catalina logs or <code>dmesg</code>, and the Tomcat manager shows XMLUI, REST, OAI, etc all &ldquo;Running: false&rdquo;</li> <li>I don't see anything in the Catalina logs or <code>dmesg</code>, and the Tomcat manager shows XMLUI, REST, OAI, etc all &ldquo;Running: false&rdquo;</li>
<li>Actually, now I remember that yesterday when I deployed the latest changes from git on DSpace Test I noticed a syntax error in one XML file when I was doing the discovery reindexing</li> <li>Actually, now I remember that yesterday when I deployed the latest changes from git on DSpace Test I noticed a syntax error in one XML file when I was doing the discovery reindexing</li>
<li>I fixed it so that I could reindex, but I guess the rest of DSpace actually didn&rsquo;t start up&hellip;</li> <li>I fixed it so that I could reindex, but I guess the rest of DSpace actually didn't start up&hellip;</li>
<li>Create an account on DSpace Test for Felix from Earlham so he can test COPO submission <li>Create an account on DSpace Test for Felix from Earlham so he can test COPO submission
<ul> <ul>
<li>I created a new collection and added him as the administrator so he can test submission</li> <li>I created a new collection and added him as the administrator so he can test submission</li>
<li>He said he actually wants to test creation of communities, collections, etc, so I had to make him a super admin for now</li> <li>He said he actually wants to test creation of communities, collections, etc, so I had to make him a super admin for now</li>
<li>I told him we need to think about the workflow more seriously in the future</li> <li>I told him we need to think about the workflow more seriously in the future</li>
</ul></li> </ul>
</li>
<li><p>I ended up having some issues with podman and went back to Docker, so I had to re-create my containers:</p> <li>I ended up having some issues with podman and went back to Docker, so I had to re-create my containers:</li>
</ul>
<pre><code>$ sudo docker run --name nexus --network dspace-build -d -v /home/aorth/.local/lib/containers/volumes/nexus_data:/nexus_data -p 8081:8081 sonatype/nexus3 <pre><code>$ sudo docker run --name nexus --network dspace-build -d -v /home/aorth/.local/lib/containers/volumes/nexus_data:/nexus_data -p 8081:8081 sonatype/nexus3
$ sudo docker run --name dspacedb -v /home/aorth/.local/lib/containers/volumes/dspacedb_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.6-alpine $ sudo docker run --name dspacedb -v /home/aorth/.local/lib/containers/volumes/dspacedb_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.6-alpine
$ createuser -h localhost -U postgres --pwprompt dspacetest $ createuser -h localhost -U postgres --pwprompt dspacetest
@ -429,21 +353,15 @@ $ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest superuser;'
$ pg_restore -h localhost -U postgres -d dspacetest -O --role=dspacetest -h localhost ~/Downloads/cgspace_2018-10-11.backup $ pg_restore -h localhost -U postgres -d dspacetest -O --role=dspacetest -h localhost ~/Downloads/cgspace_2018-10-11.backup
$ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest $ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest
$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser;' $ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser;'
</code></pre></li> </code></pre><h2 id="20181016">2018-10-16</h2>
</ul>
<h2 id="2018-10-16">2018-10-16</h2>
<ul> <ul>
<li><p>Generate a list of the schema on CGSpace so CodeObia can compare with MELSpace:</p> <li>Generate a list of the schema on CGSpace so CodeObia can compare with MELSpace:</li>
</ul>
<pre><code>dspace=# \copy (SELECT (CASE when metadata_schema_id=1 THEN 'dc' WHEN metadata_schema_id=2 THEN 'cg' END) AS schema, element, qualifier, scope_note FROM metadatafieldregistry where metadata_schema_id IN (1,2)) TO /tmp/cgspace-schema.csv WITH CSV HEADER; <pre><code>dspace=# \copy (SELECT (CASE when metadata_schema_id=1 THEN 'dc' WHEN metadata_schema_id=2 THEN 'cg' END) AS schema, element, qualifier, scope_note FROM metadatafieldregistry where metadata_schema_id IN (1,2)) TO /tmp/cgspace-schema.csv WITH CSV HEADER;
</code></pre></li> </code></pre><ul>
<li>Talking to the CodeObia guys about the REST API I started to wonder why it's so slow and how I can quantify it in order to ask the dspace-tech mailing list for help profiling it</li>
<li><p>Talking to the CodeObia guys about the REST API I started to wonder why it&rsquo;s so slow and how I can quantify it in order to ask the dspace-tech mailing list for help profiling it</p></li> <li>Interestingly, the speed doesn't get better after you request the same thing multiple times; it's consistently bad on both CGSpace and DSpace Test!</li>
</ul>
<li><p>Interestingly, the speed doesn&rsquo;t get better after you request the same thing multiple times; it&rsquo;s consistently bad on both CGSpace and DSpace Test!</p>
<pre><code>$ time http --print h 'https://cgspace.cgiar.org/rest/items?expand=metadata,bitstreams,parentCommunityList&amp;limit=100&amp;offset=0' <pre><code>$ time http --print h 'https://cgspace.cgiar.org/rest/items?expand=metadata,bitstreams,parentCommunityList&amp;limit=100&amp;offset=0'
... ...
0.35s user 0.06s system 1% cpu 25.133 total 0.35s user 0.06s system 1% cpu 25.133 total
@ -459,14 +377,11 @@ $ time http --print h 'https://dspacetest.cgiar.org/rest/items?expand=metadata,b
0.23s user 0.04s system 1% cpu 16.460 total 0.23s user 0.04s system 1% cpu 16.460 total
0.24s user 0.04s system 1% cpu 21.043 total 0.24s user 0.04s system 1% cpu 21.043 total
0.22s user 0.04s system 1% cpu 17.132 total 0.22s user 0.04s system 1% cpu 17.132 total
</code></pre></li> </code></pre><ul>
<li>I should note that at this time CGSpace is using Oracle Java and DSpace Test is using OpenJDK (both version 8)</li>
<li><p>I should note that at this time CGSpace is using Oracle Java and DSpace Test is using OpenJDK (both version 8)</p></li> <li>I wonder if the Java garbage collector is important here, or if there are missing indexes in PostgreSQL?</li>
<li>I switched DSpace Test to the G1GC garbage collector and tried again and now the results are worse!</li>
<li><p>I wonder if the Java garbage collector is important here, or if there are missing indexes in PostgreSQL?</p></li> </ul>
<li><p>I switched DSpace Test to the G1GC garbage collector and tried again and now the results are worse!</p>
<pre><code>$ time http --print h 'https://dspacetest.cgiar.org/rest/items?expand=metadata,bitstreams,parentCommunityList&amp;limit=100&amp;offset=0' <pre><code>$ time http --print h 'https://dspacetest.cgiar.org/rest/items?expand=metadata,bitstreams,parentCommunityList&amp;limit=100&amp;offset=0'
... ...
0.20s user 0.03s system 0% cpu 25.017 total 0.20s user 0.03s system 0% cpu 25.017 total
@ -474,29 +389,24 @@ $ time http --print h 'https://dspacetest.cgiar.org/rest/items?expand=metadata,b
0.24s user 0.02s system 1% cpu 22.496 total 0.24s user 0.02s system 1% cpu 22.496 total
0.22s user 0.03s system 1% cpu 22.720 total 0.22s user 0.03s system 1% cpu 22.720 total
0.23s user 0.03s system 1% cpu 22.632 total 0.23s user 0.03s system 1% cpu 22.632 total
</code></pre></li> </code></pre><ul>
<li>If I make a request without the expands it is ten times faster:</li>
<li><p>If I make a request without the expands it is ten times faster:</p> </ul>
<pre><code>$ time http --print h 'https://dspacetest.cgiar.org/rest/items?limit=100&amp;offset=0' <pre><code>$ time http --print h 'https://dspacetest.cgiar.org/rest/items?limit=100&amp;offset=0'
... ...
0.20s user 0.03s system 7% cpu 3.098 total 0.20s user 0.03s system 7% cpu 3.098 total
0.22s user 0.03s system 8% cpu 2.896 total 0.22s user 0.03s system 8% cpu 2.896 total
0.21s user 0.05s system 9% cpu 2.787 total 0.21s user 0.05s system 9% cpu 2.787 total
0.23s user 0.02s system 8% cpu 2.896 total 0.23s user 0.02s system 8% cpu 2.896 total
</code></pre></li> </code></pre><ul>
<li>I sent a mail to dspace-tech to ask how to profile this&hellip;</li>
<li><p>I sent a mail to dspace-tech to ask how to profile this&hellip;</p></li>
</ul> </ul>
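To quantify the REST API timings above in a repeatable way, something like the following Python sketch could be used. It is only an assumption about how one might measure this, not what was run here (the notes use `time http`); `time_calls` and `summarize` are hypothetical helpers, and the request itself is passed in as a callable so that any HTTP client can be plugged in.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_calls(fetch: Callable[[], object], repeats: int = 5) -> List[float]:
    """Call `fetch` repeatedly and return the wall-clock duration of each call in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fetch()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Reduce a list of durations to min, max, mean, and median."""
    return {
        "min": min(durations),
        "max": max(durations),
        "mean": statistics.mean(durations),
        "median": statistics.median(durations),
    }
```

For example, `fetch` could wrap `urllib.request.urlopen(url).read()` for the expanded and non-expanded item URLs, and the two summaries could then be quoted side by side in a message to dspace-tech.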
<h2 id="20181017">2018-10-17</h2>
<h2 id="2018-10-17">2018-10-17</h2>
<ul> <ul>
<li>I decided to update most of the existing metadata values that we have in <code>dc.rights</code> on CGSpace to be machine readable in SPDX format (with Creative Commons version if it was included)</li> <li>I decided to update most of the existing metadata values that we have in <code>dc.rights</code> on CGSpace to be machine readable in SPDX format (with Creative Commons version if it was included)</li>
<li>Most of them are from Bioversity, and I asked Maria for permission before updating them</li> <li>Most of them are from Bioversity, and I asked Maria for permission before updating them</li>
<li>I manually went through and looked at the existing values and updated them in several batches:</li>
<li><p>I manually went through and looked at the existing values and updated them in several batches:</p> </ul>
<pre><code>UPDATE metadatavalue SET text_value='CC-BY-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%CC BY %'; <pre><code>UPDATE metadatavalue SET text_value='CC-BY-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%CC BY %';
UPDATE metadatavalue SET text_value='CC-BY-NC-ND-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%4.0%' AND text_value LIKE '%BY-NC-ND%' AND text_value LIKE '%by-nc-nd%'; UPDATE metadatavalue SET text_value='CC-BY-NC-ND-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%4.0%' AND text_value LIKE '%BY-NC-ND%' AND text_value LIKE '%by-nc-nd%';
UPDATE metadatavalue SET text_value='CC-BY-NC-SA-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%4.0%' AND text_value LIKE '%BY-NC-SA%' AND text_value LIKE '%by-nc-sa%'; UPDATE metadatavalue SET text_value='CC-BY-NC-SA-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%4.0%' AND text_value LIKE '%BY-NC-SA%' AND text_value LIKE '%by-nc-sa%';
@ -513,115 +423,89 @@ UPDATE metadatavalue SET text_value='CC-BY-3.0' WHERE resource_type_id=2 AND met
UPDATE metadatavalue SET text_value='CC-BY-ND-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND resource_id=78184; UPDATE metadatavalue SET text_value='CC-BY-ND-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND resource_id=78184;
UPDATE metadatavalue SET text_value='CC-BY' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value NOT LIKE '%zero%' AND text_value NOT LIKE '%CC0%' AND text_value LIKE '%Attribution %' AND text_value NOT LIKE '%CC-%'; UPDATE metadatavalue SET text_value='CC-BY' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value NOT LIKE '%zero%' AND text_value NOT LIKE '%CC0%' AND text_value LIKE '%Attribution %' AND text_value NOT LIKE '%CC-%';
UPDATE metadatavalue SET text_value='CC-BY-NC-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND resource_id=78564; UPDATE metadatavalue SET text_value='CC-BY-NC-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND resource_id=78564;
</code></pre></li> </code></pre><ul>
<li>I updated the fields on CGSpace and then started a re-index of Discovery</li>
<li><p>I updated the fields on CGSpace and then started a re-index of Discovery</p></li> <li>We also need to re-think the <code>dc.rights</code> field in the submission form: we should probably use a popup controlled vocabulary and list the Creative Commons values with version numbers and allow the user to enter their own (like the ORCID identifier field)</li>
<li>Ask Jane if we can use some of the BDP money to host AReS explorer on a more powerful server</li>
<li><p>We also need to re-think the <code>dc.rights</code> field in the submission form: we should probably use a popup controlled vocabulary and list the Creative Commons values with version numbers and allow the user to enter their own (like the ORCID identifier field)</p></li> <li>IWMI sent me a list of new ORCID identifiers for their staff so I combined them with our list, updated the names with my <a href="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script, and regenerated the controlled vocabulary:</li>
</ul>
<li><p>Ask Jane if we can use some of the BDP money to host AReS explorer on a more powerful server</p></li>
<li><p>IWMI sent me a list of new ORCID identifiers for their staff so I combined them with our list, updated the names with my <a href="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script, and regenerated the controlled vocabulary:</p>
<pre><code>$ cat ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-id.xml MEL\ ORCID.json MEL\ ORCID_V2.json 2018-10-17-IWMI-ORCIDs.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq &gt; <pre><code>$ cat ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-id.xml MEL\ ORCID.json MEL\ ORCID_V2.json 2018-10-17-IWMI-ORCIDs.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq &gt;
2018-10-17-orcids.txt 2018-10-17-orcids.txt
$ ./resolve-orcids.py -i 2018-10-17-orcids.txt -o 2018-10-17-names.txt -d $ ./resolve-orcids.py -i 2018-10-17-orcids.txt -o 2018-10-17-names.txt -d
$ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-id.xml $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-id.xml
</code></pre></li> </code></pre><ul>
<li>I also decided to add the ORCID identifiers that MEL had sent us a few months ago&hellip;</li>
<li><p>I also decided to add the ORCID identifiers that MEL had sent us a few months ago&hellip;</p></li> <li>One problem I had with the <code>resolve-orcids.py</code> script is that one user seems to have disabled their profile data since we last updated:</li>
</ul>
<li><p>One problem I had with the <code>resolve-orcids.py</code> script is that one user seems to have disabled their profile data since we last updated:</p>
<pre><code>Looking up the names associated with ORCID iD: 0000-0001-7930-5752 <pre><code>Looking up the names associated with ORCID iD: 0000-0001-7930-5752
Given Names Deactivated Family Name Deactivated: 0000-0001-7930-5752 Given Names Deactivated Family Name Deactivated: 0000-0001-7930-5752
</code></pre></li> </code></pre><ul>
<li>So I need to handle that situation in the script for sure, but I'm not sure what to do organizationally or ethically, since that user disabled their name! Do we remove him from the list?</li>
<li><p>So I need to handle that situation in the script for sure, but I&rsquo;m not sure what to do organizationally or ethically, since that user disabled their name! Do we remove him from the list?</p></li> <li>I made a pull request and merged the ORCID updates into the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/397">#397</a>)</li>
<li>Improve the logic of name checking in my <a href="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script</li>
<li><p>I made a pull request and merged the ORCID updates into the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/397">#397</a>)</p></li>
<li><p>Improve the logic of name checking in my <a href="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script</p></li>
</ul> </ul>
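<p>A minimal sketch of how the deactivated-profile case above might be handled. It assumes the resolved name comes back as the placeholder text shown in the output ("Given Names Deactivated Family Name Deactivated"); the function names are hypothetical and are not taken from <code>resolve-orcids.py</code>.</p>

```python
import re

# Same shape as the grep pattern in the notes: four groups of four characters.
ORCID_PATTERN = re.compile(r"[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}")

# Placeholder strings seen in the output above for a deactivated profile.
DEACTIVATED_MARKERS = ("Given Names Deactivated", "Family Name Deactivated")


def extract_orcids(text):
    """Return the sorted, de-duplicated ORCID iDs found in text."""
    return sorted(set(ORCID_PATTERN.findall(text)))


def is_deactivated(name):
    """Return True if a resolved name looks like a deactivated profile."""
    return any(marker in name for marker in DEACTIVATED_MARKERS)
```

<p>A caller could skip or flag any identifier for which <code>is_deactivated()</code> returns true rather than writing the placeholder into the controlled vocabulary. Whether to drop such a person from the list remains the organizational question raised above.</p>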
<h2 id="20181018">2018-10-18</h2>
<h2 id="2018-10-18">2018-10-18</h2>
<ul> <ul>
<li>I granted MEL&rsquo;s deposit user admin access to IITA, CIP, Bioversity, and RTB communities on DSpace Test so they can start testing real depositing</li> <li>I granted MEL's deposit user admin access to IITA, CIP, Bioversity, and RTB communities on DSpace Test so they can start testing real depositing</li>
<li>After they do some tests and we check the values Enrico will send a formal email to Peter et al to ask that they start depositing officially</li> <li>After they do some tests and we check the values Enrico will send a formal email to Peter et al to ask that they start depositing officially</li>
<li>I upgraded PostgreSQL to 9.6 on DSpace Test using Ansible, then had to manually <a href="https://wiki.postgresql.org/wiki/Using_pg_upgrade_on_Ubuntu/Debian">migrate from 9.5 to 9.6</a>:</li>
<li><p>I upgraded PostgreSQL to 9.6 on DSpace Test using Ansible, then had to manually <a href="https://wiki.postgresql.org/wiki/Using_pg_upgrade_on_Ubuntu/Debian">migrate from 9.5 to 9.6</a>:</p> </ul>
<pre><code># su - postgres <pre><code># su - postgres
$ /usr/lib/postgresql/9.6/bin/pg_upgrade -b /usr/lib/postgresql/9.5/bin -B /usr/lib/postgresql/9.6/bin -d /var/lib/postgresql/9.5/main -D /var/lib/postgresql/9.6/main -o ' -c config_file=/etc/postgresql/9.5/main/postgresql.conf' -O ' -c config_file=/etc/postgresql/9.6/main/postgresql.conf' $ /usr/lib/postgresql/9.6/bin/pg_upgrade -b /usr/lib/postgresql/9.5/bin -B /usr/lib/postgresql/9.6/bin -d /var/lib/postgresql/9.5/main -D /var/lib/postgresql/9.6/main -o ' -c config_file=/etc/postgresql/9.5/main/postgresql.conf' -O ' -c config_file=/etc/postgresql/9.6/main/postgresql.conf'
$ exit $ exit
# systemctl start postgresql # systemctl start postgresql
# dpkg -r postgresql-9.5 postgresql-client-9.5 postgresql-contrib-9.5 # dpkg -r postgresql-9.5 postgresql-client-9.5 postgresql-contrib-9.5
</code></pre></li> </code></pre><h2 id="20181019">2018-10-19</h2>
</ul>
<h2 id="2018-10-19">2018-10-19</h2>
<ul> <ul>
<li>Help Francesca from Bioversity generate a report about items they uploaded in 2015 through 2018</li> <li>Help Francesca from Bioversity generate a report about items they uploaded in 2015 through 2018</li>
<li>Linode emailed me to say that CGSpace (linode18) had high CPU usage for a few hours this afternoon</li> <li>Linode emailed me to say that CGSpace (linode18) had high CPU usage for a few hours this afternoon</li>
<li>Looking at the nginx logs around that time I see the following IPs making the most requests:</li>
<li><p>Looking at the nginx logs around that time I see the following IPs making the most requests:</p>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;19/Oct/2018:(12|13|14|15)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
361 207.46.13.179
395 181.115.248.74
485 66.249.64.93
535 157.55.39.213
536 157.55.39.99
551 34.218.226.147
580 157.55.39.173
1516 35.237.175.180
1629 66.249.64.91
1758 5.9.6.51
</code></pre></li>
<li><p>5.9.6.51 is MegaIndex, which I&rsquo;ve seen before&hellip;</p></li>
</ul> </ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;19/Oct/2018:(12|13|14|15)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
<h2 id="2018-10-20">2018-10-20</h2> 361 207.46.13.179
395 181.115.248.74
485 66.249.64.93
535 157.55.39.213
536 157.55.39.99
551 34.218.226.147
580 157.55.39.173
1516 35.237.175.180
1629 66.249.64.91
1758 5.9.6.51
</code></pre><ul>
<li>5.9.6.51 is MegaIndex, which I've seen before&hellip;</li>
</ul>
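<p>The shell pipeline above can also be sketched in Python, which may be easier to reuse in a script. This assumes the client IP is the first whitespace-separated field of each nginx log line (as the <code>awk '{print $1}'</code> step implies); <code>top_client_ips</code> is a hypothetical helper name.</p>

```python
from collections import Counter


def top_client_ips(log_lines, hour_prefixes, limit=10):
    """Count requests per client IP for lines matching any timestamp prefix."""
    counts = Counter()
    for line in log_lines:
        # Keep only lines from the hours of interest, like the grep -E step.
        if any(prefix in line for prefix in hour_prefixes):
            fields = line.split()
            if fields:
                # The first field is assumed to be the client IP.
                counts[fields[0]] += 1
    # Highest counts first (the shell version lists them in ascending order).
    return counts.most_common(limit)
```

<p>For example, passing <code>["19/Oct/2018:12", "19/Oct/2018:13"]</code> as <code>hour_prefixes</code> restricts the count to those two hours.</p>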
<h2 id="20181020">2018-10-20</h2>
<ul> <ul>
<li>I was going to try to run Solr in Docker because I learned I can run Docker on Travis-CI (for testing my dspace-statistics-api), but the oldest official Solr images are for 5.5, and DSpace&rsquo;s Solr configuration is for 4.9</li> <li>I was going to try to run Solr in Docker because I learned I can run Docker on Travis-CI (for testing my dspace-statistics-api), but the oldest official Solr images are for 5.5, and DSpace's Solr configuration is for 4.9</li>
<li>This means our existing Solr configuration doesn't run in Solr 5.5:</li>
<li><p>This means our existing Solr configuration doesn&rsquo;t run in Solr 5.5:</p> </ul>
<pre><code>$ sudo docker pull solr:5 <pre><code>$ sudo docker pull solr:5
$ sudo docker run --name my_solr -v ~/dspace/solr/statistics/conf:/tmp/conf -d -p 8983:8983 -t solr:5 $ sudo docker run --name my_solr -v ~/dspace/solr/statistics/conf:/tmp/conf -d -p 8983:8983 -t solr:5
$ sudo docker logs my_solr $ sudo docker logs my_solr
... ...
ERROR: Error CREATEing SolrCore 'statistics': Unable to create core [statistics] Caused by: solr.IntField ERROR: Error CREATEing SolrCore 'statistics': Unable to create core [statistics] Caused by: solr.IntField
</code></pre></li> </code></pre><ul>
<li>Apparently a bunch of variable types were removed in <a href="https://issues.apache.org/jira/browse/SOLR-5936">Solr 5</a></li>
<li><p>Apparently a bunch of variable types were removed in <a href="https://issues.apache.org/jira/browse/SOLR-5936">Solr 5</a></p></li> <li>So for now it's actually a huge pain in the ass to run the tests for my dspace-statistics-api</li>
<li>Linode sent a message that the CPU usage was high on CGSpace (linode18) last night</li>
<li><p>So for now it&rsquo;s actually a huge pain in the ass to run the tests for my dspace-statistics-api</p></li> <li>According to the nginx logs around that time it was 5.9.6.51 (MegaIndex) again:</li>
</ul>
<li><p>Linode sent a message that the CPU usage was high on CGSpace (linode18) last night</p></li>
<li><p>According to the nginx logs around that time it was 5.9.6.51 (MegaIndex) again:</p>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;20/Oct/2018:(14|15|16)&quot; | awk '{print $1}' | sort <pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;20/Oct/2018:(14|15|16)&quot; | awk '{print $1}' | sort
| uniq -c | sort -n | tail -n 10 | uniq -c | sort -n | tail -n 10
249 207.46.13.179 249 207.46.13.179
250 157.55.39.173 250 157.55.39.173
301 54.166.207.223 301 54.166.207.223
303 157.55.39.213 303 157.55.39.213
310 66.249.64.95 310 66.249.64.95
362 34.218.226.147 362 34.218.226.147
381 66.249.64.93 381 66.249.64.93
415 35.237.175.180 415 35.237.175.180
1205 66.249.64.91 1205 66.249.64.91
1227 5.9.6.51 1227 5.9.6.51
</code></pre></li> </code></pre><ul>
<li>This bot is only using the XMLUI and it does <em>not</em> seem to be re-using its sessions:</li>
<li><p>This bot is only using the XMLUI and it does <em>not</em> seem to be re-using its sessions:</p> </ul>
<pre><code># grep -c 5.9.6.51 /var/log/nginx/*.log <pre><code># grep -c 5.9.6.51 /var/log/nginx/*.log
/var/log/nginx/access.log:9323 /var/log/nginx/access.log:9323
/var/log/nginx/error.log:0 /var/log/nginx/error.log:0
@ -631,69 +515,51 @@ ERROR: Error CREATEing SolrCore 'statistics': Unable to create core [statistics]
/var/log/nginx/statistics.log:0 /var/log/nginx/statistics.log:0
# grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=5.9.6.51' dspace.log.2018-10-20 | sort | uniq # grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=5.9.6.51' dspace.log.2018-10-20 | sort | uniq
8915 8915
</code></pre></li> </code></pre><ul>
<li>Last month I added &ldquo;crawl&rdquo; to the Tomcat Crawler Session Manager Valve's regular expression matching, and it seems to be working for MegaIndex's user agent:</li>
<li><p>Last month I added &ldquo;crawl&rdquo; to the Tomcat Crawler Session Manager Valve&rsquo;s regular expression matching, and it seems to be working for MegaIndex&rsquo;s user agent:</p>
<pre><code>$ http --print Hh 'https://dspacetest.cgiar.org/handle/10568/1' User-Agent:'&quot;Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)&quot;'
</code></pre></li>
<li><p>So I&rsquo;m not sure why this bot uses so many sessions; is it because it requests very slowly?</p></li>
</ul> </ul>
<pre><code>$ http --print Hh 'https://dspacetest.cgiar.org/handle/10568/1' User-Agent:'&quot;Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)&quot;'
<h2 id="2018-10-21">2018-10-21</h2> </code></pre><ul>
<li>So I'm not sure why this bot uses so many sessions; is it because it requests very slowly?</li>
</ul>
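<p>One caveat about the session check above: <code>grep -c</code> prints the number of matching lines, so the trailing <code>sort | uniq</code> does not de-duplicate session IDs. A minimal sketch for counting distinct sessions, assuming the <code>session_id=…:ip_addr=…</code> layout shown in the grep pattern (<code>count_sessions</code> is a hypothetical helper):</p>

```python
import re


def count_sessions(log_lines, ip):
    """Return (matching log entries, distinct session IDs) for one client IP."""
    # Escape the dots in the IP and make sure 5.9.6.5 does not match 5.9.6.51.
    pattern = re.compile(
        r"session_id=([A-Z0-9]{32}):ip_addr=" + re.escape(ip) + r"(?!\d)"
    )
    session_ids = [
        match.group(1)
        for line in log_lines
        for match in pattern.finditer(line)
    ]
    return len(session_ids), len(set(session_ids))
```

<p>If the two numbers are close, the client is creating a new session for nearly every request. If the second number is much smaller, sessions are being re-used.</p>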
<h2 id="20181021">2018-10-21</h2>
<ul> <ul>
<li>Discuss AfricaRice joining CGSpace</li> <li>Discuss AfricaRice joining CGSpace</li>
</ul> </ul>
<h2 id="20181022">2018-10-22</h2>
<h2 id="2018-10-22">2018-10-22</h2>
<ul> <ul>
<li>Post message to Yammer about usage rights (dc.rights)</li> <li>Post message to Yammer about usage rights (dc.rights)</li>
<li>Change <code>build.properties</code> to use HTTPS for Handles in our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a></li> <li>Change <code>build.properties</code> to use HTTPS for Handles in our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a></li>
<li>We will still need to do a batch update of the <code>dc.identifier.uri</code> and other fields in the database:</li>
<li><p>We will still need to do a batch update of the <code>dc.identifier.uri</code> and other fields in the database:</p> </ul>
<pre><code>dspace=# UPDATE metadatavalue SET text_value=replace(text_value, 'http://', 'https://') WHERE resource_type_id=2 AND text_value LIKE 'http://hdl.handle.net%'; <pre><code>dspace=# UPDATE metadatavalue SET text_value=replace(text_value, 'http://', 'https://') WHERE resource_type_id=2 AND text_value LIKE 'http://hdl.handle.net%';
</code></pre></li> </code></pre><ul>
<li>While I was doing that I found two items using CGSpace URLs instead of handles in their <code>dc.identifier.uri</code> so I corrected those</li>
<li><p>While I was doing that I found two items using CGSpace URLs instead of handles in their <code>dc.identifier.uri</code> so I corrected those</p></li> <li>I also found several items that had invalid characters or multiple Handles in some related URL field like <code>cg.link.reference</code> so I corrected those too</li>
<li>Improve the usage rights on the submission form by adding a default selection with no value as well as a better hint to look for the CC license on the publisher page or in the PDF (<a href="https://github.com/ilri/DSpace/pull/398">#398</a>)</li>
<li><p>I also found several items that had invalid characters or multiple Handles in some related URL field like <code>cg.link.reference</code> so I corrected those too</p></li> <li>I deployed the changes on CGSpace, ran all system updates, and rebooted the server</li>
<li>Also, I updated all Handles in the database to use HTTPS:</li>
<li><p>Improve the usage rights on the submission form by adding a default selection with no value as well as a better hint to look for the CC license on the publisher page or in the PDF (<a href="https://github.com/ilri/DSpace/pull/398">#398</a>)</p></li> </ul>
<li><p>I deployed the changes on CGSpace, ran all system updates, and rebooted the server</p></li>
<li><p>Also, I updated all Handles in the database to use HTTPS:</p>
<pre><code>dspace=# UPDATE metadatavalue SET text_value=replace(text_value, 'http://', 'https://') WHERE resource_type_id=2 AND text_value LIKE 'http://hdl.handle.net%'; <pre><code>dspace=# UPDATE metadatavalue SET text_value=replace(text_value, 'http://', 'https://') WHERE resource_type_id=2 AND text_value LIKE 'http://hdl.handle.net%';
UPDATE 76608 UPDATE 76608
</code></pre></li> </code></pre><ul>
<li>Skype with Peter about ToRs for the AReS open source work and future plans to develop tools around the DSpace ecosystem</li>
<li><p>Skype with Peter about ToRs for the AReS open source work and future plans to develop tools around the DSpace ecosystem</p></li> <li>Help CGSpace users with some issues related to usage rights</li>
<li><p>Help CGSpace users with some issues related to usage rights</p></li>
</ul> </ul>
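<p>To illustrate what the <code>UPDATE</code> statement above does to a single value, here is a small Python sketch of the same rule. It is only an illustration of the string logic, not something that was run against the database; <code>upgrade_handle_url</code> is a hypothetical name.</p>

```python
HANDLE_HTTP_PREFIX = "http://hdl.handle.net"


def upgrade_handle_url(value):
    """Mirror the SQL: only values starting with the Handle prefix are changed."""
    if value.startswith(HANDLE_HTTP_PREFIX):
        # Like SQL replace(), this rewrites every "http://" in the value.
        return value.replace("http://", "https://")
    return value
```

<p>Because every occurrence is replaced, a value that contains several Handles would have all of them rewritten, which is one reason to clean up such fields before a batch update.</p>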
<h2 id="20181023">2018-10-23</h2>
<h2 id="2018-10-23">2018-10-23</h2>
<ul> <ul>
<li>Improve the usage rights (dc.rights) on CGSpace again by adding the long names in the submission form, as well as adding version 3.0 and Creative Commons Zero (CC0) public domain license (<a href="https://github.com/ilri/DSpace/pull/399">#399</a>)</li> <li>Improve the usage rights (dc.rights) on CGSpace again by adding the long names in the submission form, as well as adding version 3.0 and Creative Commons Zero (CC0) public domain license (<a href="https://github.com/ilri/DSpace/pull/399">#399</a>)</li>
<li>Add &ldquo;usage rights&rdquo; to the XMLUI item display (<a href="https://github.com/ilri/DSpace/pull/400">#400</a>)</li> <li>Add &ldquo;usage rights&rdquo; to the XMLUI item display (<a href="https://github.com/ilri/DSpace/pull/400">#400</a>)</li>
<li>I emailed the MARLO guys to ask if they can send us a dump of rights data and Handles from their system so we can tag our older items on CGSpace</li> <li>I emailed the MARLO guys to ask if they can send us a dump of rights data and Handles from their system so we can tag our older items on CGSpace</li>
<li>Testing REST login and logout via httpie because Felix from Earlham says he's having issues:</li>
<li><p>Testing REST login and logout via httpie because Felix from Earlham says he&rsquo;s having issues:</p> </ul>
<pre><code>$ http --print b POST 'https://dspacetest.cgiar.org/rest/login' email='testdeposit@cgiar.org' password=deposit <pre><code>$ http --print b POST 'https://dspacetest.cgiar.org/rest/login' email='testdeposit@cgiar.org' password=deposit
acef8a4a-41f3-4392-b870-e873790f696b acef8a4a-41f3-4392-b870-e873790f696b
$ http POST 'https://dspacetest.cgiar.org/rest/logout' rest-dspace-token:acef8a4a-41f3-4392-b870-e873790f696b $ http POST 'https://dspacetest.cgiar.org/rest/logout' rest-dspace-token:acef8a4a-41f3-4392-b870-e873790f696b
</code></pre></li> </code></pre><ul>
<li>Also works via curl (login, check status, logout, check status):</li>
<li><p>Also works via curl (login, check status, logout, check status):</p> </ul>
<pre><code>$ curl -H &quot;Content-Type: application/json&quot; --data '{&quot;email&quot;:&quot;testdeposit@cgiar.org&quot;, &quot;password&quot;:&quot;deposit&quot;}' https://dspacetest.cgiar.org/rest/login <pre><code>$ curl -H &quot;Content-Type: application/json&quot; --data '{&quot;email&quot;:&quot;testdeposit@cgiar.org&quot;, &quot;password&quot;:&quot;deposit&quot;}' https://dspacetest.cgiar.org/rest/login
e09fb5e1-72b0-4811-a2e5-5c1cd78293cc e09fb5e1-72b0-4811-a2e5-5c1cd78293cc
$ curl -X GET -H &quot;Content-Type: application/json&quot; -H &quot;Accept: application/json&quot; -H &quot;rest-dspace-token: e09fb5e1-72b0-4811-a2e5-5c1cd78293cc&quot; https://dspacetest.cgiar.org/rest/status $ curl -X GET -H &quot;Content-Type: application/json&quot; -H &quot;Accept: application/json&quot; -H &quot;rest-dspace-token: e09fb5e1-72b0-4811-a2e5-5c1cd78293cc&quot; https://dspacetest.cgiar.org/rest/status
@ -701,28 +567,21 @@ $ curl -X GET -H &quot;Content-Type: application/json&quot; -H &quot;Accept: app
$ curl -X POST -H &quot;Content-Type: application/json&quot; -H &quot;rest-dspace-token: e09fb5e1-72b0-4811-a2e5-5c1cd78293cc&quot; https://dspacetest.cgiar.org/rest/logout $ curl -X POST -H &quot;Content-Type: application/json&quot; -H &quot;rest-dspace-token: e09fb5e1-72b0-4811-a2e5-5c1cd78293cc&quot; https://dspacetest.cgiar.org/rest/logout
$ curl -X GET -H &quot;Content-Type: application/json&quot; -H &quot;Accept: application/json&quot; -H &quot;rest-dspace-token: e09fb5e1-72b0-4811-a2e5-5c1cd78293cc&quot; https://dspacetest.cgiar.org/rest/status $ curl -X GET -H &quot;Content-Type: application/json&quot; -H &quot;Accept: application/json&quot; -H &quot;rest-dspace-token: e09fb5e1-72b0-4811-a2e5-5c1cd78293cc&quot; https://dspacetest.cgiar.org/rest/status
{&quot;okay&quot;:true,&quot;authenticated&quot;:false,&quot;email&quot;:null,&quot;fullname&quot;:null,&quot;token&quot;:null}% {&quot;okay&quot;:true,&quot;authenticated&quot;:false,&quot;email&quot;:null,&quot;fullname&quot;:null,&quot;token&quot;:null}%
</code></pre></li> </code></pre><ul>
<li>Improve the documentation of my <a href="https://github.com/alanorth/dspace-statistics-api">dspace-statistics-api</a></li>
<li><p>Improve the documentation of my <a href="https://github.com/alanorth/dspace-statistics-api">dspace-statistics-api</a></p></li> <li>Email Modi and Jayashree from ICRISAT to ask if they want to join CGSpace as partners</li>
<li><p>Email Modi and Jayashree from ICRISAT to ask if they want to join CGSpace as partners</p></li>
</ul> </ul>
<h2 id="20181024">2018-10-24</h2>
<h2 id="2018-10-24">2018-10-24</h2>
<ul> <ul>
<li>I deployed the new Creative Commons choices to the usage rights on the CGSpace submission form</li> <li>I deployed the new Creative Commons choices to the usage rights on the CGSpace submission form</li>
<li>Also, I deployed the changes to show usage rights on the item view</li> <li>Also, I deployed the changes to show usage rights on the item view</li>
<li>Re-work the <a href="https://github.com/alanorth/dspace-statistics-api">dspace-statistics-api</a> to use Python&rsquo;s native json instead of ujson to make it easier to deploy in places where we don&rsquo;t have, or don&rsquo;t want to have, Python headers and a compiler (like containers)</li> <li>Re-work the <a href="https://github.com/alanorth/dspace-statistics-api">dspace-statistics-api</a> to use Python's native json instead of ujson to make it easier to deploy in places where we don't have, or don't want to have, Python headers and a compiler (like containers)</li>
<li>Re-work the deployment of the API to use systemd&rsquo;s <code>EnvironmentFile</code> to read the environment variables instead of <code>Environment</code> in the <a href="https://github.com/ilri/rmg-ansible-public">RMG Ansible infrastructure scripts</a></li> <li>Re-work the deployment of the API to use systemd's <code>EnvironmentFile</code> to read the environment variables instead of <code>Environment</code> in the <a href="https://github.com/ilri/rmg-ansible-public">RMG Ansible infrastructure scripts</a></li>
</ul> </ul>
<h2 id="20181025">2018-10-25</h2>
<h2 id="2018-10-25">2018-10-25</h2>
<ul> <ul>
<li>Send Peter and Jane a list of technical ToRs for AReS open source work:</li> <li>Send Peter and Jane a list of technical ToRs for AReS open source work:</li>
<li>Basic version of AReS that works with metadata fields present in default DSpace 5.x/6.x (for example author, date issued, type, subjects) <li>Basic version of AReS that works with metadata fields present in default DSpace 5.x/6.x (for example author, date issued, type, subjects)
<ul> <ul>
<li>Ability to harvest multiple repositories</li> <li>Ability to harvest multiple repositories</li>
<li>Configurable list of extra fields to harvest, per repository</li> <li>Configurable list of extra fields to harvest, per repository</li>
@ -732,52 +591,44 @@ $ curl -X GET -H &quot;Content-Type: application/json&quot; -H &quot;Accept: app
<li>Optional harvesting of Altmetric mentions</li> <li>Optional harvesting of Altmetric mentions</li>
<li>Configurable scheduling of harvesting (daily, weekly, etc)</li> <li>Configurable scheduling of harvesting (daily, weekly, etc)</li>
<li>High-quality README.md on GitHub with description, requirements, deployment instructions, and license (GPLv3 unless ICARDA has a problem with that)</li> <li>High-quality README.md on GitHub with description, requirements, deployment instructions, and license (GPLv3 unless ICARDA has a problem with that)</li>
</ul></li> </ul>
</li>
<li>Maria asked if we can add publisher (<code>dc.publisher</code>) to the advanced search filters, so I created a <a href="https://github.com/ilri/DSpace/issues/401">GitHub issue</a> to track it</li> <li>Maria asked if we can add publisher (<code>dc.publisher</code>) to the advanced search filters, so I created a <a href="https://github.com/ilri/DSpace/issues/401">GitHub issue</a> to track it</li>
</ul> </ul>
<h2 id="20181028">2018-10-28</h2>
<h2 id="2018-10-28">2018-10-28</h2>
<ul> <ul>
<li>I forked the <a href="https://github.com/alanorth/SolrClient/tree/kazoo-2.5.0">SolrClient library and updated its kazoo dependency to be version 2.5.0</a> so we stop getting errors about &ldquo;async&rdquo; being a reserved keyword in Python 3.7</li> <li>I forked the <a href="https://github.com/alanorth/SolrClient/tree/kazoo-2.5.0">SolrClient library and updated its kazoo dependency to be version 2.5.0</a> so we stop getting errors about &ldquo;async&rdquo; being a reserved keyword in Python 3.7</li>
<li>Then I re-generated the <code>requirements.txt</code> in the <a href="https://github.com/alanorth/dspace-statistics-api">dspace-statistics-api</a> and released version 0.5.2</li> <li>Then I re-generated the <code>requirements.txt</code> in the <a href="https://github.com/alanorth/dspace-statistics-api">dspace-statistics-api</a> and released version 0.5.2</li>
<li>Then I re-deployed the API on DSpace Test, ran all system updates on the server, and rebooted it</li> <li>Then I re-deployed the API on DSpace Test, ran all system updates on the server, and rebooted it</li>
<li>I tested my hack of depositing to one collection where the default item and bitstream READ policies are restricted and then mapping the item to another collection, but the item retains its default policies so Anonymous cannot see them in the mapped collection either</li> <li>I tested my hack of depositing to one collection where the default item and bitstream READ policies are restricted and then mapping the item to another collection, but the item retains its default policies so Anonymous cannot see them in the mapped collection either</li>
<li>Perhaps we need to try moving the item and inheriting the target collection&rsquo;s policies?</li> <li>Perhaps we need to try moving the item and inheriting the target collection's policies?</li>
<li>I merged the changes for adding publisher (<code>dc.publisher</code>) to the advanced search to the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/402">#402</a>)</li> <li>I merged the changes for adding publisher (<code>dc.publisher</code>) to the advanced search to the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/402">#402</a>)</li>
<li>I merged the changes for adding versionless Creative Commons licenses to the submission form to the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/403">#403</a>)</li> <li>I merged the changes for adding versionless Creative Commons licenses to the submission form to the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/403">#403</a>)</li>
<li>I will deploy them later this week</li> <li>I will deploy them later this week</li>
</ul> </ul>
<h2 id="20181029">2018-10-29</h2>
<h2 id="2018-10-29">2018-10-29</h2>
<ul> <ul>
<li>I deployed the publisher and Creative Commons changes to CGSpace, ran all system updates, and rebooted the server</li> <li>I deployed the publisher and Creative Commons changes to CGSpace, ran all system updates, and rebooted the server</li>
<li>I sent the email to Jane Poole and ILRI ICT and Finance to start the admin process of getting a new Linode server for AReS</li> <li>I sent the email to Jane Poole and ILRI ICT and Finance to start the admin process of getting a new Linode server for AReS</li>
</ul> </ul>
<h2 id="20181030">2018-10-30</h2>
<h2 id="2018-10-30">2018-10-30</h2>
<ul> <ul>
<li>Meet with the COPO guys to walk them through the CGSpace submission workflow and discuss CG core, REST API, etc <li>Meet with the COPO guys to walk them through the CGSpace submission workflow and discuss CG core, REST API, etc
<ul> <ul>
<li>I suggested that they look into submitting via the <a href="https://wiki.duraspace.org/display/DSDOC5x/SWORDv2+Server">SWORDv2</a> protocol because it respects the workflows</li> <li>I suggested that they look into submitting via the <a href="https://wiki.duraspace.org/display/DSDOC5x/SWORDv2+Server">SWORDv2</a> protocol because it respects the workflows</li>
<li>They said that they&rsquo;re not too worried about the hierarchical CG core schema, that they would just flatten metadata like affiliations when depositing to a DSpace repository</li> <li>They said that they're not too worried about the hierarchical CG core schema, that they would just flatten metadata like affiliations when depositing to a DSpace repository</li>
<li>I said that it might be time to engage the DSpace community to add support for more advanced schemas in DSpace 7+ (perhaps partnership with Atmire?)</li> <li>I said that it might be time to engage the DSpace community to add support for more advanced schemas in DSpace 7+ (perhaps partnership with Atmire?)</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2018-10-31">2018-10-31</h2> </ul>
<h2 id="20181031">2018-10-31</h2>
<ul> <ul>
<li>More discussion and planning for AReS open sourcing and Amman meeting in 2019-10</li> <li>More discussion and planning for AReS open sourcing and Amman meeting in 2019-10</li>
<li>I did some work to clean up and improve the dspace-statistics-api README.md and project structure and <a href="https://github.com/ilri/dspace-statistics-api">moved it to the ILRI organization on GitHub</a></li> <li>I did some work to clean up and improve the dspace-statistics-api README.md and project structure and <a href="https://github.com/ilri/dspace-statistics-api">moved it to the ILRI organization on GitHub</a></li>
<li>Now the API serves some basic documentation on the root route</li> <li>Now the API serves some basic documentation on the root route</li>
<li>I want to announce it to the dspace-tech mailing list soon</li> <li>I want to announce it to the dspace-tech mailing list soon</li>
</ul> </ul>
<!-- raw HTML omitted -->
<!-- vim: set sw=2 ts=2: -->


@ -8,14 +8,11 @@
<meta property="og:title" content="November, 2018" /> <meta property="og:title" content="November, 2018" />
<meta property="og:description" content="2018-11-01 <meta property="og:description" content="2018-11-01
Finalize AReS Phase I and Phase II ToRs Finalize AReS Phase I and Phase II ToRs
Send a note about my dspace-statistics-api to the dspace-tech mailing list Send a note about my dspace-statistics-api to the dspace-tech mailing list
2018-11-03 2018-11-03
Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage
Today these are the top 10 IPs: Today these are the top 10 IPs:
" /> " />
@ -28,18 +25,15 @@ Today these are the top 10 IPs:
<meta name="twitter:title" content="November, 2018"/> <meta name="twitter:title" content="November, 2018"/>
<meta name="twitter:description" content="2018-11-01 <meta name="twitter:description" content="2018-11-01
Finalize AReS Phase I and Phase II ToRs Finalize AReS Phase I and Phase II ToRs
Send a note about my dspace-statistics-api to the dspace-tech mailing list Send a note about my dspace-statistics-api to the dspace-tech mailing list
2018-11-03 2018-11-03
Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage
Today these are the top 10 IPs: Today these are the top 10 IPs:
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -120,20 +114,16 @@ Today these are the top 10 IPs:
</p> </p>
</header> </header>
<h2 id="2018-11-01">2018-11-01</h2> <h2 id="20181101">2018-11-01</h2>
<ul> <ul>
<li>Finalize AReS Phase I and Phase II ToRs</li> <li>Finalize AReS Phase I and Phase II ToRs</li>
<li>Send a note about my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to the dspace-tech mailing list</li> <li>Send a note about my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to the dspace-tech mailing list</li>
</ul> </ul>
<h2 id="20181103">2018-11-03</h2>
<h2 id="2018-11-03">2018-11-03</h2>
<ul> <ul>
<li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li> <li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li>
<li>Today these are the top 10 IPs:</li> <li>Today these are the top 10 IPs:</li>
</ul> </ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Nov/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 <pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Nov/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1300 66.249.64.63 1300 66.249.64.63
1384 35.237.175.180 1384 35.237.175.180
@ -145,239 +135,195 @@ Today these are the top 10 IPs:
3367 84.38.130.177 3367 84.38.130.177
4537 70.32.83.92 4537 70.32.83.92
22508 66.249.64.59 22508 66.249.64.59
</code></pre> </code></pre><ul>
<ul>
<li>The <code>66.249.64.x</code> are definitely Google</li> <li>The <code>66.249.64.x</code> are definitely Google</li>
<li><code>70.32.83.92</code> is well known, probably CCAFS or something, as it&rsquo;s only a few thousand requests and always to REST API</li> <li><code>70.32.83.92</code> is well known, probably CCAFS or something, as it's only a few thousand requests and always to REST API</li>
<li><code>84.38.130.177</code> is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:</li>
<li><p><code>84.38.130.177</code> is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:</p> </ul>
<pre><code>Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1 <pre><code>Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
</code></pre></li> </code></pre><ul>
<li>They at least seem to be re-using their Tomcat sessions:</li>
<li><p>They at least seem to be re-using their Tomcat sessions:</p> </ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 <pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03
342 342
</code></pre></li> </code></pre><ul>
<li><code>50.116.102.77</code> is also a regular REST API user</li>
<li><p><code>50.116.102.77</code> is also a regular REST API user</p></li> <li><code>40.77.167.175</code> and <code>207.46.13.156</code> seem to be Bing</li>
<li><code>138.201.52.218</code> seems to be on Hetzner in Germany, but is using this user agent:</li>
<li><p><code>40.77.167.175</code> and <code>207.46.13.156</code> seem to be Bing</p></li> </ul>
<li><p><code>138.201.52.218</code> seems to be on Hetzner in Germany, but is using this user agent:</p>
<pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0 <pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
</code></pre></li> </code></pre><ul>
<li>And it doesn't seem they are re-using their Tomcat sessions:</li>
<li><p>And it doesn&rsquo;t seem they are re-using their Tomcat sessions:</p> </ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 <pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03
1243 1243
</code></pre></li> </code></pre><ul>
<li>Ah, we've apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day&hellip;</li>
<li><p>Ah, we&rsquo;ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day&hellip;</p></li> <li>I wonder if it's worth adding them to the list of bots in the nginx config?</li>
<li>Linode sent a mail that CGSpace (linode18) is using high outgoing bandwidth</li>
<li><p>I wonder if it&rsquo;s worth adding them to the list of bots in the nginx config?</p></li> <li>Looking at the nginx logs again I see the following top ten IPs:</li>
</ul>
<li><p>Linode sent a mail that CGSpace (linode18) is using high outgoing bandwidth</p></li>
<li><p>Looking at the nginx logs again I see the following top ten IPs:</p>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Nov/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 <pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Nov/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1979 50.116.102.77 1979 50.116.102.77
1980 35.237.175.180 1980 35.237.175.180
2186 207.46.13.156 2186 207.46.13.156
2208 40.77.167.175 2208 40.77.167.175
2843 66.249.64.63 2843 66.249.64.63
4220 84.38.130.177 4220 84.38.130.177
4537 70.32.83.92 4537 70.32.83.92
5593 66.249.64.61 5593 66.249.64.61
12557 78.46.89.18 12557 78.46.89.18
32152 66.249.64.59 32152 66.249.64.59
</code></pre></li> </code></pre><ul>
<li><code>78.46.89.18</code> is new since I last checked a few hours ago, and it's from Hetzner with the following user agent:</li>
<li><p><code>78.46.89.18</code> is new since I last checked a few hours ago, and it&rsquo;s from Hetzner with the following user agent:</p> </ul>
<pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0 <pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
</code></pre></li> </code></pre><ul>
<li>It's making lots of requests, though actually it does seem to be re-using its Tomcat sessions:</li>
<li><p>It&rsquo;s making lots of requests, though actually it does seem to be re-using its Tomcat sessions:</p> </ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.89.18' dspace.log.2018-11-03 <pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.89.18' dspace.log.2018-11-03
8449 8449
$ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.89.18' dspace.log.2018-11-03 | sort | uniq | wc -l $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.89.18' dspace.log.2018-11-03 | sort | uniq | wc -l
1 1
</code></pre></li> </code></pre><ul>
<li><em>Updated on 2018-12-04 to correct the grep command above, as it was inaccurate and it seems the bot was actually already re-using its Tomcat sessions</em></li>
<li><p><em>Updated on 2018-12-04 to correct the grep command above, as it was inaccurate and it seems the bot was actually already re-using its Tomcat sessions</em></p></li> <li>I could add this IP to the list of bot IPs in nginx, but it seems like a futile effort when some new IP could come along and do the same thing</li>
<li>Perhaps I should think about adding rate limits to dynamic pages like <code>/discover</code> and <code>/browse</code></li>
<li><p>I could add this IP to the list of bot IPs in nginx, but it seems like a futile effort when some new IP could come along and do the same thing</p></li> <li>I think it's reasonable for a human to click one of those links five or ten times a minute&hellip;</li>
<li>To contrast, <code>78.46.89.18</code> made about 300 requests per minute for a few hours today:</li>
<li><p>Perhaps I should think about adding rate limits to dynamic pages like <code>/discover</code> and <code>/browse</code></p></li>
<li><p>I think it&rsquo;s reasonable for a human to click one of those links five or ten times a minute&hellip;</p></li>
<li><p>To contrast, <code>78.46.89.18</code> made about 300 requests per minute for a few hours today:</p>
<pre><code># grep 78.46.89.18 /var/log/nginx/access.log | grep -o -E '03/Nov/2018:[0-9][0-9]:[0-9][0-9]' | sort | uniq -c | sort -n | tail -n 20
286 03/Nov/2018:18:02
287 03/Nov/2018:18:21
289 03/Nov/2018:18:23
291 03/Nov/2018:18:27
293 03/Nov/2018:18:34
300 03/Nov/2018:17:58
300 03/Nov/2018:18:22
300 03/Nov/2018:18:32
304 03/Nov/2018:18:12
305 03/Nov/2018:18:13
305 03/Nov/2018:18:24
312 03/Nov/2018:18:39
322 03/Nov/2018:18:17
326 03/Nov/2018:18:38
327 03/Nov/2018:18:16
330 03/Nov/2018:17:57
332 03/Nov/2018:18:19
336 03/Nov/2018:17:56
340 03/Nov/2018:18:14
341 03/Nov/2018:18:18
</code></pre></li>
<li><p>If they want to download all our metadata and PDFs they should use an API rather than scraping the XMLUI</p></li>
<li><p>I will add them to the list of bot IPs in nginx for now and think about enforcing rate limits in XMLUI later</p></li>
<li><p>Also, this is the third (?) time a mysterious IP on Hetzner has done this&hellip; who is this?</p></li>
</ul> </ul>
<pre><code># grep 78.46.89.18 /var/log/nginx/access.log | grep -o -E '03/Nov/2018:[0-9][0-9]:[0-9][0-9]' | sort | uniq -c | sort -n | tail -n 20
<h2 id="2018-11-04">2018-11-04</h2> 286 03/Nov/2018:18:02
287 03/Nov/2018:18:21
289 03/Nov/2018:18:23
291 03/Nov/2018:18:27
293 03/Nov/2018:18:34
300 03/Nov/2018:17:58
300 03/Nov/2018:18:22
300 03/Nov/2018:18:32
304 03/Nov/2018:18:12
305 03/Nov/2018:18:13
305 03/Nov/2018:18:24
312 03/Nov/2018:18:39
322 03/Nov/2018:18:17
326 03/Nov/2018:18:38
327 03/Nov/2018:18:16
330 03/Nov/2018:17:57
332 03/Nov/2018:18:19
336 03/Nov/2018:17:56
340 03/Nov/2018:18:14
341 03/Nov/2018:18:18
</code></pre><ul>
<li>If they want to download all our metadata and PDFs they should use an API rather than scraping the XMLUI</li>
<li>I will add them to the list of bot IPs in nginx for now and think about enforcing rate limits in XMLUI later</li>
<li>Also, this is the third (?) time a mysterious IP on Hetzner has done this&hellip; who is this?</li>
</ul>
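The shell pipelines above (top client IPs, then requests per minute for one IP) can also be sketched in Python, which is handy when the counts need further processing. This is only an illustration, assuming the nginx logs use the standard "combined" format; the names `top_ips` and `requests_per_minute` are hypothetical and are not part of the original notes. Unlike the `grep 78.46.89.18` call, it compares the client address field exactly rather than matching the pattern anywhere in the line.

```python
import re
from collections import Counter

# Captures the client address and the timestamp truncated to the minute from
# an nginx "combined" log line (assumed format), for example:
# 78.46.89.18 - - [03/Nov/2018:18:18:42 +0000] "GET /discover HTTP/1.1" ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2})')


def top_ips(lines, n=10):
    """Return the n busiest client addresses as (ip, count) pairs."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(n)


def requests_per_minute(lines, ip):
    """Return a Counter mapping 'DD/Mon/YYYY:HH:MM' to requests from one IP."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group(1) == ip:
            counts[match.group(2)] += 1
    return counts
```

Both functions accept any iterable of lines, so an open log file can be passed directly; compressed logs would need `gzip.open(path, "rt")` instead of `open(path)`.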
<h2 id="20181104">2018-11-04</h2>
<ul> <ul>
<li>Forward Peter&rsquo;s information about CGSpace financials to Modi from ICRISAT</li> <li>Forward Peter's information about CGSpace financials to Modi from ICRISAT</li>
<li>Linode emailed about the CPU load and outgoing bandwidth on CGSpace (linode18) again</li> <li>Linode emailed about the CPU load and outgoing bandwidth on CGSpace (linode18) again</li>
<li>Here are the top ten IPs active so far this morning:</li>
<li><p>Here are the top ten IPs active so far this morning:</p> </ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;04/Nov/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 <pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;04/Nov/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1083 2a03:2880:11ff:2::face:b00c 1083 2a03:2880:11ff:2::face:b00c
1105 2a03:2880:11ff:d::face:b00c 1105 2a03:2880:11ff:d::face:b00c
1111 2a03:2880:11ff:f::face:b00c 1111 2a03:2880:11ff:f::face:b00c
1134 84.38.130.177 1134 84.38.130.177
1893 50.116.102.77 1893 50.116.102.77
2040 66.249.64.63 2040 66.249.64.63
4210 66.249.64.61 4210 66.249.64.61
4534 70.32.83.92 4534 70.32.83.92
13036 78.46.89.18 13036 78.46.89.18
20407 66.249.64.59 20407 66.249.64.59
</code></pre></li> </code></pre><ul>
<li><code>78.46.89.18</code> is back&hellip; and it is still actually re-using its Tomcat sessions:</li>
<li><p><code>78.46.89.18</code> is back&hellip; and it is still actually re-using its Tomcat sessions:</p> </ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.89.18' dspace.log.2018-11-04 <pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.89.18' dspace.log.2018-11-04
8765 8765
$ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.89.18' dspace.log.2018-11-04 | sort | uniq | wc -l $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.89.18' dspace.log.2018-11-04 | sort | uniq | wc -l
1 1
</code></pre></li> </code></pre><ul>
<li><em>Updated on 2018-12-04 to correct the grep command and point out that the bot was actually re-using its Tomcat sessions properly</em></li>
<li><p><em>Updated on 2018-12-04 to correct the grep command and point out that the bot was actually re-using its Tomcat sessions properly</em></p></li> <li>Also, now we have a ton of Facebook crawlers:</li>
</ul>
<li><p>Also, now we have a ton of Facebook crawlers:</p>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;04/Nov/2018&quot; | grep &quot;2a03:2880:11ff:&quot; | awk '{print $1}' | sort | uniq -c | sort -n <pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;04/Nov/2018&quot; | grep &quot;2a03:2880:11ff:&quot; | awk '{print $1}' | sort | uniq -c | sort -n
905 2a03:2880:11ff:b::face:b00c 905 2a03:2880:11ff:b::face:b00c
955 2a03:2880:11ff:5::face:b00c 955 2a03:2880:11ff:5::face:b00c
965 2a03:2880:11ff:e::face:b00c 965 2a03:2880:11ff:e::face:b00c
984 2a03:2880:11ff:8::face:b00c 984 2a03:2880:11ff:8::face:b00c
993 2a03:2880:11ff:3::face:b00c 993 2a03:2880:11ff:3::face:b00c
994 2a03:2880:11ff:7::face:b00c 994 2a03:2880:11ff:7::face:b00c
1006 2a03:2880:11ff:10::face:b00c 1006 2a03:2880:11ff:10::face:b00c
1011 2a03:2880:11ff:4::face:b00c 1011 2a03:2880:11ff:4::face:b00c
1023 2a03:2880:11ff:6::face:b00c 1023 2a03:2880:11ff:6::face:b00c
1026 2a03:2880:11ff:9::face:b00c 1026 2a03:2880:11ff:9::face:b00c
1039 2a03:2880:11ff:1::face:b00c 1039 2a03:2880:11ff:1::face:b00c
1043 2a03:2880:11ff:c::face:b00c 1043 2a03:2880:11ff:c::face:b00c
1070 2a03:2880:11ff::face:b00c 1070 2a03:2880:11ff::face:b00c
1075 2a03:2880:11ff:a::face:b00c 1075 2a03:2880:11ff:a::face:b00c
1093 2a03:2880:11ff:2::face:b00c 1093 2a03:2880:11ff:2::face:b00c
1107 2a03:2880:11ff:d::face:b00c 1107 2a03:2880:11ff:d::face:b00c
1116 2a03:2880:11ff:f::face:b00c 1116 2a03:2880:11ff:f::face:b00c
</code></pre></li> </code></pre><ul>
<li>They are really making shit tons of requests:</li>
<li><p>They are really making shit tons of requests:</p> </ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=2a03:2880:11ff' dspace.log.2018-11-04 <pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=2a03:2880:11ff' dspace.log.2018-11-04
37721 37721
</code></pre></li> </code></pre><ul>
<li><em>Updated on 2018-12-04 to correct the grep command to accurately show the number of requests</em></li>
<li><p><em>Updated on 2018-12-04 to correct the grep command to accurately show the number of requests</em></p></li> <li>Their user agent is:</li>
</ul>
<li><p>Their user agent is:</p>
<pre><code>facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php) <pre><code>facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
</code></pre></li> </code></pre><ul>
<li>I will add it to the Tomcat Crawler Session Manager valve</li>
<li><p>I will add it to the Tomcat Crawler Session Manager valve</p></li> <li>Later in the evening&hellip; ok, this Facebook bot is getting super annoying:</li>
</ul>
<li><p>Later in the evening&hellip; ok, this Facebook bot is getting super annoying:</p>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;04/Nov/2018&quot; | grep &quot;2a03:2880:11ff:&quot; | awk '{print $1}' | sort | uniq -c | sort -n <pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;04/Nov/2018&quot; | grep &quot;2a03:2880:11ff:&quot; | awk '{print $1}' | sort | uniq -c | sort -n
1871 2a03:2880:11ff:3::face:b00c 1871 2a03:2880:11ff:3::face:b00c
1885 2a03:2880:11ff:b::face:b00c 1885 2a03:2880:11ff:b::face:b00c
1941 2a03:2880:11ff:8::face:b00c 1941 2a03:2880:11ff:8::face:b00c
1942 2a03:2880:11ff:e::face:b00c 1942 2a03:2880:11ff:e::face:b00c
1987 2a03:2880:11ff:1::face:b00c 1987 2a03:2880:11ff:1::face:b00c
2023 2a03:2880:11ff:2::face:b00c 2023 2a03:2880:11ff:2::face:b00c
2027 2a03:2880:11ff:4::face:b00c 2027 2a03:2880:11ff:4::face:b00c
2032 2a03:2880:11ff:9::face:b00c 2032 2a03:2880:11ff:9::face:b00c
2034 2a03:2880:11ff:10::face:b00c 2034 2a03:2880:11ff:10::face:b00c
2050 2a03:2880:11ff:5::face:b00c 2050 2a03:2880:11ff:5::face:b00c
2061 2a03:2880:11ff:c::face:b00c 2061 2a03:2880:11ff:c::face:b00c
2076 2a03:2880:11ff:6::face:b00c 2076 2a03:2880:11ff:6::face:b00c
2093 2a03:2880:11ff:7::face:b00c 2093 2a03:2880:11ff:7::face:b00c
2107 2a03:2880:11ff::face:b00c 2107 2a03:2880:11ff::face:b00c
2118 2a03:2880:11ff:d::face:b00c 2118 2a03:2880:11ff:d::face:b00c
2164 2a03:2880:11ff:a::face:b00c 2164 2a03:2880:11ff:a::face:b00c
2178 2a03:2880:11ff:f::face:b00c 2178 2a03:2880:11ff:f::face:b00c
</code></pre></li> </code></pre><ul>
<li>Now at least the Tomcat Crawler Session Manager Valve seems to be forcing it to re-use some Tomcat sessions:</li>
<li><p>Now at least the Tomcat Crawler Session Manager Valve seems to be forcing it to re-use some Tomcat sessions:</p> </ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=2a03:2880:11ff' dspace.log.2018-11-04 <pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=2a03:2880:11ff' dspace.log.2018-11-04
37721 37721
$ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=2a03:2880:11ff' dspace.log.2018-11-04 | sort | uniq | wc -l $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=2a03:2880:11ff' dspace.log.2018-11-04 | sort | uniq | wc -l
15206 15206
</code></pre></li> </code></pre><ul>
<li>I think we still need to limit more of the dynamic pages, like the &ldquo;most popular&rdquo; country, item, and author pages</li>
<li><p>I think we still need to limit more of the dynamic pages, like the &ldquo;most popular&rdquo; country, item, and author pages</p></li> <li>It seems these are popular too, and there is no fucking way Facebook needs that information, yet they are requesting thousands of them!</li>
</ul>
<li><p>It seems these are popular too, and there is no fucking way Facebook needs that information, yet they are requesting thousands of them!</p>
<pre><code># grep 'face:b00c' /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -c 'most-popular/' <pre><code># grep 'face:b00c' /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -c 'most-popular/'
7033 7033
</code></pre></li> </code></pre><ul>
<li>I added the &ldquo;most-popular&rdquo; pages to the list that return <code>X-Robots-Tag: none</code> to try to inform bots not to index or follow those pages</li>
<li><p>I added the &ldquo;most-popular&rdquo; pages to the list that return <code>X-Robots-Tag: none</code> to try to inform bots not to index or follow those pages</p></li> <li>Also, I implemented an nginx rate limit of twelve requests per minute on all dynamic pages&hellip; I figure a human user might legitimately request one every five seconds</li>
<li><p>Also, I implemented an nginx rate limit of twelve requests per minute on all dynamic pages&hellip; I figure a human user might legitimately request one every five seconds</p></li>
</ul> </ul>
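The session checks above compare two numbers: how many log lines mention a client, and how many distinct Tomcat session IDs those lines contain. A minimal Python sketch of that comparison follows, assuming DSpace log lines contain the `session_id=<32 characters>:ip_addr=<address>` fragment that the grep patterns above rely on. The function name `session_stats` is hypothetical, and only the first match on each line is considered.

```python
import re


def session_stats(lines, ip_prefix):
    """Return (matching_lines, unique_sessions) for addresses starting with ip_prefix."""
    pattern = re.compile(
        r'session_id=([A-Z0-9]{32}):ip_addr=' + re.escape(ip_prefix)
    )
    total = 0
    sessions = set()
    for line in lines:
        match = pattern.search(line)
        if match:
            total += 1
            sessions.add(match.group(1))
    return total, len(sessions)
```

A client that re-uses its session gives a small second number relative to the first, as with `78.46.89.18` above (8765 lines, 1 session), while a second number close to the first suggests a new session for nearly every request.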
<h2 id="20181105">2018-11-05</h2>
<h2 id="2018-11-05">2018-11-05</h2>
<ul> <ul>
<li><p>I wrote a small Python script <a href="https://gist.github.com/alanorth/4ff81d5f65613814a66cb6f84fdf1fc5">add-dc-rights.py</a> to add usage rights (<code>dc.rights</code>) to CGSpace items based on the CSV Hector gave me from MARLO:</p> <li>I wrote a small Python script <a href="https://gist.github.com/alanorth/4ff81d5f65613814a66cb6f84fdf1fc5">add-dc-rights.py</a> to add usage rights (<code>dc.rights</code>) to CGSpace items based on the CSV Hector gave me from MARLO:</li>
</ul>
<pre><code>$ ./add-dc-rights.py -i /tmp/marlo.csv -db dspace -u dspace -p 'fuuu' <pre><code>$ ./add-dc-rights.py -i /tmp/marlo.csv -db dspace -u dspace -p 'fuuu'
</code></pre></li> </code></pre><ul>
<li>The file <code>marlo.csv</code> was cleaned up and formatted in Open Refine</li>
<li><p>The file <code>marlo.csv</code> was cleaned up and formatted in Open Refine</p></li> <li>165 of the items in their 2017 data are from CGSpace!</li>
<li>I will add the data to CGSpace this week (done!)</li>
<li><p>165 of the items in their 2017 data are from CGSpace!</p></li> <li>Jesus, is Facebook <em>trying</em> to be annoying? At least the Tomcat Crawler Session Manager Valve is working to force the bot to re-use its Tomcat sessions:</li>
</ul>
<li><p>I will add the data to CGSpace this week (done!)</p></li>
<li><p>Jesus, is Facebook <em>trying</em> to be annoying? At least the Tomcat Crawler Session Manager Valve is working to force the bot to re-use its Tomcat sessions:</p>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;05/Nov/2018&quot; | grep -c &quot;2a03:2880:11ff:&quot; <pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;05/Nov/2018&quot; | grep -c &quot;2a03:2880:11ff:&quot;
29889 29889
# grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=2a03:2880:11ff' dspace.log.2018-11-05 # grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=2a03:2880:11ff' dspace.log.2018-11-05
@ -386,253 +332,199 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=2a03:2880:11ff' dspace.log.2018-11
1057 1057
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;05/Nov/2018&quot; | grep &quot;2a03:2880:11ff:&quot; | grep -c -E &quot;(handle|bitstream)&quot; # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;05/Nov/2018&quot; | grep &quot;2a03:2880:11ff:&quot; | grep -c -E &quot;(handle|bitstream)&quot;
29896 29896
</code></pre></li> </code></pre><ul>
<li>29,000 requests from Facebook and none of the requests are to the dynamic pages I rate limited yesterday!</li>
<li><p>29,000 requests from Facebook and none of the requests are to the dynamic pages I rate limited yesterday!</p></li> <li>At least the Tomcat Crawler Session Manager Valve is working now&hellip;</li>
<li><p>At least the Tomcat Crawler Session Manager Valve is working now&hellip;</p></li>
</ul> </ul>
<h2 id="20181106">2018-11-06</h2>
<h2 id="2018-11-06">2018-11-06</h2>
<ul> <ul>
<li>I updated all the <a href="https://github.com/ilri/DSpace/wiki/Scripts">DSpace helper Python scripts</a> to validate against PEP 8 using Flake8</li> <li>I updated all the <a href="https://github.com/ilri/DSpace/wiki/Scripts">DSpace helper Python scripts</a> to validate against PEP 8 using Flake8</li>
<li>While I was updating the <a href="https://gist.github.com/alanorth/ddd7f555f0e487fe0e9d3eb4ff26ce50">rest-find-collections.py</a> script I noticed it was using <code>expand=all</code> to get the collection and community IDs</li> <li>While I was updating the <a href="https://gist.github.com/alanorth/ddd7f555f0e487fe0e9d3eb4ff26ce50">rest-find-collections.py</a> script I noticed it was using <code>expand=all</code> to get the collection and community IDs</li>
<li>I realized I actually only need <code>expand=collections,subCommunities</code>, and I wanted to see how much overhead the extra expands created so I did three runs of each:</li>
<li><p>I realized I actually only need <code>expand=collections,subCommunities</code>, and I wanted to see how much overhead the extra expands created so I did three runs of each:</p>
<pre><code>$ time ./rest-find-collections.py 10568/27629 --rest-url https://dspacetest.cgiar.org/rest
</code></pre></li>
<li><p>Average time with all expands was 14.3 seconds, and 12.8 seconds with <code>collections,subCommunities</code>, so <strong>1.5 seconds difference</strong>!</p></li>
</ul> </ul>
<pre><code>$ time ./rest-find-collections.py 10568/27629 --rest-url https://dspacetest.cgiar.org/rest
<h2 id="2018-11-07">2018-11-07</h2> </code></pre><ul>
<li>Average time with all expands was 14.3 seconds, and 12.8 seconds with <code>collections,subCommunities</code>, so <strong>1.5 seconds difference</strong>!</li>
</ul>
<h2 id="20181107">2018-11-07</h2>
<ul> <ul>
<li>Update my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to use a database management class with Python contexts so that connections and cursors are automatically opened and closed</li> <li>Update my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to use a database management class with Python contexts so that connections and cursors are automatically opened and closed</li>
<li>Tag version 0.7.0 of the dspace-statistics-api</li> <li>Tag version 0.7.0 of the dspace-statistics-api</li>
</ul> </ul>
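The idea of a database management class built on Python contexts can be illustrated with a short sketch. This is not the dspace-statistics-api code: it uses the standard library's `sqlite3` with an in-memory database so that it runs anywhere, whereas the real project talks to DSpace's PostgreSQL database, and the names `DatabaseManager`, `total_views`, and the `items` table are made up for the example.

```python
import sqlite3
from contextlib import closing


class DatabaseManager:
    """Open a connection on entry and close it on exit, even after an error."""

    def __init__(self, path=":memory:"):
        self._path = path
        self._connection = None

    def __enter__(self):
        self._connection = sqlite3.connect(self._path)
        return self._connection

    def __exit__(self, exc_type, exc_value, traceback):
        self._connection.close()
        # Returning None (falsy) lets any exception propagate to the caller.


def total_views():
    """Create a throwaway table and sum one column, closing everything after."""
    with DatabaseManager() as db:
        # closing() calls cursor.close() when the block ends.
        with closing(db.cursor()) as cursor:
            cursor.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, views INTEGER)")
            cursor.executemany("INSERT INTO items (views) VALUES (?)", [(3,), (5,)])
            cursor.execute("SELECT SUM(views) FROM items")
            return cursor.fetchone()[0]
```

Callers never call `close()` themselves, so a forgotten cleanup cannot leak a connection, which is presumably the effect on database connections that the Munin graphs would show.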
<h2 id="20181108">2018-11-08</h2>
<h2 id="2018-11-08">2018-11-08</h2>
<ul> <ul>
<li>I deployed version 0.7.0 of the dspace-statistics-api on DSpace Test (linode19) so I can test it for a few days (and check the Munin stats to see the change in database connections) before deploying on CGSpace</li> <li>I deployed version 0.7.0 of the dspace-statistics-api on DSpace Test (linode19) so I can test it for a few days (and check the Munin stats to see the change in database connections) before deploying on CGSpace</li>
<li>I also enabled systemd&rsquo;s persistent journal by setting <a href="https://www.freedesktop.org/software/systemd/man/journald.conf.html"><code>Storage=persistent</code> in <em>journald.conf</em></a></li> <li>I also enabled systemd's persistent journal by setting <a href="https://www.freedesktop.org/software/systemd/man/journald.conf.html"><code>Storage=persistent</code> in <em>journald.conf</em></a></li>
<li>Apparently <a href="https://www.freedesktop.org/software/systemd/man/journald.conf.html">Ubuntu 16.04 defaulted to using rsyslog for boot records until early 2018</a>, so I removed <code>rsyslog</code> too</li> <li>Apparently <a href="https://www.freedesktop.org/software/systemd/man/journald.conf.html">Ubuntu 16.04 defaulted to using rsyslog for boot records until early 2018</a>, so I removed <code>rsyslog</code> too</li>
<li>Proof 277 IITA records on DSpace Test: <a href="https://dspacetest.cgiar.org/handle/10568/107871">IITA_ ALIZZY1802-csv_oct23</a> <li>Proof 277 IITA records on DSpace Test: <a href="https://dspacetest.cgiar.org/handle/10568/107871">IITA_ ALIZZY1802-csv_oct23</a>
<ul> <ul>
<li>There were a few issues with countries, a few language errors, a few whitespace errors, and then a handful of ISSNs in the ISBN field</li> <li>There were a few issues with countries, a few language errors, a few whitespace errors, and then a handful of ISSNs in the ISBN field</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2018-11-11">2018-11-11</h2> </ul>
<h2 id="20181111">2018-11-11</h2>
<ul> <ul>
<li>I added tests to the <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a>!</li> <li>I added tests to the <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a>!</li>
<li>It runs with Python 3.5, 3.6, and 3.7 using pytest, including automatically on Travis CI!</li> <li>It runs with Python 3.5, 3.6, and 3.7 using pytest, including automatically on Travis CI!</li>
</ul> </ul>
<h2 id="20181113">2018-11-13</h2>
<h2 id="2018-11-13">2018-11-13</h2>
<ul> <ul>
<li>Help troubleshoot an issue with Judy Kimani submitting to the <a href="https://cgspace.cgiar.org/handle/10568/78">ILRI project reports, papers and documents</a> collection on CGSpace</li> <li>Help troubleshoot an issue with Judy Kimani submitting to the <a href="https://cgspace.cgiar.org/handle/10568/78">ILRI project reports, papers and documents</a> collection on CGSpace</li>
<li>For some reason there is an existing group for the &ldquo;Accept/Reject&rdquo; workflow step, but it&rsquo;s empty</li> <li>For some reason there is an existing group for the &ldquo;Accept/Reject&rdquo; workflow step, but it's empty</li>
<li>I added Judy to the group and told her to try again</li> <li>I added Judy to the group and told her to try again</li>
<li>Sisay changed his leave to be full days until December so I need to finish the IITA records that he was working on (<a href="https://dspacetest.cgiar.org/handle/10568/107871">IITA_ ALIZZY1802-csv_oct23</a>)</li> <li>Sisay changed his leave to be full days until December so I need to finish the IITA records that he was working on (<a href="https://dspacetest.cgiar.org/handle/10568/107871">IITA_ ALIZZY1802-csv_oct23</a>)</li>
<li>Sisay had said there were a few PDFs missing and Bosede sent them this week, so I had to find those items on DSpace Test and add the bitstreams to the items manually</li> <li>Sisay had said there were a few PDFs missing and Bosede sent them this week, so I had to find those items on DSpace Test and add the bitstreams to the items manually</li>
<li>As for the collection mappings I think I need to export the CSV from DSpace Test, add mappings for each type (i.e. Books go to IITA books collection, etc), then re-import to DSpace Test, then export from DSpace command line in &ldquo;migrate&rdquo; mode&hellip;</li> <li>As for the collection mappings I think I need to export the CSV from DSpace Test, add mappings for each type (i.e. Books go to IITA books collection, etc), then re-import to DSpace Test, then export from DSpace command line in &ldquo;migrate&rdquo; mode&hellip;</li>
<li>From there I should be able to script the removal of the old DSpace Test collection so they just go to the correct IITA collections on import into CGSpace</li> <li>From there I should be able to script the removal of the old DSpace Test collection so they just go to the correct IITA collections on import into CGSpace</li>
</ul> </ul>
<h2 id="20181114">2018-11-14</h2>
<h2 id="2018-11-14">2018-11-14</h2>
<ul> <ul>
<li>Finally import the 277 IITA (ALIZZY1802) records to CGSpace</li> <li>Finally import the 277 IITA (ALIZZY1802) records to CGSpace</li>
<li>I had to export them from DSpace Test and import them into a temporary collection on CGSpace first, then export the collection as CSV to map them to new owning collections (IITA books, IITA posters, etc) with OpenRefine because DSpace&rsquo;s <code>dspace export</code> command doesn&rsquo;t include the collections for the items!</li> <li>I had to export them from DSpace Test and import them into a temporary collection on CGSpace first, then export the collection as CSV to map them to new owning collections (IITA books, IITA posters, etc) with OpenRefine because DSpace's <code>dspace export</code> command doesn't include the collections for the items!</li>
<li>Delete all old IITA collections on DSpace Test and run <code>dspace cleanup</code> to get rid of all the bitstreams</li> <li>Delete all old IITA collections on DSpace Test and run <code>dspace cleanup</code> to get rid of all the bitstreams</li>
</ul> </ul>
<h2 id="20181115">2018-11-15</h2>
<h2 id="2018-11-15">2018-11-15</h2>
<ul> <ul>
<li>Deploy version 0.8.1 of the <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to CGSpace (linode18)</li> <li>Deploy version 0.8.1 of the <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to CGSpace (linode18)</li>
</ul> </ul>
<h2 id="20181118">2018-11-18</h2>
<h2 id="2018-11-18">2018-11-18</h2>
<ul> <ul>
<li>Request invoice from Wild Jordan for their meeting venue in January</li> <li>Request invoice from Wild Jordan for their meeting venue in January</li>
</ul> </ul>
<h2 id="20181119">2018-11-19</h2>
<h2 id="2018-11-19">2018-11-19</h2>
<ul> <ul>
<li><p>Testing corrections and deletions for AGROVOC (<code>dc.subject</code>) that Sisay and Peter were working on earlier this month:</p> <li>Testing corrections and deletions for AGROVOC (<code>dc.subject</code>) that Sisay and Peter were working on earlier this month:</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i 2018-11-19-correct-agrovoc.csv -f dc.subject -t correct -m 57 -db dspace -u dspace -p 'fuu' -d <pre><code>$ ./fix-metadata-values.py -i 2018-11-19-correct-agrovoc.csv -f dc.subject -t correct -m 57 -db dspace -u dspace -p 'fuu' -d
$ ./delete-metadata-values.py -i 2018-11-19-delete-agrovoc.csv -f dc.subject -m 57 -db dspace -u dspace -p 'fuu' -d $ ./delete-metadata-values.py -i 2018-11-19-delete-agrovoc.csv -f dc.subject -m 57 -db dspace -u dspace -p 'fuu' -d
</code></pre></li> </code></pre><ul>
<li>Then I ran them on both CGSpace and DSpace Test, and started a full Discovery re-index on CGSpace:</li>
<li><p>Then I ran them on both CGSpace and DSpace Test, and started a full Discovery re-index on CGSpace:</p>
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
</code></pre></li>
<li><p>Generate a new list of the top 1500 AGROVOC subjects on CGSpace to send to Peter and Sisay:</p>
<pre><code>dspace=# \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE metadata_field_id = 57 AND resource_type_id = 2 GROUP BY text_value ORDER BY count DESC LIMIT 1500) to /tmp/2018-11-19-top-1500-subject.csv WITH CSV HEADER;
</code></pre></li>
</ul> </ul>
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
<h2 id="2018-11-20">2018-11-20</h2> </code></pre><ul>
<li>Generate a new list of the top 1500 AGROVOC subjects on CGSpace to send to Peter and Sisay:</li>
</ul>
<pre><code>dspace=# \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE metadata_field_id = 57 AND resource_type_id = 2 GROUP BY text_value ORDER BY count DESC LIMIT 1500) to /tmp/2018-11-19-top-1500-subject.csv WITH CSV HEADER;
</code></pre><h2 id="20181120">2018-11-20</h2>
<ul> <ul>
<li>The Discovery re-indexing on CGSpace never finished yesterday&hellip; the command died after six minutes</li> <li>The Discovery re-indexing on CGSpace never finished yesterday&hellip; the command died after six minutes</li>
<li>The <code>dspace.log.2018-11-19</code> shows this at the time:</li>
<li><p>The <code>dspace.log.2018-11-19</code> shows this at the time:</p> </ul>
<pre><code>2018-11-19 15:23:04,221 ERROR com.atmire.dspace.discovery.AtmireSolrService @ DSpace kernel cannot be null <pre><code>2018-11-19 15:23:04,221 ERROR com.atmire.dspace.discovery.AtmireSolrService @ DSpace kernel cannot be null
java.lang.IllegalStateException: DSpace kernel cannot be null java.lang.IllegalStateException: DSpace kernel cannot be null
at org.dspace.utils.DSpace.getServiceManager(DSpace.java:63) at org.dspace.utils.DSpace.getServiceManager(DSpace.java:63)
at org.dspace.utils.DSpace.getSingletonService(DSpace.java:87) at org.dspace.utils.DSpace.getSingletonService(DSpace.java:87)
at com.atmire.dspace.discovery.AtmireSolrService.buildDocument(AtmireSolrService.java:102) at com.atmire.dspace.discovery.AtmireSolrService.buildDocument(AtmireSolrService.java:102)
at com.atmire.dspace.discovery.AtmireSolrService.indexContent(AtmireSolrService.java:815) at com.atmire.dspace.discovery.AtmireSolrService.indexContent(AtmireSolrService.java:815)
at com.atmire.dspace.discovery.AtmireSolrService.updateIndex(AtmireSolrService.java:884) at com.atmire.dspace.discovery.AtmireSolrService.updateIndex(AtmireSolrService.java:884)
at org.dspace.discovery.SolrServiceImpl.createIndex(SolrServiceImpl.java:370) at org.dspace.discovery.SolrServiceImpl.createIndex(SolrServiceImpl.java:370)
at org.dspace.discovery.IndexClient.main(IndexClient.java:117) at org.dspace.discovery.IndexClient.main(IndexClient.java:117)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at java.lang.reflect.Method.invoke(Method.java:498)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226) at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78) at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
2018-11-19 15:23:04,223 INFO com.atmire.dspace.discovery.AtmireSolrService @ Processing (4629 of 76007): 72731 2018-11-19 15:23:04,223 INFO com.atmire.dspace.discovery.AtmireSolrService @ Processing (4629 of 76007): 72731
</code></pre></li> </code></pre><ul>
<li>I looked in the Solr log around that time and I don't see anything&hellip;</li>
<li><p>I looked in the Solr log around that time and I don&rsquo;t see anything&hellip;</p></li> <li>Working on Udana's WLE records from last month, first the sixteen records in <a href="https://dspacetest.cgiar.org/handle/10568/108254">2018-11-20 RDL Temp</a>
<li><p>Working on Udana&rsquo;s WLE records from last month, first the sixteen records in <a href="https://dspacetest.cgiar.org/handle/10568/108254">2018-11-20 RDL Temp</a></p>
<ul> <ul>
<li>these items will go to the <a href="https://dspacetest.cgiar.org/handle/10568/81592">Restoring Degraded Landscapes collection</a></li> <li>these items will go to the <a href="https://dspacetest.cgiar.org/handle/10568/81592">Restoring Degraded Landscapes collection</a></li>
<li>a few items missing DOIs, but they are easily available on the publication page</li> <li>a few items missing DOIs, but they are easily available on the publication page</li>
<li>clean up DOIs to use &ldquo;<a href="https://doi.org&quot;">https://doi.org&quot;</a> format</li> <li>clean up DOIs to use &ldquo;<a href="https://doi.org">https://doi.org</a>&rdquo; format</li>
<li>clean up some cg.identifier.url to remove unnecessary query strings</li> <li>clean up some cg.identifier.url to remove unnecessary query strings</li>
<li>remove columns with no metadata (river basin, place, target audience, isbn, uri, publisher, ispartofseries, subject)</li> <li>remove columns with no metadata (river basin, place, target audience, isbn, uri, publisher, ispartofseries, subject)</li>
<li>fix column with invalid spaces in metadata field name (cg. subject. wle)</li> <li>fix column with invalid spaces in metadata field name (cg. subject. wle)</li>
<li>trim and collapse whitespace in all fields</li> <li>trim and collapse whitespace in all fields</li>
<li>remove some weird Unicode characters (0xfffd) from abstracts, citations, and titles using Open Refine: <code>value.replace('<27>','')</code></li> <li>remove some weird Unicode characters (0xfffd) from abstracts, citations, and titles using Open Refine: <code>value.replace('<27>','')</code></li>
<li>add dc.rights to some fields that I noticed while checking DOIs</li> <li>add dc.rights to some fields that I noticed while checking DOIs</li>
</ul></li> </ul>
</li>
<li><p>Then the 24 records in <a href="https://dspacetest.cgiar.org/handle/10568/108271">2018-11-20 VRC Temp</a></p> <li>Then the 24 records in <a href="https://dspacetest.cgiar.org/handle/10568/108271">2018-11-20 VRC Temp</a>
<ul> <ul>
<li>these items will go to the <a href="https://dspacetest.cgiar.org/handle/10568/81589">Variability, Risks and Competing Uses collection</a></li> <li>these items will go to the <a href="https://dspacetest.cgiar.org/handle/10568/81589">Variability, Risks and Competing Uses collection</a></li>
<li>trim and collapse whitespace in all fields (lots in WLE subject!)</li> <li>trim and collapse whitespace in all fields (lots in WLE subject!)</li>
<li>clean up some cg.identifier.url fields that had unnecessary anchors in their links</li> <li>clean up some cg.identifier.url fields that had unnecessary anchors in their links</li>
<li>clean up DOIs to use &ldquo;<a href="https://doi.org&quot;">https://doi.org&quot;</a> format</li> <li>clean up DOIs to use &ldquo;<a href="https://doi.org">https://doi.org</a>&rdquo; format</li>
<li>fix column with invalid spaces in metadata field name (cg. subject. wle)</li> <li>fix column with invalid spaces in metadata field name (cg. subject. wle)</li>
<li>remove columns with no metadata (place, target audience, isbn, uri, publisher, ispartofseries, subject)</li> <li>remove columns with no metadata (place, target audience, isbn, uri, publisher, ispartofseries, subject)</li>
<li>remove some weird Unicode characters (0xfffd) from abstracts, citations, and titles using Open Refine: <code>value.replace('<27>','')</code></li> <li>remove some weird Unicode characters (0xfffd) from abstracts, citations, and titles using Open Refine: <code>value.replace('<27>','')</code></li>
<li>I notice a few items using DOIs pointing at ICARDA&rsquo;s DSpace like: <a href="https://doi.org/20.500.11766/8178">https://doi.org/20.500.11766/8178</a>, which then points at the &ldquo;real&rdquo; DOI on the publisher&rsquo;s site&hellip; these should be using the real DOI instead of ICARDA&rsquo;s &ldquo;fake&rdquo; Handle DOI</li> <li>I notice a few items using DOIs pointing at ICARDA's DSpace like: <a href="https://doi.org/20.500.11766/8178">https://doi.org/20.500.11766/8178</a>, which then points at the &ldquo;real&rdquo; DOI on the publisher's site&hellip; these should be using the real DOI instead of ICARDA's &ldquo;fake&rdquo; Handle DOI</li>
<li>Some items missing DOIs, but they clearly have them if you look at the publisher&rsquo;s site</li> <li>Some items missing DOIs, but they clearly have them if you look at the publisher's site</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2018-11-22">2018-11-22</h2> </ul>
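The DOI and text cleanups listed for the WLE records on 2018-11-20 were done in OpenRefine. As a rough illustration only, the same normalizations could be sketched in Python. The function names `normalize_doi` and `clean_text` are hypothetical, and the list of DOI prefixes handled is an assumption about what the source data might contain, not something recorded in these notes.

```python
import re

# Prefixes that are assumed to appear in front of a bare DOI (hypothetical list).
DOI_PREFIXES = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def normalize_doi(value: str) -> str:
    """Rewrite a DOI in any of the assumed forms to the https://doi.org/ form."""
    bare = DOI_PREFIXES.sub("", value.strip())
    return f"https://doi.org/{bare}"


def clean_text(value: str) -> str:
    """Drop U+FFFD replacement characters, then trim and collapse whitespace."""
    value = value.replace("\ufffd", "")
    return re.sub(r"\s+", " ", value).strip()
```

For example, `normalize_doi("http://dx.doi.org/10.1000/xyz")` and `normalize_doi("doi:10.1000/xyz")` both yield the same `https://doi.org/` URL. This does not address the ICARDA Handle-style DOIs noted above, which need to be looked up by hand.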
<h2 id="20181122">2018-11-22</h2>
<ul> <ul>
<li>Tezira is having problems submitting to the <a href="https://cgspace.cgiar.org/handle/10568/24452">ILRI brochures</a> collection for some reason <li>Tezira is having problems submitting to the <a href="https://cgspace.cgiar.org/handle/10568/24452">ILRI brochures</a> collection for some reason
<ul> <ul>
<li>Judy Kimani was having issues resuming submissions in another ILRI collection recently, and the issue there was due to an empty group defined for the &ldquo;accept/reject&rdquo; step (aka workflow step 1)</li> <li>Judy Kimani was having issues resuming submissions in another ILRI collection recently, and the issue there was due to an empty group defined for the &ldquo;accept/reject&rdquo; step (aka workflow step 1)</li>
<li>The error then was &ldquo;authorization denied for workflow step 1&rdquo; where &ldquo;workflow step 1&rdquo; was the &ldquo;accept/reject&rdquo; step, which had a group defined, but was empty</li> <li>The error then was &ldquo;authorization denied for workflow step 1&rdquo; where &ldquo;workflow step 1&rdquo; was the &ldquo;accept/reject&rdquo; step, which had a group defined, but was empty</li>
<li>Adding her to this group solved her issues</li> <li>Adding her to this group solved her issues</li>
<li>Tezira says she&rsquo;s also getting the same &ldquo;authorization denied&rdquo; error for workflow step 1 when resuming submissions, so I told Abenet to delete the empty group</li> <li>Tezira says she's also getting the same &ldquo;authorization denied&rdquo; error for workflow step 1 when resuming submissions, so I told Abenet to delete the empty group</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2018-11-26">2018-11-26</h2> </ul>
<h2 id="20181126">2018-11-26</h2>
<ul> <ul>
<li><a href="https://cgspace.cgiar.org/handle/10568/97709">This WLE item</a> is issued on 2018-10 and accessioned on 2018-10-22 but does not show up in the <a href="https://cgspace.cgiar.org/handle/10568/41888">WLE R4D Learning Series</a> collection on CGSpace for some reason, and therefore does not show up on the WLE publication website</li> <li><a href="https://cgspace.cgiar.org/handle/10568/97709">This WLE item</a> is issued on 2018-10 and accessioned on 2018-10-22 but does not show up in the <a href="https://cgspace.cgiar.org/handle/10568/41888">WLE R4D Learning Series</a> collection on CGSpace for some reason, and therefore does not show up on the WLE publication website</li>
<li>I tried to remove that collection from Discovery and do a simple re-index:</li>
<li><p>I tried to remove that collection from Discovery and do a simple re-index:</p> </ul>
<pre><code>$ dspace index-discovery -r 10568/41888 <pre><code>$ dspace index-discovery -r 10568/41888
$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery $ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery
</code></pre></li> </code></pre><ul>
<li>&hellip; but the item still doesn't appear in the collection</li>
<li><p>&hellip; but the item still doesn&rsquo;t appear in the collection</p></li> <li>Now I will try a full Discovery re-index:</li>
<li><p>Now I will try a full Discovery re-index:</p>
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
</code></pre></li>
<li><p>Ah, Marianne had set the item as private when she uploaded it, so it was still private</p></li>
<li><p>I made it public and now it shows up in the collection list</p></li>
<li><p>More work on the AReS terms of reference for CodeObia</p></li>
<li><p>Erica from AgriKnowledge emailed me to say that they have implemented the changes in their item page UI so that they include the permanent identifier on items harvested from CGSpace, for example: <a href="https://www.agriknowledge.org/concern/generics/wd375w33s">https://www.agriknowledge.org/concern/generics/wd375w33s</a></p></li>
</ul> </ul>
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
<h2 id="2018-11-27">2018-11-27</h2> </code></pre><ul>
<li>Ah, Marianne had set the item as private when she uploaded it, so it was still private</li>
<li>I made it public and now it shows up in the collection list</li>
<li>More work on the AReS terms of reference for CodeObia</li>
<li>Erica from AgriKnowledge emailed me to say that they have implemented the changes in their item page UI so that they include the permanent identifier on items harvested from CGSpace, for example: <a href="https://www.agriknowledge.org/concern/generics/wd375w33s">https://www.agriknowledge.org/concern/generics/wd375w33s</a></li>
</ul>
<h2 id="20181127">2018-11-27</h2>
<ul> <ul>
<li>Linode alerted me that the outbound traffic rate on CGSpace (linode19) was very high</li> <li>Linode alerted me that the outbound traffic rate on CGSpace (linode19) was very high</li>
<li>The top users this morning are:</li>
<li><p>The top users this morning are:</p> </ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;27/Nov/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 <pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;27/Nov/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
229 46.101.86.248 229 46.101.86.248
261 66.249.64.61 261 66.249.64.61
447 66.249.64.59 447 66.249.64.59
541 207.46.13.77 541 207.46.13.77
548 40.77.167.97 548 40.77.167.97
564 35.237.175.180 564 35.237.175.180
595 40.77.167.135 595 40.77.167.135
611 157.55.39.91 611 157.55.39.91
4564 205.186.128.185 4564 205.186.128.185
4564 70.32.83.92 4564 70.32.83.92
</code></pre></li> </code></pre><ul>
<li>We know 70.32.83.92 is the CCAFS harvester on MediaTemple, but 205.186.128.185 is new and appears to be another CCAFS harvester</li>
<li><p>We know 70.32.83.92 is the CCAFS harvester on MediaTemple, but 205.186.128.185 is new and appears to be another CCAFS harvester</p></li> <li>I think we might want to prune some old accounts from CGSpace, perhaps users who haven't logged in in the last two years would be a conservative bunch:</li>
</ul>
<li><p>I think we might want to prune some old accounts from CGSpace, perhaps users who haven&rsquo;t logged in in the last two years would be a conservative bunch:</p>
<pre><code>$ dspace dsrun org.dspace.eperson.Groomer -a -b 11/27/2016 | wc -l <pre><code>$ dspace dsrun org.dspace.eperson.Groomer -a -b 11/27/2016 | wc -l
409 409
$ dspace dsrun org.dspace.eperson.Groomer -a -b 11/27/2016 -d $ dspace dsrun org.dspace.eperson.Groomer -a -b 11/27/2016 -d
</code></pre></li> </code></pre><ul>
<li>This deleted about 380 users, skipping those who have submissions in the repository</li>
<li><p>This deleted about 380 users, skipping those who have submissions in the repository</p></li> <li>Judy Kimani was having problems taking tasks in the <a href="https://cgspace.cgiar.org/handle/10568/78">ILRI project reports, papers and documents</a> collection again
<li><p>Judy Kimani was having problems taking tasks in the <a href="https://cgspace.cgiar.org/handle/10568/78">ILRI project reports, papers and documents</a> collection again</p>
<ul> <ul>
<li>The workflow step 1 (accept/reject) is now undefined for some reason</li> <li>The workflow step 1 (accept/reject) is now undefined for some reason</li>
<li>Last week the group was defined, but empty, so we added her to the group and she was able to take the tasks</li> <li>Last week the group was defined, but empty, so we added her to the group and she was able to take the tasks</li>
<li>Since then it looks like the group was deleted, so now she didn&rsquo;t have permission to take or leave the tasks in her pool</li> <li>Since then it looks like the group was deleted, so now she didn't have permission to take or leave the tasks in her pool</li>
<li>We added her back to the group, then she was able to take the tasks, and then we removed the group again, as we generally don&rsquo;t use this step in CGSpace</li> <li>We added her back to the group, then she was able to take the tasks, and then we removed the group again, as we generally don't use this step in CGSpace</li>
</ul></li>
<li><p>Help Marianne troubleshoot some issue with items in their WLE collections and the WLE publications website</p></li>
</ul> </ul>
</li>
<h2 id="2018-11-28">2018-11-28</h2> <li>Help Marianne troubleshoot some issue with items in their WLE collections and the WLE publications website</li>
</ul>
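The per-IP request tally from 2018-11-27 was produced with a `zcat | grep | awk | sort | uniq -c` pipeline. A minimal Python sketch of the same counting step is shown here, assuming the client address is the first whitespace-separated field of each nginx log line, as the `awk '{print $1}'` step implies. The function name `top_clients` and its parameters are hypothetical.

```python
from collections import Counter


def top_clients(lines, date="27/Nov/2018", n=10):
    """Count requests per client IP for lines that mention the given date."""
    counts = Counter(
        line.split(None, 1)[0]  # first whitespace-separated field, like awk's $1
        for line in lines
        if date in line
    )
    # most_common() returns the busiest clients first, whereas the shell
    # pipeline lists them in ascending order and keeps the last ten.
    return counts.most_common(n)
```

It takes any iterable of log lines, so the caller would still need to open and decompress the rotated log files before passing them in.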
<h2 id="20181128">2018-11-28</h2>
<ul> <ul>
<li>Change the usage rights text a bit based on Maria Garruccio&rsquo;s feedback on &ldquo;all rights reserved&rdquo; (<a href="https://github.com/ilri/DSpace/pull/404">#404</a>)</li> <li>Change the usage rights text a bit based on Maria Garruccio's feedback on &ldquo;all rights reserved&rdquo; (<a href="https://github.com/ilri/DSpace/pull/404">#404</a>)</li>
<li>Run all system updates on DSpace Test (linode19) and reboot the server</li> <li>Run all system updates on DSpace Test (linode19) and reboot the server</li>
</ul> </ul>
<!-- raw HTML omitted -->
<!-- vim: set sw=2 ts=2: -->

View File

@ -8,15 +8,12 @@
<meta property="og:title" content="December, 2018" /> <meta property="og:title" content="December, 2018" />
<meta property="og:description" content="2018-12-01 <meta property="og:description" content="2018-12-01
Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK
I manually installed OpenJDK, then removed Oracle JDK, then re-ran the Ansible playbook to update all configuration files, etc I manually installed OpenJDK, then removed Oracle JDK, then re-ran the Ansible playbook to update all configuration files, etc
Then I ran all system updates and restarted the server Then I ran all system updates and restarted the server
2018-12-02 2018-12-02
I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another Ghostscript vulnerability last week I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another Ghostscript vulnerability last week
" /> " />
<meta property="og:type" content="article" /> <meta property="og:type" content="article" />
@ -28,18 +25,15 @@ I noticed that there is another issue with PDF thumbnails on CGSpace, and I see
<meta name="twitter:title" content="December, 2018"/> <meta name="twitter:title" content="December, 2018"/>
<meta name="twitter:description" content="2018-12-01 <meta name="twitter:description" content="2018-12-01
Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK
I manually installed OpenJDK, then removed Oracle JDK, then re-ran the Ansible playbook to update all configuration files, etc I manually installed OpenJDK, then removed Oracle JDK, then re-ran the Ansible playbook to update all configuration files, etc
Then I ran all system updates and restarted the server Then I ran all system updates and restarted the server
2018-12-02 2018-12-02
I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another Ghostscript vulnerability last week I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another Ghostscript vulnerability last week
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -120,61 +114,51 @@ I noticed that there is another issue with PDF thumbnails on CGSpace, and I see
</p> </p>
</header> </header>
<h2 id="2018-12-01">2018-12-01</h2> <h2 id="20181201">2018-12-01</h2>
<ul> <ul>
<li>Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK</li> <li>Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK</li>
<li>I manually installed OpenJDK, then removed Oracle JDK, then re-ran the <a href="http://github.com/ilri/rmg-ansible-public">Ansible playbook</a> to update all configuration files, etc</li> <li>I manually installed OpenJDK, then removed Oracle JDK, then re-ran the <a href="http://github.com/ilri/rmg-ansible-public">Ansible playbook</a> to update all configuration files, etc</li>
<li>Then I ran all system updates and restarted the server</li> <li>Then I ran all system updates and restarted the server</li>
</ul> </ul>
<h2 id="20181202">2018-12-02</h2>
<h2 id="2018-12-02">2018-12-02</h2>
<ul> <ul>
<li>I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another <a href="https://usn.ubuntu.com/3831-1/">Ghostscript vulnerability last week</a></li> <li>I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another <a href="https://usn.ubuntu.com/3831-1/">Ghostscript vulnerability last week</a></li>
</ul> </ul>
<ul> <ul>
<li><p>The error when I try to manually run the media filter for one item from the command line:</p> <li>The error when I try to manually run the media filter for one item from the command line:</li>
</ul>
<pre><code>org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: FailedToExecuteCommand `&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=pngalpha&quot; -dTextAlphaBits=4 -dGraphicsAlphaBits=4 &quot;-r72x72&quot; -dFirstPage=1 -dLastPage=1 &quot;-sOutputFile=/tmp/magick-12989PcFN0DnJOej7%d&quot; &quot;-f/tmp/magick-129895Bmp44lvUfxo&quot; &quot;-f/tmp/magick-12989C0QFG51fktLF&quot;' (-1) @ error/delegate.c/ExternalDelegateCommand/461. <pre><code>org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: FailedToExecuteCommand `&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=pngalpha&quot; -dTextAlphaBits=4 -dGraphicsAlphaBits=4 &quot;-r72x72&quot; -dFirstPage=1 -dLastPage=1 &quot;-sOutputFile=/tmp/magick-12989PcFN0DnJOej7%d&quot; &quot;-f/tmp/magick-129895Bmp44lvUfxo&quot; &quot;-f/tmp/magick-12989C0QFG51fktLF&quot;' (-1) @ error/delegate.c/ExternalDelegateCommand/461.
org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: FailedToExecuteCommand `&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=pngalpha&quot; -dTextAlphaBits=4 -dGraphicsAlphaBits=4 &quot;-r72x72&quot; -dFirstPage=1 -dLastPage=1 &quot;-sOutputFile=/tmp/magick-12989PcFN0DnJOej7%d&quot; &quot;-f/tmp/magick-129895Bmp44lvUfxo&quot; &quot;-f/tmp/magick-12989C0QFG51fktLF&quot;' (-1) @ error/delegate.c/ExternalDelegateCommand/461. org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: FailedToExecuteCommand `&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=pngalpha&quot; -dTextAlphaBits=4 -dGraphicsAlphaBits=4 &quot;-r72x72&quot; -dFirstPage=1 -dLastPage=1 &quot;-sOutputFile=/tmp/magick-12989PcFN0DnJOej7%d&quot; &quot;-f/tmp/magick-129895Bmp44lvUfxo&quot; &quot;-f/tmp/magick-12989C0QFG51fktLF&quot;' (-1) @ error/delegate.c/ExternalDelegateCommand/461.
at org.im4java.core.Info.getBaseInfo(Info.java:360) at org.im4java.core.Info.getBaseInfo(Info.java:360)
at org.im4java.core.Info.&lt;init&gt;(Info.java:151) at org.im4java.core.Info.&lt;init&gt;(Info.java:151)
at org.dspace.app.mediafilter.ImageMagickThumbnailFilter.getImageFile(ImageMagickThumbnailFilter.java:142) at org.dspace.app.mediafilter.ImageMagickThumbnailFilter.getImageFile(ImageMagickThumbnailFilter.java:142)
at org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter.getDestinationStream(ImageMagickPdfThumbnailFilter.java:24) at org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter.getDestinationStream(ImageMagickPdfThumbnailFilter.java:24)
at org.dspace.app.mediafilter.FormatFilter.processBitstream(FormatFilter.java:170) at org.dspace.app.mediafilter.FormatFilter.processBitstream(FormatFilter.java:170)
at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:475) at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:475)
at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:429) at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:429)
at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:401) at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:401)
at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:237) at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:237)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at java.lang.reflect.Method.invoke(Method.java:498)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226) at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78) at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
</code></pre></li> </code></pre><ul>
<li>A comment on a <a href="https://stackoverflow.com/questions/53560755/ghostscript-9-26-update-breaks-imagick-readimage-for-multipage-pdf">StackOverflow question</a> from yesterday suggests it might be a bug with the <code>pngalpha</code> device in Ghostscript and <a href="https://bugs.ghostscript.com/show_bug.cgi?id=699815">links to an upstream bug</a></li>
<li><p>A comment on a <a href="https://stackoverflow.com/questions/53560755/ghostscript-9-26-update-breaks-imagick-readimage-for-multipage-pdf">StackOverflow question</a> from yesterday suggests it might be a bug with the <code>pngalpha</code> device in Ghostscript and <a href="https://bugs.ghostscript.com/show_bug.cgi?id=699815">links to an upstream bug</a></p></li> <li>I think we need to wait for a fix from Ubuntu</li>
<li>For what it's worth, I get the same error on my local Arch Linux environment with Ghostscript 9.26:</li>
<li><p>I think we need to wait for a fix from Ubuntu</p></li> </ul>
<li><p>For what it&rsquo;s worth, I get the same error on my local Arch Linux environment with Ghostscript 9.26:</p>
<pre><code>$ gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 -sDEVICE=pngalpha -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r72x72 -dFirstPage=1 -dLastPage=1 -sOutputFile=/tmp/out%d -f/home/aorth/Desktop/Food\ safety\ Kenya\ fruits.pdf <pre><code>$ gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 -sDEVICE=pngalpha -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r72x72 -dFirstPage=1 -dLastPage=1 -sOutputFile=/tmp/out%d -f/home/aorth/Desktop/Food\ safety\ Kenya\ fruits.pdf
DEBUG: FC_WEIGHT didn't match DEBUG: FC_WEIGHT didn't match
zsh: segmentation fault (core dumped) gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 zsh: segmentation fault (core dumped) gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000
</code></pre></li> </code></pre><ul>
<li>When I replace the <code>pngalpha</code> device with <code>png16m</code> as suggested in the StackOverflow comments it works:</li>
<li><p>When I replace the <code>pngalpha</code> device with <code>png16m</code> as suggested in the StackOverflow comments it works:</p> </ul>
<pre><code>$ gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 -sDEVICE=png16m -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r72x72 -dFirstPage=1 -dLastPage=1 -sOutputFile=/tmp/out%d -f/home/aorth/Desktop/Food\ safety\ Kenya\ fruits.pdf <pre><code>$ gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 -sDEVICE=png16m -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r72x72 -dFirstPage=1 -dLastPage=1 -sOutputFile=/tmp/out%d -f/home/aorth/Desktop/Food\ safety\ Kenya\ fruits.pdf
DEBUG: FC_WEIGHT didn't match DEBUG: FC_WEIGHT didn't match
</code></pre></li> </code></pre><ul>
<li>Start proofing the latest round of 226 IITA archive records that Bosede sent last week and Sisay uploaded to DSpace Test this weekend (<a href="https://dspacetest.cgiar.org/handle/10568/108298">IITA_Dec_1_1997 aka Daniel1807</a>)
<li><p>Start proofing the latest round of 226 IITA archive records that Bosede sent last week and Sisay uploaded to DSpace Test this weekend (<a href="https://dspacetest.cgiar.org/handle/10568/108298">IITA_Dec_1_1997 aka Daniel1807</a>)</p>
<ul> <ul>
<li>One item missing the authorship type</li> <li>One item missing the authorship type</li>
<li>Some invalid countries (smart quotes, misspellings)</li> <li>Some invalid countries (smart quotes, misspellings)</li>
@ -182,60 +166,50 @@ DEBUG: FC_WEIGHT didn't match
<li>One item had &ldquo;MADAGASCAR&rdquo; for ISI Journal</li> <li>One item had &ldquo;MADAGASCAR&rdquo; for ISI Journal</li>
<li>Minor corrections in IITA subject (LIVELIHOOD→LIVELIHOODS)</li> <li>Minor corrections in IITA subject (LIVELIHOOD→LIVELIHOODS)</li>
<li>Trim whitespace in abstract field</li> <li>Trim whitespace in abstract field</li>
<li>Fix some sponsors (though some with &ldquo;Governments of Canada&rdquo; etc I&rsquo;m not sure why those are plural)</li> <li>Fix some sponsors (though some with &ldquo;Governments of Canada&rdquo; etc I'm not sure why those are plural)</li>
<li>Eighteen items had <code>en||fr</code> for the language, but the content was only in French so changed them to just <code>fr</code></li> <li>Eighteen items had <code>en||fr</code> for the language, but the content was only in French so changed them to just <code>fr</code></li>
<li>Six items had encoding errors in French text so I will ask Bosede to re-do them carefully</li> <li>Six items had encoding errors in French text so I will ask Bosede to re-do them carefully</li>
<li>Correct and normalize a few AGROVOC subjects</li> <li>Correct and normalize a few AGROVOC subjects</li>
</ul></li>
<li><p>Expand my &ldquo;encoding error&rdquo; detection GREL to include <code>~</code> as I saw a lot of that in some copy pasted French text recently:</p>
<pre><code>or(
isNotNull(value.match(/.*\uFFFD.*/)),
isNotNull(value.match(/.*\u00A0.*/)),
isNotNull(value.match(/.*\u200A.*/)),
isNotNull(value.match(/.*\u2019.*/)),
isNotNull(value.match(/.*\u00b4.*/)),
isNotNull(value.match(/.*\u007e.*/))
)
</code></pre></li>
</ul> </ul>
</li>
<h2 id="2018-12-03">2018-12-03</h2> <li>Expand my &ldquo;encoding error&rdquo; detection GREL to include <code>~</code> as I saw a lot of that in some copy pasted French text recently:</li>
</ul>
<pre><code>or(
isNotNull(value.match(/.*\uFFFD.*/)),
isNotNull(value.match(/.*\u00A0.*/)),
isNotNull(value.match(/.*\u200A.*/)),
isNotNull(value.match(/.*\u2019.*/)),
isNotNull(value.match(/.*\u00b4.*/)),
isNotNull(value.match(/.*\u007e.*/))
)
</code></pre><h2 id="20181203">2018-12-03</h2>
<ul> <ul>
<li>I looked at the DSpace Ghostscript issue more and it seems to only affect certain PDFs&hellip;</li> <li>I looked at the DSpace Ghostscript issue more and it seems to only affect certain PDFs&hellip;</li>
<li>I can successfully generate a thumbnail for another recent item (<a href="https://hdl.handle.net/10568/98394"><sup>10568</sup>&frasl;<sub>98394</sub></a>), but not for <a href="https://hdl.handle.net/10568/98390"><sup>10568</sup>&frasl;<sub>98930</sub></a></li> <li>I can successfully generate a thumbnail for another recent item (<a href="https://hdl.handle.net/10568/98394">10568/98394</a>), but not for <a href="https://hdl.handle.net/10568/98390">10568/98930</a></li>
<li>Even manually on my Arch Linux desktop with ghostscript 9.26-1 and the <code>pngalpha</code> device, I can generate a thumbnail for the first one (10568/98394):</li>
<li><p>Even manually on my Arch Linux desktop with ghostscript 9.26-1 and the <code>pngalpha</code> device, I can generate a thumbnail for the first one (<sup>10568</sup>&frasl;<sub>98394</sub>):</p> </ul>
<pre><code>$ gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 -sDEVICE=pngalpha -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r72x72 -dFirstPage=1 -dLastPage=1 -sOutputFile=/tmp/out%d -f/home/aorth/Desktop/Info\ Note\ Mainstreaming\ gender\ and\ social\ differentiation\ into\ CCAFS\ research\ activities\ in\ West\ Africa-converted.pdf <pre><code>$ gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 -sDEVICE=pngalpha -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r72x72 -dFirstPage=1 -dLastPage=1 -sOutputFile=/tmp/out%d -f/home/aorth/Desktop/Info\ Note\ Mainstreaming\ gender\ and\ social\ differentiation\ into\ CCAFS\ research\ activities\ in\ West\ Africa-converted.pdf
</code></pre></li> </code></pre><ul>
<li>So it seems to be something about the PDFs themselves, perhaps related to alpha support?</li>
<li><p>So it seems to be something about the PDFs themselves, perhaps related to alpha support?</p></li> <li>The first item (10568/98394) has the following information:</li>
</ul>
<li><p>The first item (<sup>10568</sup>&frasl;<sub>98394</sub>) has the following information:</p>
<pre><code>$ identify Info\ Note\ Mainstreaming\ gender\ and\ social\ differentiation\ into\ CCAFS\ research\ activities\ in\ West\ Africa-converted.pdf\[0\] <pre><code>$ identify Info\ Note\ Mainstreaming\ gender\ and\ social\ differentiation\ into\ CCAFS\ research\ activities\ in\ West\ Africa-converted.pdf\[0\]
Info Note Mainstreaming gender and social differentiation into CCAFS research activities in West Africa-converted.pdf[0]=&gt;Info Note Mainstreaming gender and social differentiation into CCAFS research activities in West Africa-converted.pdf PDF 595x841 595x841+0+0 16-bit sRGB 107443B 0.000u 0:00.000 Info Note Mainstreaming gender and social differentiation into CCAFS research activities in West Africa-converted.pdf[0]=&gt;Info Note Mainstreaming gender and social differentiation into CCAFS research activities in West Africa-converted.pdf PDF 595x841 595x841+0+0 16-bit sRGB 107443B 0.000u 0:00.000
identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/1746. identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/1746.
</code></pre></li> </code></pre><ul>
<li>And wow, I can't even run ImageMagick's <code>identify</code> on the first page of the second item (10568/98930):</li>
<li><p>And wow, I can&rsquo;t even run ImageMagick&rsquo;s <code>identify</code> on the first page of the second item (<sup>10568</sup>&frasl;<sub>98930</sub>):</p> </ul>
<pre><code>$ identify Food\ safety\ Kenya\ fruits.pdf\[0\] <pre><code>$ identify Food\ safety\ Kenya\ fruits.pdf\[0\]
zsh: abort (core dumped) identify Food\ safety\ Kenya\ fruits.pdf\[0\] zsh: abort (core dumped) identify Food\ safety\ Kenya\ fruits.pdf\[0\]
</code></pre></li> </code></pre><ul>
<li>But with GraphicsMagick's <code>identify</code> it works:</li>
<li><p>But with GraphicsMagick&rsquo;s <code>identify</code> it works:</p> </ul>
<pre><code>$ gm identify Food\ safety\ Kenya\ fruits.pdf\[0\] <pre><code>$ gm identify Food\ safety\ Kenya\ fruits.pdf\[0\]
DEBUG: FC_WEIGHT didn't match DEBUG: FC_WEIGHT didn't match
Food safety Kenya fruits.pdf PDF 612x792+0+0 DirectClass 8-bit 1.4Mi 0.000u 0m:0.000002s Food safety Kenya fruits.pdf PDF 612x792+0+0 DirectClass 8-bit 1.4Mi 0.000u 0m:0.000002s
</code></pre></li> </code></pre><ul>
<li>Interesting that ImageMagick's <code>identify</code> <em>does</em> work if you do not specify a page, perhaps as <a href="https://bugs.ghostscript.com/show_bug.cgi?id=699815">alluded to in the recent Ghostscript bug report</a>:</li>
<li><p>Interesting that ImageMagick&rsquo;s <code>identify</code> <em>does</em> work if you do not specify a page, perhaps as <a href="https://bugs.ghostscript.com/show_bug.cgi?id=699815">alluded to in the recent Ghostscript bug report</a>:</p> </ul>
<pre><code>$ identify Food\ safety\ Kenya\ fruits.pdf <pre><code>$ identify Food\ safety\ Kenya\ fruits.pdf
Food safety Kenya fruits.pdf[0] PDF 612x792 612x792+0+0 16-bit sRGB 64626B 0.010u 0:00.009 Food safety Kenya fruits.pdf[0] PDF 612x792 612x792+0+0 16-bit sRGB 64626B 0.010u 0:00.009
Food safety Kenya fruits.pdf[1] PDF 612x792 612x792+0+0 16-bit sRGB 64626B 0.010u 0:00.009 Food safety Kenya fruits.pdf[1] PDF 612x792 612x792+0+0 16-bit sRGB 64626B 0.010u 0:00.009
@ -243,311 +217,258 @@ Food safety Kenya fruits.pdf[2] PDF 612x792 612x792+0+0 16-bit sRGB 64626B 0.010
Food safety Kenya fruits.pdf[3] PDF 612x792 612x792+0+0 16-bit sRGB 64626B 0.010u 0:00.009 Food safety Kenya fruits.pdf[3] PDF 612x792 612x792+0+0 16-bit sRGB 64626B 0.010u 0:00.009
Food safety Kenya fruits.pdf[4] PDF 612x792 612x792+0+0 16-bit sRGB 64626B 0.010u 0:00.009 Food safety Kenya fruits.pdf[4] PDF 612x792 612x792+0+0 16-bit sRGB 64626B 0.010u 0:00.009
identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/1746. identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/1746.
</code></pre></li> </code></pre><ul>
<li>As I expected, ImageMagick cannot generate a thumbnail, but GraphicsMagick can (though it looks like crap):</li>
<li><p>As I expected, ImageMagick cannot generate a thumbnail, but GraphicsMagick can (though it looks like crap):</p> </ul>
<pre><code>$ convert Food\ safety\ Kenya\ fruits.pdf\[0\] -thumbnail 600x600 -flatten Food\ safety\ Kenya\ fruits.pdf.jpg <pre><code>$ convert Food\ safety\ Kenya\ fruits.pdf\[0\] -thumbnail 600x600 -flatten Food\ safety\ Kenya\ fruits.pdf.jpg
zsh: abort (core dumped) convert Food\ safety\ Kenya\ fruits.pdf\[0\] -thumbnail 600x600 -flatten zsh: abort (core dumped) convert Food\ safety\ Kenya\ fruits.pdf\[0\] -thumbnail 600x600 -flatten
$ gm convert Food\ safety\ Kenya\ fruits.pdf\[0\] -thumbnail 600x600 -flatten Food\ safety\ Kenya\ fruits.pdf.jpg $ gm convert Food\ safety\ Kenya\ fruits.pdf\[0\] -thumbnail 600x600 -flatten Food\ safety\ Kenya\ fruits.pdf.jpg
DEBUG: FC_WEIGHT didn't match DEBUG: FC_WEIGHT didn't match
</code></pre></li> </code></pre><ul>
<li>I inspected the troublesome PDF using <a href="http://jhove.openpreservation.org/">jhove</a> and noticed that it is using <code>ISO PDF/A-1, Level B</code> and the other one doesn't list a profile, though I don't think this is relevant</li>
<li><p>I inspected the troublesome PDF using <a href="http://jhove.openpreservation.org/">jhove</a> and noticed that it is using <code>ISO PDF/A-1, Level B</code> and the other one doesn&rsquo;t list a profile, though I don&rsquo;t think this is relevant</p></li> <li>I found another item that fails when generating a thumbnail (<a href="https://hdl.handle.net/10568/98391">10568/98391</a>), DSpace complains:</li>
</ul>
<li><p>I found another item that fails when generating a thumbnail (<a href="https://hdl.handle.net/10568/98391"><sup>10568</sup>&frasl;<sub>98391</sub></a>), DSpace complains:</p>
<pre><code>org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: FailedToExecuteCommand `&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=pngalpha&quot; -dTextAlphaBits=4 -dGraphicsAlphaBits=4 &quot;-r72x72&quot; -dFirstPage=1 -dLastPage=1 &quot;-sOutputFile=/tmp/magick-142966vQs5Di64ntH%d&quot; &quot;-f/tmp/magick-14296Q0rJjfCeIj3w&quot; &quot;-f/tmp/magick-14296k_K6MWqwvpDm&quot;' (-1) @ error/delegate.c/ExternalDelegateCommand/461. <pre><code>org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: FailedToExecuteCommand `&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=pngalpha&quot; -dTextAlphaBits=4 -dGraphicsAlphaBits=4 &quot;-r72x72&quot; -dFirstPage=1 -dLastPage=1 &quot;-sOutputFile=/tmp/magick-142966vQs5Di64ntH%d&quot; &quot;-f/tmp/magick-14296Q0rJjfCeIj3w&quot; &quot;-f/tmp/magick-14296k_K6MWqwvpDm&quot;' (-1) @ error/delegate.c/ExternalDelegateCommand/461.
org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: FailedToExecuteCommand `&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=pngalpha&quot; -dTextAlphaBits=4 -dGraphicsAlphaBits=4 &quot;-r72x72&quot; -dFirstPage=1 -dLastPage=1 &quot;-sOutputFile=/tmp/magick-142966vQs5Di64ntH%d&quot; &quot;-f/tmp/magick-14296Q0rJjfCeIj3w&quot; &quot;-f/tmp/magick-14296k_K6MWqwvpDm&quot;' (-1) @ error/delegate.c/ExternalDelegateCommand/461. org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: FailedToExecuteCommand `&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=pngalpha&quot; -dTextAlphaBits=4 -dGraphicsAlphaBits=4 &quot;-r72x72&quot; -dFirstPage=1 -dLastPage=1 &quot;-sOutputFile=/tmp/magick-142966vQs5Di64ntH%d&quot; &quot;-f/tmp/magick-14296Q0rJjfCeIj3w&quot; &quot;-f/tmp/magick-14296k_K6MWqwvpDm&quot;' (-1) @ error/delegate.c/ExternalDelegateCommand/461.
at org.im4java.core.Info.getBaseInfo(Info.java:360) at org.im4java.core.Info.getBaseInfo(Info.java:360)
at org.im4java.core.Info.&lt;init&gt;(Info.java:151) at org.im4java.core.Info.&lt;init&gt;(Info.java:151)
at org.dspace.app.mediafilter.ImageMagickThumbnailFilter.getImageFile(ImageMagickThumbnailFilter.java:142) at org.dspace.app.mediafilter.ImageMagickThumbnailFilter.getImageFile(ImageMagickThumbnailFilter.java:142)
at org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter.getDestinationStream(ImageMagickPdfThumbnailFilter.java:24) at org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter.getDestinationStream(ImageMagickPdfThumbnailFilter.java:24)
at org.dspace.app.mediafilter.FormatFilter.processBitstream(FormatFilter.java:170) at org.dspace.app.mediafilter.FormatFilter.processBitstream(FormatFilter.java:170)
at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:475) at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:475)
at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:429) at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:429)
at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:401) at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:401)
at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:237) at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:237)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at java.lang.reflect.Method.invoke(Method.java:498)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226) at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78) at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
Caused by: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: FailedToExecuteCommand `&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=pngalpha&quot; -dTextAlphaBits=4 -dGraphicsAlphaBits=4 &quot;-r72x72&quot; -dFirstPage=1 -dLastPage=1 &quot;-sOutputFile=/tmp/magick-142966vQs5Di64ntH%d&quot; &quot;-f/tmp/magick-14296Q0rJjfCeIj3w&quot; &quot;-f/tmp/magick-14296k_K6MWqwvpDm&quot;' (-1) @ error/delegate.c/ExternalDelegateCommand/461. Caused by: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: FailedToExecuteCommand `&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=pngalpha&quot; -dTextAlphaBits=4 -dGraphicsAlphaBits=4 &quot;-r72x72&quot; -dFirstPage=1 -dLastPage=1 &quot;-sOutputFile=/tmp/magick-142966vQs5Di64ntH%d&quot; &quot;-f/tmp/magick-14296Q0rJjfCeIj3w&quot; &quot;-f/tmp/magick-14296k_K6MWqwvpDm&quot;' (-1) @ error/delegate.c/ExternalDelegateCommand/461.
at org.im4java.core.ImageCommand.run(ImageCommand.java:219) at org.im4java.core.ImageCommand.run(ImageCommand.java:219)
at org.im4java.core.Info.getBaseInfo(Info.java:342) at org.im4java.core.Info.getBaseInfo(Info.java:342)
... 14 more ... 14 more
Caused by: org.im4java.core.CommandException: identify: FailedToExecuteCommand `&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=pngalpha&quot; -dTextAlphaBits=4 -dGraphicsAlphaBits=4 &quot;-r72x72&quot; -dFirstPage=1 -dLastPage=1 &quot;-sOutputFile=/tmp/magick-142966vQs5Di64ntH%d&quot; &quot;-f/tmp/magick-14296Q0rJjfCeIj3w&quot; &quot;-f/tmp/magick-14296k_K6MWqwvpDm&quot;' (-1) @ error/delegate.c/ExternalDelegateCommand/461. Caused by: org.im4java.core.CommandException: identify: FailedToExecuteCommand `&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=pngalpha&quot; -dTextAlphaBits=4 -dGraphicsAlphaBits=4 &quot;-r72x72&quot; -dFirstPage=1 -dLastPage=1 &quot;-sOutputFile=/tmp/magick-142966vQs5Di64ntH%d&quot; &quot;-f/tmp/magick-14296Q0rJjfCeIj3w&quot; &quot;-f/tmp/magick-14296k_K6MWqwvpDm&quot;' (-1) @ error/delegate.c/ExternalDelegateCommand/461.
at org.im4java.core.ImageCommand.finished(ImageCommand.java:253) at org.im4java.core.ImageCommand.finished(ImageCommand.java:253)
at org.im4java.process.ProcessStarter.run(ProcessStarter.java:314) at org.im4java.process.ProcessStarter.run(ProcessStarter.java:314)
at org.im4java.core.ImageCommand.run(ImageCommand.java:215) at org.im4java.core.ImageCommand.run(ImageCommand.java:215)
... 15 more ... 15 more
</code></pre></li> </code></pre><ul>
<li>And on my Arch Linux environment ImageMagick's <code>convert</code> also segfaults:</li>
<li><p>And on my Arch Linux environment ImageMagick&rsquo;s <code>convert</code> also segfaults:</p> </ul>
<pre><code>$ convert bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf\[0\] -thumbnail x600 -flatten bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf.jpg <pre><code>$ convert bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf\[0\] -thumbnail x600 -flatten bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf.jpg
zsh: abort (core dumped) convert bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf\[0\] x60 zsh: abort (core dumped) convert bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf\[0\] x60
</code></pre></li> </code></pre><ul>
<li>But GraphicsMagick's <code>convert</code> works:</li>
<li><p>But GraphicsMagick&rsquo;s <code>convert</code> works:</p> </ul>
<pre><code>$ gm convert bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf\[0\] -thumbnail x600 -flatten bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf.jpg <pre><code>$ gm convert bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf\[0\] -thumbnail x600 -flatten bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf.jpg
</code></pre></li> </code></pre><ul>
<li>So far the only thing that stands out is that the two files that don't work were created with Microsoft Office 2016:</li>
<li><p>So far the only thing that stands out is that the two files that don&rsquo;t work were created with Microsoft Office 2016:</p> </ul>
<pre><code>$ pdfinfo bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf | grep -E '^(Creator|Producer)' <pre><code>$ pdfinfo bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf | grep -E '^(Creator|Producer)'
Creator: Microsoft® Word 2016 Creator: Microsoft® Word 2016
Producer: Microsoft® Word 2016 Producer: Microsoft® Word 2016
$ pdfinfo Food\ safety\ Kenya\ fruits.pdf | grep -E '^(Creator|Producer)' $ pdfinfo Food\ safety\ Kenya\ fruits.pdf | grep -E '^(Creator|Producer)'
Creator: Microsoft® Word 2016 Creator: Microsoft® Word 2016
Producer: Microsoft® Word 2016 Producer: Microsoft® Word 2016
</code></pre></li> </code></pre><ul>
<li>And the one that works was created with Office 365:</li>
<li><p>And the one that works was created with Office 365:</p> </ul>
<pre><code>$ pdfinfo Info\ Note\ Mainstreaming\ gender\ and\ social\ differentiation\ into\ CCAFS\ research\ activities\ in\ West\ Africa-converted.pdf | grep -E '^(Creator|Producer)' <pre><code>$ pdfinfo Info\ Note\ Mainstreaming\ gender\ and\ social\ differentiation\ into\ CCAFS\ research\ activities\ in\ West\ Africa-converted.pdf | grep -E '^(Creator|Producer)'
Creator: Microsoft® Word for Office 365 Creator: Microsoft® Word for Office 365
Producer: Microsoft® Word for Office 365 Producer: Microsoft® Word for Office 365
</code></pre></li> </code></pre><ul>
<li>I remembered an old technique I was using to generate thumbnails in 2015 using Inkscape followed by ImageMagick or GraphicsMagick:</li>
<li><p>I remembered an old technique I was using to generate thumbnails in 2015 using Inkscape followed by ImageMagick or GraphicsMagick:</p> </ul>
<pre><code>$ inkscape Food\ safety\ Kenya\ fruits.pdf -z --export-dpi=72 --export-area-drawing --export-png='cover.png' <pre><code>$ inkscape Food\ safety\ Kenya\ fruits.pdf -z --export-dpi=72 --export-area-drawing --export-png='cover.png'
$ gm convert -resize x600 -flatten -quality 85 cover.png cover.jpg $ gm convert -resize x600 -flatten -quality 85 cover.png cover.jpg
</code></pre></li> </code></pre><ul>
<li>I've tried a few times this week to register for the <a href="https://www.evisa.gov.et/">Ethiopian eVisa website</a>, but it is never successful</li>
<li><p>I&rsquo;ve tried a few times this week to register for the <a href="https://www.evisa.gov.et/">Ethiopian eVisa website</a>, but it is never successful</p></li> <li>In the end I tried one last time to just apply without registering and it was apparently successful</li>
<li>Testing DSpace 5.8 (<code>5_x-prod</code> branch) in an Ubuntu 18.04 VM with Tomcat 8.5 and had some issues:
<li><p>In the end I tried one last time to just apply without registering and it was apparently successful</p></li>
<li><p>Testing DSpace 5.8 (<code>5_x-prod</code> branch) in an Ubuntu 18.04 VM with Tomcat 8.5 and had some issues:</p>
<ul> <ul>
<li>JSPUI shows an internal error (log shows something about tag cloud, though, so might be unrelated)</li> <li>JSPUI shows an internal error (log shows something about tag cloud, though, so might be unrelated)</li>
<li>Atmire Listings and Reports, which uses JSPUI, asks you to log in again and then doesn&rsquo;t work</li> <li>Atmire Listings and Reports, which uses JSPUI, asks you to log in again and then doesn't work</li>
<li>Content and Usage Analysis doesn&rsquo;t show up in the sidebar after logging in</li> <li>Content and Usage Analysis doesn't show up in the sidebar after logging in</li>
<li>I can navigate to <a href="https://dspacetest.cgiar.org/atmire/reporting-suite/usage-graph-editor">/atmire/reporting-suite/usage-graph-editor</a>, but it&rsquo;s only the Atmire theme and a &ldquo;page not found&rdquo; message</li> <li>I can navigate to <a href="https://dspacetest.cgiar.org/atmire/reporting-suite/usage-graph-editor">/atmire/reporting-suite/usage-graph-editor</a>, but it's only the Atmire theme and a &ldquo;page not found&rdquo; message</li>
<li>Related messages from dspace.log:</li>
<li><p>Related messages from dspace.log:</p> </ul>
</li>
</ul>
<pre><code>2018-12-03 15:44:00,030 WARN org.dspace.core.ConfigurationManager @ Requested configuration module: atmire-datatables not found <pre><code>2018-12-03 15:44:00,030 WARN org.dspace.core.ConfigurationManager @ Requested configuration module: atmire-datatables not found
2018-12-03 15:44:03,390 ERROR com.atmire.app.webui.servlet.ExportServlet @ Error converter plugin not found: interface org.infoCon.ConverterPlugin 2018-12-03 15:44:03,390 ERROR com.atmire.app.webui.servlet.ExportServlet @ Error converter plugin not found: interface org.infoCon.ConverterPlugin
... ...
2018-12-03 15:45:01,667 WARN org.dspace.core.ConfigurationManager @ Requested configuration module: atmire-listing-and-reports not found 2018-12-03 15:45:01,667 WARN org.dspace.core.ConfigurationManager @ Requested configuration module: atmire-listing-and-reports not found
</code></pre></li> </code></pre><ul>
</ul></li> <li>I tested it on my local environment with Tomcat 8.5.34 and the JSPUI application still has an error (again, the logs show something about tag cloud, so it may be unrelated), and the Listings and Reports still asks you to log in again, despite already being logged in in XMLUI, but does appear to work (I generated a report and exported a PDF)</li>
<li>I think the errors about missing Atmire components must be important, here on my local machine as well (though not the one about atmire-listings-and-reports):</li>
<li><p>I tested it on my local environment with Tomcat 8.5.34 and the JSPUI application still has an error (again, the logs show something about tag cloud, so it may be unrelated), and the Listings and Reports still asks you to log in again, despite already being logged in in XMLUI, but does appear to work (I generated a report and exported a PDF)</p></li>
<li><p>I think the errors about missing Atmire components must be important, here on my local machine as well (though not the one about atmire-listings-and-reports):</p>
<pre><code>2018-12-03 16:44:00,009 WARN org.dspace.core.ConfigurationManager @ Requested configuration module: atmire-datatables not found
</code></pre></li>
<li><p>This has got to be part Ubuntu Tomcat packaging, and part DSpace 5.x Tomcat 8.5 readiness&hellip;?</p></li>
</ul> </ul>
<pre><code>2018-12-03 16:44:00,009 WARN org.dspace.core.ConfigurationManager @ Requested configuration module: atmire-datatables not found
<h2 id="2018-12-04">2018-12-04</h2> </code></pre><ul>
<li>This has got to be part Ubuntu Tomcat packaging, and part DSpace 5.x Tomcat 8.5 readiness&hellip;?</li>
</ul>
<h2 id="20181204">2018-12-04</h2>
<ul> <ul>
<li><p>Last night Linode sent a message that the load on CGSpace (linode18) was too high, here&rsquo;s a list of the top users at the time and throughout the day:</p> <li>Last night Linode sent a message that the load on CGSpace (linode18) was too high, here's a list of the top users at the time and throughout the day:</li>
</ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Dec/2018:1(5|6|7|8)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 <pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Dec/2018:1(5|6|7|8)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
225 40.77.167.142 225 40.77.167.142
226 66.249.64.63 226 66.249.64.63
232 46.101.86.248 232 46.101.86.248
285 45.5.186.2 285 45.5.186.2
333 54.70.40.11 333 54.70.40.11
411 193.29.13.85 411 193.29.13.85
476 34.218.226.147 476 34.218.226.147
962 66.249.70.27 962 66.249.70.27
1193 35.237.175.180 1193 35.237.175.180
1450 2a01:4f8:140:3192::2 1450 2a01:4f8:140:3192::2
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Dec/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Dec/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1141 207.46.13.57 1141 207.46.13.57
1299 197.210.168.174 1299 197.210.168.174
1341 54.70.40.11 1341 54.70.40.11
1429 40.77.167.142 1429 40.77.167.142
1528 34.218.226.147 1528 34.218.226.147
1973 66.249.70.27 1973 66.249.70.27
2079 50.116.102.77 2079 50.116.102.77
2494 78.46.79.71 2494 78.46.79.71
3210 2a01:4f8:140:3192::2 3210 2a01:4f8:140:3192::2
4190 35.237.175.180 4190 35.237.175.180
</code></pre></li> </code></pre><ul>
<li><code>35.237.175.180</code> is known to us (CCAFS?), and I've already added it to the list of bot IPs in nginx, which appears to be working:</li>
<li><p><code>35.237.175.180</code> is known to us (CCAFS?), and I&rsquo;ve already added it to the list of bot IPs in nginx, which appears to be working:</p> </ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=35.237.175.180' dspace.log.2018-12-03 <pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=35.237.175.180' dspace.log.2018-12-03
4772 4772
$ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=35.237.175.180' dspace.log.2018-12-03 | sort | uniq | wc -l $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=35.237.175.180' dspace.log.2018-12-03 | sort | uniq | wc -l
630 630
</code></pre></li> </code></pre><ul>
<li>I haven't seen <code>2a01:4f8:140:3192::2</code> before. Its user agent is some new bot:</li>
<li><p>I haven&rsquo;t seen <code>2a01:4f8:140:3192::2</code> before. Its user agent is some new bot:</p> </ul>
<pre><code>Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/) <pre><code>Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)
</code></pre></li> </code></pre><ul>
<li>At least it seems the Tomcat Crawler Session Manager Valve is working to re-use the common bot XMLUI sessions:</li>
<li><p>At least it seems the Tomcat Crawler Session Manager Valve is working to re-use the common bot XMLUI sessions:</p> </ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=2a01:4f8:140:3192::2' dspace.log.2018-12-03 <pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=2a01:4f8:140:3192::2' dspace.log.2018-12-03
5111 5111
$ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=2a01:4f8:140:3192::2' dspace.log.2018-12-03 | sort | uniq | wc -l $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=2a01:4f8:140:3192::2' dspace.log.2018-12-03 | sort | uniq | wc -l
419 419
</code></pre></li> </code></pre><ul>
<li><code>78.46.79.71</code> is another host on Hetzner with the following user agent:</li>
<li><p><code>78.46.79.71</code> is another host on Hetzner with the following user agent:</p> </ul>
<pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0 <pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
</code></pre></li> </code></pre><ul>
<li>This is not the first time a host on Hetzner has used a &ldquo;normal&rdquo; user agent to make thousands of requests</li>
<li><p>This is not the first time a host on Hetzner has used a &ldquo;normal&rdquo; user agent to make thousands of requests</p></li> <li>At least it is re-using its Tomcat sessions somehow:</li>
</ul>
<li><p>At least it is re-using its Tomcat sessions somehow:</p>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.79.71' dspace.log.2018-12-03 <pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.79.71' dspace.log.2018-12-03
2044 2044
$ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.79.71' dspace.log.2018-12-03 | sort | uniq | wc -l $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.79.71' dspace.log.2018-12-03 | sort | uniq | wc -l
1 1
</code></pre></li> </code></pre><ul>
<li>In other news, it's good to see my re-work of the database connectivity in the <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> actually caused a reduction of persistent database connections (from 1 to 0, but still!):</li>
<li><p>In other news, it&rsquo;s good to see my re-work of the database connectivity in the <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> actually caused a reduction of persistent database connections (from 1 to 0, but still!):</p></li>
</ul> </ul>
<p><img src="/cgspace-notes/2018/12/postgres_connections_db-month.png" alt="PostgreSQL connections day"></p>
<p><img src="/cgspace-notes/2018/12/postgres_connections_db-month.png" alt="PostgreSQL connections day" /></p> <h2 id="20181205">2018-12-05</h2>
<h2 id="2018-12-05">2018-12-05</h2>
<ul> <ul>
<li>Discuss RSS issues with IWMI and WLE people</li> <li>Discuss RSS issues with IWMI and WLE people</li>
</ul> </ul>
<h2 id="20181206">2018-12-06</h2>
<h2 id="2018-12-06">2018-12-06</h2>
<ul> <ul>
<li>Linode sent a message that the CPU usage of CGSpace (linode18) was too high last night</li> <li>Linode sent a message that the CPU usage of CGSpace (linode18) was too high last night</li>
<li>I looked in the logs and there's nothing particular going on:</li>
<li><p>I looked in the logs and there&rsquo;s nothing particular going on:</p> </ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;05/Dec/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 <pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;05/Dec/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1225 157.55.39.177 1225 157.55.39.177
1240 207.46.13.12 1240 207.46.13.12
1261 207.46.13.101 1261 207.46.13.101
1411 207.46.13.157 1411 207.46.13.157
1529 34.218.226.147 1529 34.218.226.147
2085 50.116.102.77 2085 50.116.102.77
3334 2a01:7e00::f03c:91ff:fe0a:d645 3334 2a01:7e00::f03c:91ff:fe0a:d645
3733 66.249.70.27 3733 66.249.70.27
3815 35.237.175.180 3815 35.237.175.180
7669 54.70.40.11 7669 54.70.40.11
</code></pre></li> </code></pre><ul>
<li><code>54.70.40.11</code> is some new bot with the following user agent:</li>
<li><p><code>54.70.40.11</code> is some new bot with the following user agent:</p> </ul>
<pre><code>Mozilla/5.0 (compatible) SemanticScholarBot (+https://www.semanticscholar.org/crawler) <pre><code>Mozilla/5.0 (compatible) SemanticScholarBot (+https://www.semanticscholar.org/crawler)
</code></pre></li> </code></pre><ul>
<li>But Tomcat is forcing them to re-use their Tomcat sessions with the Crawler Session Manager valve:</li>
<li><p>But Tomcat is forcing them to re-use their Tomcat sessions with the Crawler Session Manager valve:</p> </ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=54.70.40.11' dspace.log.2018-12-05 <pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=54.70.40.11' dspace.log.2018-12-05
6980 6980
$ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=54.70.40.11' dspace.log.2018-12-05 | sort | uniq | wc -l $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=54.70.40.11' dspace.log.2018-12-05 | sort | uniq | wc -l
1156 1156
</code></pre></li> </code></pre><ul>
<li><code>2a01:7e00::f03c:91ff:fe0a:d645</code> appears to be the CKM dev server where Danny is testing harvesting via Drupal</li>
<li><p><code>2a01:7e00::f03c:91ff:fe0a:d645</code> appears to be the CKM dev server where Danny is testing harvesting via Drupal</p></li> <li>It seems they are hitting the XMLUI's OpenSearch a bit, but mostly on the REST API so no issues here yet</li>
<li><code>Drupal</code> is already in the Tomcat Crawler Session Manager Valve's regex so that's good!</li>
<li><p>It seems they are hitting the XMLUI&rsquo;s OpenSearch a bit, but mostly on the REST API so no issues here yet</p></li>
<li><p><code>Drupal</code> is already in the Tomcat Crawler Session Manager Valve&rsquo;s regex so that&rsquo;s good!</p></li>
</ul> </ul>
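The two `grep` commands in the 2018-12-06 entry compare how many log lines mention the bot's IP with how many distinct Tomcat session IDs those lines contain. A minimal Python sketch of the same comparison follows, assuming log lines carry `session_id=<32 characters>:ip_addr=<ip>` as in the grep pattern. The function name `session_reuse` is hypothetical, and unlike the grep pattern the sketch escapes the dots in the IP and rejects longer addresses that merely start with it.

```python
import re
from typing import Iterable, Tuple


def session_reuse(lines: Iterable[str], ip: str) -> Tuple[int, int]:
    """Return (matching_lines, distinct_session_ids) for one client IP."""
    # A 32-character session ID followed by the IP, as in the grep pattern.
    # re.escape() makes the dots literal; (?!\d) stops 54.70.40.11 from
    # also matching 54.70.40.110.
    pattern = re.compile(
        r"session_id=([A-Z0-9]{32}):ip_addr=" + re.escape(ip) + r"(?!\d)"
    )
    total = 0
    sessions = set()
    for line in lines:
        match = pattern.search(line)
        if match:
            total += 1                    # counts lines, like grep -c
            sessions.add(match.group(1))  # distinct IDs, like sort | uniq
    return total, len(sessions)
```

Called on the lines of a day's `dspace.log` file, a large first number with a much smaller second number would suggest that sessions are being re-used.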
<h2 id="20181210">2018-12-10</h2>
<h2 id="2018-12-10">2018-12-10</h2>
<ul> <ul>
<li>I ran into Mia Signs in Addis and we discussed Altmetric as well as RSS feeds again <li>I ran into Mia Signs in Addis and we discussed Altmetric as well as RSS feeds again
<ul> <ul>
<li>We came up with an <a href="https://cgspace.cgiar.org/open-search/discover?query=crpsubject:Water,+Land+and+Ecosystems&amp;sort_by=3&amp;order=DESC">OpenSearch query for all items tagged with the WLE CRP subject</a> (where the <code>sort_by=3</code> parameter is the accession date, as configured in <code>dspace.cfg</code>)</li> <li>We came up with an <a href="https://cgspace.cgiar.org/open-search/discover?query=crpsubject:Water,+Land+and+Ecosystems&amp;sort_by=3&amp;order=DESC">OpenSearch query for all items tagged with the WLE CRP subject</a> (where the <code>sort_by=3</code> parameter is the accession date, as configured in <code>dspace.cfg</code>)</li>
<li>About Altmetric she was wondering why some low-ranking items of theirs do not have the Handle/DOI relationship, but high-ranking ones do</li> <li>About Altmetric she was wondering why some low-ranking items of theirs do not have the Handle/DOI relationship, but high-ranking ones do</li>
<li>It sounds kinda crazy, but she said when she talked to Altmetric about their Twitter harvesting they said their coverage is not perfect, so it might be some kinda prioritization thing where they only do it for popular items?</li> <li>It sounds kinda crazy, but she said when she talked to Altmetric about their Twitter harvesting they said their coverage is not perfect, so it might be some kinda prioritization thing where they only do it for popular items?</li>
<li>I am testing this by <a href="https://twitter.com/mralanorth/status/1072153586342211584">tweeting</a> one <a href="https://cgspace.cgiar.org/handle/10568/98380">WLE item from CGSpace</a> that currently has no Altmetric score</li> <li>I am testing this by <a href="https://twitter.com/mralanorth/status/1072153586342211584">tweeting</a> one <a href="https://cgspace.cgiar.org/handle/10568/98380">WLE item from CGSpace</a> that currently has no Altmetric score</li>
<li>Interestingly, after about an hour I see it has already been <a href="https://cgspace.altmetric.com/details/50160871/twitter">picked up by Altmetric</a> and has my tweet as well as some other tweet from over a month ago&hellip;</li> <li>Interestingly, after about an hour I see it has already been <a href="https://cgspace.altmetric.com/details/50160871/twitter">picked up by Altmetric</a> and has my tweet as well as some other tweet from over a month ago&hellip;</li>
<li>I <a href="https://twitter.com/mralanorth/status/1072198292182892545">tweeted a link to the item&rsquo;s DOI</a> to see if Altmetric will notice it, hopefully associated with the Handle I tweeted earlier</li> <li>I <a href="https://twitter.com/mralanorth/status/1072198292182892545">tweeted a link to the item's DOI</a> to see if Altmetric will notice it, hopefully associated with the Handle I tweeted earlier</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2018-12-11">2018-12-11</h2> </ul>
<h2 id="20181211">2018-12-11</h2>
<ul> <ul>
<li>I checked the <a href="https://twitter.com/mralanorth/status/1072198292182892545">latest tweet of the IWMI item with a DOI</a> and it was <a href="https://cgspace.altmetric.com/details/50160871/twitter">picked up by Altmetric</a> <li>I checked the <a href="https://twitter.com/mralanorth/status/1072198292182892545">latest tweet of the IWMI item with a DOI</a> and it was <a href="https://cgspace.altmetric.com/details/50160871/twitter">picked up by Altmetric</a>
<ul> <ul>
<li>There is one <a href="twitter.com/ArboNews/statuses/1055036747787223042">curious other tweet</a> from another user where they used the NCBI link, and that is also associated with our Handle on Altmetric</li> <li>There is one <a href="twitter.com/ArboNews/statuses/1055036747787223042">curious other tweet</a> from another user where they used the NCBI link, and that is also associated with our Handle on Altmetric</li>
<li>So that means Altmetric is picking up the DOI from the NCBI metadata and making the association properly</li> <li>So that means Altmetric is picking up the DOI from the NCBI metadata and making the association properly</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2018-12-13">2018-12-13</h2> </ul>
<h2 id="20181213">2018-12-13</h2>
<ul> <ul>
<li>Oh this is very interesting: <a href="https://digitalarchive.worldfishcenter.org">WorldFish&rsquo;s repository is live now</a></li> <li>Oh this is very interesting: <a href="https://digitalarchive.worldfishcenter.org">WorldFish's repository is live now</a></li>
<li>It&rsquo;s running DSpace 5.9-SNAPSHOT on KnowledgeArc, and at least the OAI and REST interfaces are active</li> <li>It's running DSpace 5.9-SNAPSHOT on KnowledgeArc, and at least the OAI and REST interfaces are active</li>
<li>Also, I notice they ended up registering a Handle (they had been considering taking KnowledgeArc&rsquo;s advice to <em>not</em> use Handles!)</li> <li>Also, I notice they ended up registering a Handle (they had been considering taking KnowledgeArc's advice to <em>not</em> use Handles!)</li>
<li>Did some coordination work on the hotel bookings for the January AReS workshop in Amman</li> <li>Did some coordination work on the hotel bookings for the January AReS workshop in Amman</li>
</ul> </ul>
<h2 id="20181217">2018-12-17</h2>
<h2 id="2018-12-17">2018-12-17</h2>
<ul> <ul>
<li>Linode alerted me twice today that the load on CGSpace (linode18) was very high</li> <li>Linode alerted me twice today that the load on CGSpace (linode18) was very high</li>
<li>Looking at the nginx logs I see a few new IPs in the top 10:</li>
<li><p>Looking at the nginx logs I see a few new IPs in the top 10:</p>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;17/Dec/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
927 157.55.39.81
975 54.70.40.11
2090 50.116.102.77
2121 66.249.66.219
3811 35.237.175.180
4590 205.186.128.185
4590 70.32.83.92
5436 2a01:4f8:173:1e85::2
5438 143.233.227.216
6706 94.71.244.172
</code></pre></li>
<li><p><code>94.71.244.172</code> and <code>143.233.227.216</code> are both in Greece and use the following user agent:</p>
<pre><code>Mozilla/3.0 (compatible; Indy Library)
</code></pre></li>
<li><p>I see that I added this bot to the Tomcat Crawler Session Manager valve in 2017-12 so its XMLUI sessions are getting re-used</p></li>
<li><p><code>2a01:4f8:173:1e85::2</code> is some new bot called <code>BLEXBot/1.0</code> which should be matching the existing &ldquo;bot&rdquo; pattern in the Tomcat Crawler Session Manager regex</p></li>
</ul> </ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;17/Dec/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
<h2 id="2018-12-18">2018-12-18</h2> 927 157.55.39.81
975 54.70.40.11
2090 50.116.102.77
2121 66.249.66.219
3811 35.237.175.180
4590 205.186.128.185
4590 70.32.83.92
5436 2a01:4f8:173:1e85::2
5438 143.233.227.216
6706 94.71.244.172
</code></pre><ul>
<li><code>94.71.244.172</code> and <code>143.233.227.216</code> are both in Greece and use the following user agent:</li>
</ul>
<pre><code>Mozilla/3.0 (compatible; Indy Library)
</code></pre><ul>
<li>I see that I added this bot to the Tomcat Crawler Session Manager valve in 2017-12 so its XMLUI sessions are getting re-used</li>
<li><code>2a01:4f8:173:1e85::2</code> is some new bot called <code>BLEXBot/1.0</code> which should be matching the existing &ldquo;bot&rdquo; pattern in the Tomcat Crawler Session Manager regex</li>
</ul>
<h2 id="20181218">2018-12-18</h2>
<ul> <ul>
<li>Open a <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=657">ticket</a> with Atmire to ask them to prepare the Metadata Quality Module for our DSpace 5.8 code</li> <li>Open a <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=657">ticket</a> with Atmire to ask them to prepare the Metadata Quality Module for our DSpace 5.8 code</li>
</ul> </ul>
<h2 id="20181219">2018-12-19</h2>
<h2 id="2018-12-19">2018-12-19</h2>
<ul> <ul>
<li>Update Atmire Listings and Reports to add the journal title (<code>dc.source</code>) to bibliography and update example bibliography values (<a href="https://github.com/ilri/DSpace/pull/405">#405</a>)</li> <li>Update Atmire Listings and Reports to add the journal title (<code>dc.source</code>) to bibliography and update example bibliography values (<a href="https://github.com/ilri/DSpace/pull/405">#405</a>)</li>
</ul> </ul>
<h2 id="20181220">2018-12-20</h2>
<h2 id="2018-12-20">2018-12-20</h2>
<ul> <ul>
<li><p>Testing compression of PostgreSQL backups with xz and gzip:</p> <li>Testing compression of PostgreSQL backups with xz and gzip:</li>
</ul>
<pre><code>$ time xz -c cgspace_2018-12-19.backup &gt; cgspace_2018-12-19.backup.xz <pre><code>$ time xz -c cgspace_2018-12-19.backup &gt; cgspace_2018-12-19.backup.xz
xz -c cgspace_2018-12-19.backup &gt; cgspace_2018-12-19.backup.xz 48.29s user 0.19s system 99% cpu 48.579 total xz -c cgspace_2018-12-19.backup &gt; cgspace_2018-12-19.backup.xz 48.29s user 0.19s system 99% cpu 48.579 total
$ time gzip -c cgspace_2018-12-19.backup &gt; cgspace_2018-12-19.backup.gz $ time gzip -c cgspace_2018-12-19.backup &gt; cgspace_2018-12-19.backup.gz
@ -556,41 +477,32 @@ $ ls -lh cgspace_2018-12-19.backup*
-rw-r--r-- 1 aorth aorth 96M Dec 19 02:15 cgspace_2018-12-19.backup -rw-r--r-- 1 aorth aorth 96M Dec 19 02:15 cgspace_2018-12-19.backup
-rw-r--r-- 1 aorth aorth 94M Dec 20 11:36 cgspace_2018-12-19.backup.gz -rw-r--r-- 1 aorth aorth 94M Dec 20 11:36 cgspace_2018-12-19.backup.gz
-rw-r--r-- 1 aorth aorth 93M Dec 20 11:35 cgspace_2018-12-19.backup.xz -rw-r--r-- 1 aorth aorth 93M Dec 20 11:35 cgspace_2018-12-19.backup.xz
</code></pre></li> </code></pre><ul>
<li>Looks like it's really not worth it&hellip;</li>
<li><p>Looks like it&rsquo;s really not worth it&hellip;</p></li> <li>Peter pointed out that Discovery filters for CTA subjects on item pages were not working</li>
<li>It looks like there were some mismatches in the Discovery index names and the XMLUI configuration, so I fixed them (<a href="https://github.com/ilri/DSpace/pull/406">#406</a>)</li>
<li><p>Peter pointed out that Discovery filters for CTA subjects on item pages were not working</p></li> <li>Peter asked if we could create a controlled vocabulary for publishers (<code>dc.publisher</code>)</li>
<li>I see we have about 3500 distinct publishers:</li>
<li><p>It looks like there were some mismatches in the Discovery index names and the XMLUI configuration, so I fixed them (<a href="https://github.com/ilri/DSpace/pull/406">#406</a>)</p></li> </ul>
<li><p>Peter asked if we could create a controlled vocabulary for publishers (<code>dc.publisher</code>)</p></li>
<li><p>I see we have about 3500 distinct publishers:</p>
<pre><code># SELECT COUNT(DISTINCT(text_value)) FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=39; <pre><code># SELECT COUNT(DISTINCT(text_value)) FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=39;
count count
------- -------
3522 3522
(1 row) (1 row)
</code></pre></li> </code></pre><ul>
<li>I reverted the metadata changes related to &ldquo;Unrestricted Access&rdquo; and &ldquo;Restricted Access&rdquo; on DSpace Test because we're not pushing forward with the new status terms for now</li>
<li><p>I reverted the metadata changes related to &ldquo;Unrestricted Access&rdquo; and &ldquo;Restricted Access&rdquo; on DSpace Test because we&rsquo;re not pushing forward with the new status terms for now</p></li> <li>Purge remaining Oracle Java 8 stuff from CGSpace (linode18) since we migrated to OpenJDK a few months ago:</li>
</ul>
<li><p>Purge remaining Oracle Java 8 stuff from CGSpace (linode18) since we migrated to OpenJDK a few months ago:</p>
<pre><code># dpkg -P oracle-java8-installer oracle-java8-set-default <pre><code># dpkg -P oracle-java8-installer oracle-java8-set-default
</code></pre></li> </code></pre><ul>
<li>Update usage rights on CGSpace as we agreed with Maria Garruccio and Peter last month:</li>
<li><p>Update usage rights on CGSpace as we agreed with Maria Garruccio and Peter last month:</p> </ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/2018-11-27-update-rights.csv -f dc.rights -t correct -m 53 -db dspace -u dspace -p 'fuu' -d <pre><code>$ ./fix-metadata-values.py -i /tmp/2018-11-27-update-rights.csv -f dc.rights -t correct -m 53 -db dspace -u dspace -p 'fuu' -d
Connected to database. Connected to database.
Fixed 466 occurences of: Copyrighted; Any re-use allowed Fixed 466 occurences of: Copyrighted; Any re-use allowed
</code></pre></li> </code></pre><ul>
<li>Upgrade PostgreSQL on CGSpace (linode18) from 9.5 to 9.6:</li>
<li><p>Upgrade PostgreSQL on CGSpace (linode18) from 9.5 to 9.6:</p> </ul>
<pre><code># apt install postgresql-9.6 postgresql-client-9.6 postgresql-contrib-9.6 postgresql-server-dev-9.6 <pre><code># apt install postgresql-9.6 postgresql-client-9.6 postgresql-contrib-9.6 postgresql-server-dev-9.6
# pg_ctlcluster 9.5 main stop # pg_ctlcluster 9.5 main stop
# tar -cvzpf var-lib-postgresql-9.5.tar.gz /var/lib/postgresql/9.5 # tar -cvzpf var-lib-postgresql-9.5.tar.gz /var/lib/postgresql/9.5
@ -600,72 +512,60 @@ Fixed 466 occurences of: Copyrighted; Any re-use allowed
# pg_upgradecluster 9.5 main # pg_upgradecluster 9.5 main
# pg_dropcluster 9.5 main # pg_dropcluster 9.5 main
# dpkg -l | grep postgresql | grep 9.5 | awk '{print $2}' | xargs dpkg -r # dpkg -l | grep postgresql | grep 9.5 | awk '{print $2}' | xargs dpkg -r
</code></pre></li> </code></pre><ul>
<li>I've been running PostgreSQL 9.6 for months on my local development and public DSpace Test (linode19) environments</li>
<li><p>I&rsquo;ve been running PostgreSQL 9.6 for months on my local development and public DSpace Test (linode19) environments</p></li> <li>Run all system updates on CGSpace (linode18) and restart the server</li>
<li>Try to run the DSpace cleanup script on CGSpace (linode18), but I get some errors about foreign key constraints:</li>
<li><p>Run all system updates on CGSpace (linode18) and restart the server</p></li> </ul>
<li><p>Try to run the DSpace cleanup script on CGSpace (linode18), but I get some errors about foreign key constraints:</p>
<pre><code>$ dspace cleanup -v <pre><code>$ dspace cleanup -v
- Deleting bitstream information (ID: 158227) - Deleting bitstream information (ID: 158227)
- Deleting bitstream record from database (ID: 158227) - Deleting bitstream record from database (ID: 158227)
Error: ERROR: update or delete on table &quot;bitstream&quot; violates foreign key constraint &quot;bundle_primary_bitstream_id_fkey&quot; on table &quot;bundle&quot; Error: ERROR: update or delete on table &quot;bitstream&quot; violates foreign key constraint &quot;bundle_primary_bitstream_id_fkey&quot; on table &quot;bundle&quot;
Detail: Key (bitstream_id)=(158227) is still referenced from table &quot;bundle&quot;. Detail: Key (bitstream_id)=(158227) is still referenced from table &quot;bundle&quot;.
... ...
</code></pre></li> </code></pre><ul>
<li>As always, the solution is to delete those IDs manually in PostgreSQL:</li>
<li><p>As always, the solution is to delete those IDs manually in PostgreSQL:</p> </ul>
<pre><code>$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (158227, 158251);' <pre><code>$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (158227, 158251);'
UPDATE 1 UPDATE 1
</code></pre></li> </code></pre><ul>
<li>After all that I started a full Discovery reindex to get the index name changes and rights updates</li>
<li><p>After all that I started a full Discovery reindex to get the index name changes and rights updates</p></li>
</ul> </ul>
<h2 id="20181229">2018-12-29</h2>
<h2 id="2018-12-29">2018-12-29</h2>
<ul> <ul>
<li>CGSpace went down today for a few minutes while I was at dinner and I quickly restarted Tomcat</li> <li>CGSpace went down today for a few minutes while I was at dinner and I quickly restarted Tomcat</li>
<li>The top IP addresses as of this evening are:</li>
<li><p>The top IP addresses as of this evening are:</p>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;29/Dec/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
963 40.77.167.152
987 35.237.175.180
1062 40.77.167.55
1464 66.249.66.223
1660 34.218.226.147
1801 70.32.83.92
2005 50.116.102.77
3218 66.249.66.219
4608 205.186.128.185
5585 54.70.40.11
</code></pre></li>
<li><p>And just around the time of the alert:</p>
<pre><code># zcat --force /var/log/nginx/*.log.1 /var/log/nginx/*.log.2.gz | grep -E &quot;29/Dec/2018:1(6|7|8)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
115 66.249.66.223
118 207.46.13.14
123 34.218.226.147
133 95.108.181.88
137 35.237.175.180
164 66.249.66.219
260 157.55.39.59
291 40.77.167.55
312 207.46.13.129
1253 54.70.40.11
</code></pre></li>
<li><p>All these look ok (<code>54.70.40.11</code> is known to us from earlier this month and should be reusing its Tomcat sessions)</p></li>
<li><p>So I&rsquo;m not sure what was going on last night&hellip;</p></li>
</ul> </ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;29/Dec/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
<!-- vim: set sw=2 ts=2: --> 963 40.77.167.152
987 35.237.175.180
1062 40.77.167.55
1464 66.249.66.223
1660 34.218.226.147
1801 70.32.83.92
2005 50.116.102.77
3218 66.249.66.219
4608 205.186.128.185
5585 54.70.40.11
</code></pre><ul>
<li>And just around the time of the alert:</li>
</ul>
<pre><code># zcat --force /var/log/nginx/*.log.1 /var/log/nginx/*.log.2.gz | grep -E &quot;29/Dec/2018:1(6|7|8)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
115 66.249.66.223
118 207.46.13.14
123 34.218.226.147
133 95.108.181.88
137 35.237.175.180
164 66.249.66.219
260 157.55.39.59
291 40.77.167.55
312 207.46.13.129
1253 54.70.40.11
</code></pre><ul>
<li>All these look ok (<code>54.70.40.11</code> is known to us from earlier this month and should be reusing its Tomcat sessions)</li>
<li>So I'm not sure what was going on last night&hellip;</li>
</ul>
<!-- raw HTML omitted -->
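The `zcat | grep | awk | sort | uniq -c | sort -n | tail` pipelines in these notes count requests per client address for a given date or hour range. A rough Python equivalent is sketched below, assuming each nginx log line starts with the client address followed by whitespace. The function name `top_clients` and its parameters are hypothetical, and reading or decompressing the log files is left to the caller.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def top_clients(
    lines: Iterable[str], date_pattern: str, n: int = 10
) -> List[Tuple[str, int]]:
    """Count requests per client for lines matching date_pattern.

    Returns up to n (address, count) pairs, busiest first. The shell
    pipeline prints the same data in ascending order instead.
    """
    wanted = re.compile(date_pattern)  # plays the role of grep -E
    counts: Counter = Counter()
    for line in lines:
        if not wanted.search(line):
            continue
        fields = line.split()
        if fields:
            counts[fields[0]] += 1     # first field, like awk '{print $1}'
    return counts.most_common(n)
```

Passing a pattern such as `29/Dec/2018:1(6|7|8)` restricts the count to the hours around an alert, as in the second command of the 2018-12-29 entry.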

File diff suppressed because it is too large Load Diff



@ -8,11 +8,9 @@
<meta property="og:title" content="May, 2019" /> <meta property="og:title" content="May, 2019" />
<meta property="og:description" content="2019-05-01 <meta property="og:description" content="2019-05-01
Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace
A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items
Apparently if the item is in the workflowitem table it is submitted to a workflow Apparently if the item is in the workflowitem table it is submitted to a workflow
And if it is in the workspaceitem table it is in the pre-submitted state And if it is in the workspaceitem table it is in the pre-submitted state
@ -22,7 +20,6 @@ The item seems to be in a pre-submitted state, so I tried to delete it from ther
dspace=# DELETE FROM workspaceitem WHERE item_id=74648; dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
DELETE 1 DELETE 1
But after this I tried to delete the item from the XMLUI and it is still present&hellip; But after this I tried to delete the item from the XMLUI and it is still present&hellip;
" /> " />
<meta property="og:type" content="article" /> <meta property="og:type" content="article" />
@ -34,11 +31,9 @@ But after this I tried to delete the item from the XMLUI and it is still present
<meta name="twitter:title" content="May, 2019"/> <meta name="twitter:title" content="May, 2019"/>
<meta name="twitter:description" content="2019-05-01 <meta name="twitter:description" content="2019-05-01
Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace
A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items
Apparently if the item is in the workflowitem table it is submitted to a workflow Apparently if the item is in the workflowitem table it is submitted to a workflow
And if it is in the workspaceitem table it is in the pre-submitted state And if it is in the workspaceitem table it is in the pre-submitted state
@ -48,10 +43,9 @@ The item seems to be in a pre-submitted state, so I tried to delete it from ther
dspace=# DELETE FROM workspaceitem WHERE item_id=74648; dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
DELETE 1 DELETE 1
But after this I tried to delete the item from the XMLUI and it is still present&hellip; But after this I tried to delete the item from the XMLUI and it is still present&hellip;
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -61,7 +55,7 @@ But after this I tried to delete the item from the XMLUI and it is still present
"@type": "BlogPosting", "@type": "BlogPosting",
"headline": "May, 2019", "headline": "May, 2019",
"url": "https:\/\/alanorth.github.io\/cgspace-notes\/2019-05\/", "url": "https:\/\/alanorth.github.io\/cgspace-notes\/2019-05\/",
"wordCount": "3215", "wordCount": "3190",
"datePublished": "2019-05-01T07:37:43+03:00", "datePublished": "2019-05-01T07:37:43+03:00",
"dateModified": "2019-10-28T13:39:25+02:00", "dateModified": "2019-10-28T13:39:25+02:00",
"author": { "author": {
@ -132,183 +126,154 @@ But after this I tried to delete the item from the XMLUI and it is still present
</p> </p>
</header> </header>
<h2 id="2019-05-01">2019-05-01</h2> <h2 id="20190501">2019-05-01</h2>
<ul> <ul>
<li>Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace</li> <li>Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace</li>
<li>A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items <li>A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items
<ul> <ul>
<li>Apparently if the item is in the <code>workflowitem</code> table it is submitted to a workflow</li> <li>Apparently if the item is in the <code>workflowitem</code> table it is submitted to a workflow</li>
<li>And if it is in the <code>workspaceitem</code> table it is in the pre-submitted state</li> <li>And if it is in the <code>workspaceitem</code> table it is in the pre-submitted state</li>
</ul></li> </ul>
</li>
<li><p>The item seems to be in a pre-submitted state, so I tried to delete it from there:</p> <li>The item seems to be in a pre-submitted state, so I tried to delete it from there:</li>
</ul>
<pre><code>dspace=# DELETE FROM workspaceitem WHERE item_id=74648; <pre><code>dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
DELETE 1 DELETE 1
</code></pre></li> </code></pre><ul>
<li>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present&hellip;</li>
<li><p>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present&hellip;</p></li>
</ul> </ul>
<ul> <ul>
<li><p>I managed to delete the problematic item from the database</p> <li>I managed to delete the problematic item from the database
<ul> <ul>
<li>First I deleted the item&rsquo;s bitstream in XMLUI and then ran <code>dspace cleanup -v</code> to remove it from the assetstore</li> <li>First I deleted the item's bitstream in XMLUI and then ran <code>dspace cleanup -v</code> to remove it from the assetstore</li>
<li>Then I ran the following SQL:</li>
<li><p>Then I ran the following SQL:</p> </ul>
</li>
</ul>
<pre><code>dspace=# DELETE FROM metadatavalue WHERE resource_id=74648; <pre><code>dspace=# DELETE FROM metadatavalue WHERE resource_id=74648;
dspace=# DELETE FROM workspaceitem WHERE item_id=74648; dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
dspace=# DELETE FROM item WHERE item_id=74648; dspace=# DELETE FROM item WHERE item_id=74648;
</code></pre></li> </code></pre><ul>
</ul></li> <li>Now the item is (hopefully) really gone and I can continue to troubleshoot the issue with the REST API's <code>/items/find-by-metadata-field</code> endpoint
<li><p>Now the item is (hopefully) really gone and I can continue to troubleshoot the issue with the REST API&rsquo;s <code>/items/find-by-metadata-field</code> endpoint</p>
<ul> <ul>
<li><p>Of course I run into another HTTP 401 error when I continue trying the LandPortal search from last month:</p> <li>Of course I run into another HTTP 401 error when I continue trying the LandPortal search from last month:</li>
</ul>
</li>
</ul>
<pre><code>$ curl -f -H &quot;Content-Type: application/json&quot; -X POST &quot;http://localhost:8080/rest/items/find-by-metadata-field&quot; -d '{&quot;key&quot;:&quot;cg.subject.cpwf&quot;, &quot;value&quot;:&quot;WATER MANAGEMENT&quot;,&quot;language&quot;: &quot;en_US&quot;}' <pre><code>$ curl -f -H &quot;Content-Type: application/json&quot; -X POST &quot;http://localhost:8080/rest/items/find-by-metadata-field&quot; -d '{&quot;key&quot;:&quot;cg.subject.cpwf&quot;, &quot;value&quot;:&quot;WATER MANAGEMENT&quot;,&quot;language&quot;: &quot;en_US&quot;}'
curl: (22) The requested URL returned error: 401 Unauthorized curl: (22) The requested URL returned error: 401 Unauthorized
</code></pre></li> </code></pre><ul>
</ul></li> <li>The DSpace log shows the item ID (because I modified the error text):</li>
</ul>
<li><p>The DSpace log shows the item ID (because I modified the error text):</p>
<pre><code>2019-05-01 11:41:11,069 ERROR org.dspace.rest.ItemsResource @ User(anonymous) has not permission to read item(id=77708)! <pre><code>2019-05-01 11:41:11,069 ERROR org.dspace.rest.ItemsResource @ User(anonymous) has not permission to read item(id=77708)!
</code></pre></li> </code></pre><ul>
<li>If I delete that one I get another, making the list of item IDs so far:
<li><p>If I delete that one I get another, making the list of item IDs so far:</p>
<ul> <ul>
<li>74648</li> <li>74648</li>
<li>77708</li> <li>77708</li>
<li>85079</li> <li>85079</li>
</ul></li>
<li><p>Some are in the <code>workspaceitem</code> table (pre-submission), others are in the <code>workflowitem</code> table (submitted), and others are actually approved, but withdrawn&hellip;</p>
<ul>
<li>This is actually a worthless exercise because the real issue is that the <code>/items/find-by-metadata-field</code> endpoint is simply flawed by design and shouldn&rsquo;t fail fatally when the search returns items the user doesn&rsquo;t have permission to access</li>
<li>It would take way too much time to try to fix the fucked up items that are in limbo by deleting them in SQL, but also, it doesn&rsquo;t actually fix the problem because some items are <em>submitted</em> but <em>withdrawn</em>, so they actually have handles and everything</li>
<li>I think the solution is to recommend people don&rsquo;t use the <code>/items/find-by-metadata-field</code> endpoint</li>
</ul></li>
<li><p>CIP is asking about embedding PDF thumbnail images in their RSS feeds again</p>
<ul>
<li>They asked in 2018-09 as well and I told them it wasn&rsquo;t possible</li>
<li>To make sure, I looked at <a href="https://wiki.duraspace.org/display/DSPACE/Enable+Media+RSS+Feeds">the documentation for RSS media feeds</a> and tried it, but couldn&rsquo;t get it to work</li>
<li>It seems to be geared towards iTunes and Podcasts&hellip; I dunno</li>
</ul></li>
<li><p>CIP also asked for a way to get an XML file of all their RTB journal articles on CGSpace</p>
<ul>
<li><p>I told them to use the REST API like (where <code>1179</code> is the id of the RTB journal articles collection):</p>
<pre><code>https://cgspace.cgiar.org/rest/collections/1179/items?limit=812&amp;expand=metadata
</code></pre></li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-05-03">2019-05-03</h2> <li>Some are in the <code>workspaceitem</code> table (pre-submission), others are in the <code>workflowitem</code> table (submitted), and others are actually approved, but withdrawn&hellip;
<ul> <ul>
<li><p>A user from CIAT emailed to say that CGSpace submission emails have not been working the last few weeks</p> <li>This is actually a worthless exercise because the real issue is that the <code>/items/find-by-metadata-field</code> endpoint is simply flawed by design and shouldn't fail fatally when the search returns items the user doesn't have permission to access</li>
<li>It would take way too much time to try to fix the fucked up items that are in limbo by deleting them in SQL, but also, it doesn't actually fix the problem because some items are <em>submitted</em> but <em>withdrawn</em>, so they actually have handles and everything</li>
<li>I think the solution is to recommend people don't use the <code>/items/find-by-metadata-field</code> endpoint</li>
</ul>
</li>
<li>CIP is asking about embedding PDF thumbnail images in their RSS feeds again
<ul> <ul>
<li><p>I checked the <code>dspace test-email</code> script on CGSpace and they are indeed failing:</p> <li>They asked in 2018-09 as well and I told them it wasn't possible</li>
<li>To make sure, I looked at <a href="https://wiki.duraspace.org/display/DSPACE/Enable+Media+RSS+Feeds">the documentation for RSS media feeds</a> and tried it, but couldn't get it to work</li>
<li>It seems to be geared towards iTunes and Podcasts&hellip; I dunno</li>
</ul>
</li>
<li>CIP also asked for a way to get an XML file of all their RTB journal articles on CGSpace
<ul>
<li>I told them to use the REST API like (where <code>1179</code> is the id of the RTB journal articles collection):</li>
</ul>
</li>
</ul>
<pre><code>https://cgspace.cgiar.org/rest/collections/1179/items?limit=812&amp;expand=metadata
</code></pre><h2 id="20190503">2019-05-03</h2>
<ul>
<li>A user from CIAT emailed to say that CGSpace submission emails have not been working the last few weeks
<ul>
<li>I checked the <code>dspace test-email</code> script on CGSpace and they are indeed failing:</li>
</ul>
</li>
</ul>
<pre><code>$ dspace test-email <pre><code>$ dspace test-email
About to send test email: About to send test email:
- To: woohoo@cgiar.org - To: woohoo@cgiar.org
- Subject: DSpace test email - Subject: DSpace test email
- Server: smtp.office365.com - Server: smtp.office365.com
Error sending email: Error sending email:
- Error: javax.mail.AuthenticationFailedException - Error: javax.mail.AuthenticationFailedException
Please see the DSpace documentation for assistance. Please see the DSpace documentation for assistance.
</code></pre></li> </code></pre><ul>
</ul></li> <li>I will ask ILRI ICT to reset the password
<li><p>I will ask ILRI ICT to reset the password</p>
<ul> <ul>
<li>They reset the password and I tested it on CGSpace</li> <li>They reset the password and I tested it on CGSpace</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-05-05">2019-05-05</h2> </ul>
<h2 id="20190505">2019-05-05</h2>
<ul> <ul>
<li>Run all system updates on DSpace Test (linode19) and reboot it</li> <li>Run all system updates on DSpace Test (linode19) and reboot it</li>
<li>Merge changes into the <code>5_x-prod</code> branch of CGSpace: <li>Merge changes into the <code>5_x-prod</code> branch of CGSpace:
<ul> <ul>
<li>Updates to remove deprecated social media websites (Google+ and Delicious), update Twitter share intent, and add item title to Twitter and email links (<a href="https://github.com/ilri/DSpace/pull/421">#421</a>)</li> <li>Updates to remove deprecated social media websites (Google+ and Delicious), update Twitter share intent, and add item title to Twitter and email links (<a href="https://github.com/ilri/DSpace/pull/421">#421</a>)</li>
<li>Add new CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/pull/420">#420</a>)</li> <li>Add new CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/pull/420">#420</a>)</li>
<li>Add item ID to REST API error logging (<a href="https://github.com/ilri/DSpace/pull/422">#422</a>)</li> <li>Add item ID to REST API error logging (<a href="https://github.com/ilri/DSpace/pull/422">#422</a>)</li>
</ul></li> </ul>
</li>
<li>Re-deploy CGSpace from <code>5_x-prod</code> branch</li> <li>Re-deploy CGSpace from <code>5_x-prod</code> branch</li>
<li>Run all system updates on CGSpace (linode18) and reboot it</li> <li>Run all system updates on CGSpace (linode18) and reboot it</li>
<li>Tag version 1.1.0 of the <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> (with Falcon 2.0.0) <li>Tag version 1.1.0 of the <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> (with Falcon 2.0.0)
<ul> <ul>
<li>Deploy on DSpace Test</li> <li>Deploy on DSpace Test</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-05-06">2019-05-06</h2> </ul>
<h2 id="20190506">2019-05-06</h2>
<ul> <ul>
<li><p>Peter pointed out that Solr stats are only showing 2019 stats</p> <li>Peter pointed out that Solr stats are only showing 2019 stats
<ul> <ul>
<li><p>I looked at the Solr Admin UI and I see:</p> <li>I looked at the Solr Admin UI and I see:</li>
</ul>
</li>
</ul>
<pre><code>statistics-2018: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Error opening new searcher <pre><code>statistics-2018: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Error opening new searcher
</code></pre></li> </code></pre><ul>
</ul></li> <li>As well as this error in the logs:</li>
</ul>
<li><p>As well as this error in the logs:</p>
<pre><code>Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/home/cgspace.cgiar.org/solr/statistics-2018/data/index/write.lock <pre><code>Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/home/cgspace.cgiar.org/solr/statistics-2018/data/index/write.lock
</code></pre></li> </code></pre><ul>
<li>Strangely enough, I <em>do</em> see the statistics-2018, statistics-2017, etc cores in the Admin UI&hellip;</li>
<li><p>Strangely enough, I <em>do</em> see the statistics-2018, statistics-2017, etc cores in the Admin UI&hellip;</p></li> <li>I restarted Tomcat a few times (and even deleted all the Solr write locks) and at least five times there were issues loading one statistics core, causing the Atmire stats to be incomplete
<li><p>I restarted Tomcat a few times (and even deleted all the Solr write locks) and at least five times there were issues loading one statistics core, causing the Atmire stats to be incomplete</p>
<ul> <ul>
<li>Also, I tried to increase the <code>writeLockTimeout</code> in <code>solrconfig.xml</code> from the default of 1000ms to 10000ms</li> <li>Also, I tried to increase the <code>writeLockTimeout</code> in <code>solrconfig.xml</code> from the default of 1000ms to 10000ms</li>
<li>Eventually the Atmire stats started working, despite errors about &ldquo;Error opening new searcher&rdquo; in the Solr Admin UI</li> <li>Eventually the Atmire stats started working, despite errors about &ldquo;Error opening new searcher&rdquo; in the Solr Admin UI</li>
<li>I wrote to the dspace-tech mailing list again on the thread from March, 2019</li> <li>I wrote to the dspace-tech mailing list again on the thread from March, 2019</li>
</ul></li> </ul>
</li>
<li><p>There were a few alerts from UptimeRobot about CGSpace going up and down this morning, along with an alert from Linode about 596% load</p> <li>There were a few alerts from UptimeRobot about CGSpace going up and down this morning, along with an alert from Linode about 596% load
<ul> <ul>
<li>Looking at the Munin stats I see an exponential rise in DSpace XMLUI sessions, firewall activity, and PostgreSQL connections this morning:</li> <li>Looking at the Munin stats I see an exponential rise in DSpace XMLUI sessions, firewall activity, and PostgreSQL connections this morning:</li>
</ul></li>
</ul> </ul>
</li>
<p><img src="/cgspace-notes/2019/05/2019-05-06-jmx_dspace_sessions-day.png" alt="CGSpace XMLUI sessions day" /></p> </ul>
<p><img src="/cgspace-notes/2019/05/2019-05-06-jmx_dspace_sessions-day.png" alt="CGSpace XMLUI sessions day"></p>
<p><img src="/cgspace-notes/2019/05/2019-05-06-fw_conntrack-day.png" alt="linode18 firewall connections day" /></p> <p><img src="/cgspace-notes/2019/05/2019-05-06-fw_conntrack-day.png" alt="linode18 firewall connections day"></p>
<p><img src="/cgspace-notes/2019/05/2019-05-06-postgres_connections_db-day.png" alt="linode18 postgres connections day"></p>
<p><img src="/cgspace-notes/2019/05/2019-05-06-postgres_connections_db-day.png" alt="linode18 postgres connections day" /></p> <p><img src="/cgspace-notes/2019/05/2019-05-06-cpu-day.png" alt="linode18 CPU day"></p>
<p><img src="/cgspace-notes/2019/05/2019-05-06-cpu-day.png" alt="linode18 CPU day" /></p>
<ul> <ul>
<li><p>The number of unique sessions today is <em>ridiculously</em> high compared to the last few days considering it&rsquo;s only 12:30PM right now:</p> <li>The number of unique sessions today is <em>ridiculously</em> high compared to the last few days considering it's only 12:30PM right now:</li>
</ul>
<pre><code>$ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-06 | sort | uniq | wc -l <pre><code>$ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-06 | sort | uniq | wc -l
101108 101108
$ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-05 | sort | uniq | wc -l $ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-05 | sort | uniq | wc -l
@ -321,10 +286,9 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-02 | sort | uniq | wc
7758 7758
$ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-01 | sort | uniq | wc -l $ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-01 | sort | uniq | wc -l
20528 20528
</code></pre></li> </code></pre><ul>
<li>The number of unique IP addresses from 2 to 6 AM this morning is already several times higher than the average for that time of the morning this past week:</li>
<li><p>The number of unique IP addresses from 2 to 6 AM this morning is already several times higher than the average for that time of the morning this past week:</p> </ul>
<pre><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l <pre><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l
7127 7127
# zcat --force /var/log/nginx/access.log.1 /var/log/nginx/access.log.2.gz | grep -E '05/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l # zcat --force /var/log/nginx/access.log.1 /var/log/nginx/access.log.2.gz | grep -E '05/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l
@ -337,10 +301,9 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-01 | sort | uniq | wc
1573 1573
# zcat --force /var/log/nginx/access.log.5.gz /var/log/nginx/access.log.6.gz | grep -E '01/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l # zcat --force /var/log/nginx/access.log.5.gz /var/log/nginx/access.log.6.gz | grep -E '01/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l
1410 1410
</code></pre></li> </code></pre><ul>
<li>Just this morning between the hours of 2 and 6 the number of unique sessions was <em>very</em> high compared to previous mornings:</li>
<li><p>Just this morning between the hours of 2 and 6 the number of unique sessions was <em>very</em> high compared to previous mornings:</p> </ul>
<pre><code>$ cat dspace.log.2019-05-06 | grep -E '2019-05-06 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l <pre><code>$ cat dspace.log.2019-05-06 | grep -E '2019-05-06 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
83650 83650
$ cat dspace.log.2019-05-05 | grep -E '2019-05-05 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l $ cat dspace.log.2019-05-05 | grep -E '2019-05-05 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
@ -353,53 +316,45 @@ $ cat dspace.log.2019-05-02 | grep -E '2019-05-02 (02|03|04|05|06):' | grep -o -
2704 2704
$ cat dspace.log.2019-05-01 | grep -E '2019-05-01 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l $ cat dspace.log.2019-05-01 | grep -E '2019-05-01 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
3699 3699
</code></pre></li> </code></pre><ul>
<li>Most of the requests were GETs:</li>
<li><p>Most of the requests were GETs:</p> </ul>
<pre><code># cat /var/log/nginx/{access,library-access}.log /var/log/nginx/{access,library-access}.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | grep -o -E &quot;(GET|HEAD|POST|PUT)&quot; | sort | uniq -c | sort -n <pre><code># cat /var/log/nginx/{access,library-access}.log /var/log/nginx/{access,library-access}.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | grep -o -E &quot;(GET|HEAD|POST|PUT)&quot; | sort | uniq -c | sort -n
1 PUT 1 PUT
98 POST 98 POST
2845 HEAD 2845 HEAD
98121 GET 98121 GET
</code></pre></li> </code></pre><ul>
<li>I'm not exactly sure what happened this morning, but it looks like some legitimate user traffic—perhaps someone launched a new publication and it got a bunch of hits?</li>
<li><p>I&rsquo;m not exactly sure what happened this morning, but it looks like some legitimate user traffic—perhaps someone launched a new publication and it got a bunch of hits?</p></li> <li>Looking again, I see 84,000 requests to <code>/handle</code> this morning (not including logs for library.cgiar.org because those get HTTP 301 redirect to CGSpace and appear here in <code>access.log</code>):</li>
</ul>
<li><p>Looking again, I see 84,000 requests to <code>/handle</code> this morning (not including logs for library.cgiar.org because those get HTTP 301 redirect to CGSpace and appear here in <code>access.log</code>):</p>
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | grep -c -o -E &quot; /handle/[0-9]+/[0-9]+&quot; <pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | grep -c -o -E &quot; /handle/[0-9]+/[0-9]+&quot;
84350 84350
</code></pre></li> </code></pre><ul>
<li>But it would be difficult to find a pattern for those requests because they cover 78,000 <em>unique</em> Handles (ie direct browsing of items, collections, or communities) and only 2,492 discover/browse (total, not unique):</li>
<li><p>But it would be difficult to find a pattern for those requests because they cover 78,000 <em>unique</em> Handles (ie direct browsing of items, collections, or communities) and only 2,492 discover/browse (total, not unique):</p> </ul>
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | grep -o -E &quot; /handle/[0-9]+/[0-9]+ HTTP&quot; | sort | uniq | wc -l <pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | grep -o -E &quot; /handle/[0-9]+/[0-9]+ HTTP&quot; | sort | uniq | wc -l
78104 78104
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | grep -o -E &quot; /handle/[0-9]+/[0-9]+/(discover|browse)&quot; | wc -l # cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | grep -o -E &quot; /handle/[0-9]+/[0-9]+/(discover|browse)&quot; | wc -l
2492 2492
</code></pre></li> </code></pre><ul>
<li>In other news, I see some IP is making several requests per second to the exact same REST API endpoints, for example:</li>
<li><p>In other news, I see some IP is making several requests per second to the exact same REST API endpoints, for example:</p> </ul>
<pre><code># grep /rest/handle/10568/3703?expand=all rest.log | awk '{print $1}' | sort | uniq -c <pre><code># grep /rest/handle/10568/3703?expand=all rest.log | awk '{print $1}' | sort | uniq -c
3 2a01:7e00::f03c:91ff:fe0a:d645 3 2a01:7e00::f03c:91ff:fe0a:d645
113 63.32.242.35 113 63.32.242.35
</code></pre></li> </code></pre><ul>
<li>According to <a href="https://viewdns.info/reverseip/?host=63.32.242.35&amp;t=1">viewdns.info</a> that server belongs to Macaroni Brothers&rsquo;
<li><p>According to <a href="https://viewdns.info/reverseip/?host=63.32.242.35&amp;t=1">viewdns.info</a> that server belongs to Macaroni Brothers&rsquo;</p>
<ul> <ul>
<li>The user agent of their non-REST API requests from the same IP is Drupal</li> <li>The user agent of their non-REST API requests from the same IP is Drupal</li>
<li>This is one very good reason to limit REST API requests, and perhaps to enable caching via nginx</li> <li>This is one very good reason to limit REST API requests, and perhaps to enable caching via nginx</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-05-07">2019-05-07</h2> </ul>
<h2 id="20190507">2019-05-07</h2>
<ul> <ul>
<li><p>The total number of unique IPs on CGSpace yesterday was almost 14,000, which is several thousand higher than previous day totals:</p> <li>The total number of unique IPs on CGSpace yesterday was almost 14,000, which is several thousand higher than previous day totals:</li>
</ul>
<pre><code># zcat --force /var/log/nginx/access.log.1 /var/log/nginx/access.log.2.gz | grep -E '06/May/2019' | awk '{print $1}' | sort | uniq | wc -l <pre><code># zcat --force /var/log/nginx/access.log.1 /var/log/nginx/access.log.2.gz | grep -E '06/May/2019' | awk '{print $1}' | sort | uniq | wc -l
13969 13969
# zcat --force /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz | grep -E '05/May/2019' | awk '{print $1}' | sort | uniq | wc -l # zcat --force /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz | grep -E '05/May/2019' | awk '{print $1}' | sort | uniq | wc -l
@ -408,10 +363,9 @@ $ cat dspace.log.2019-05-01 | grep -E '2019-05-01 (02|03|04|05|06):' | grep -o -
6229 6229
# zcat --force /var/log/nginx/access.log.4.gz /var/log/nginx/access.log.5.gz | grep -E '03/May/2019' | awk '{print $1}' | sort | uniq | wc -l # zcat --force /var/log/nginx/access.log.4.gz /var/log/nginx/access.log.5.gz | grep -E '03/May/2019' | awk '{print $1}' | sort | uniq | wc -l
8051 8051
</code></pre></li> </code></pre><ul>
<li>Total number of sessions yesterday was <em>much</em> higher compared to days last week:</li>
<li><p>Total number of sessions yesterday was <em>much</em> higher compared to days last week:</p> </ul>
<pre><code>$ cat dspace.log.2019-05-06 | grep -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l <pre><code>$ cat dspace.log.2019-05-06 | grep -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
144160 144160
$ cat dspace.log.2019-05-05 | grep -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l $ cat dspace.log.2019-05-05 | grep -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
@ -424,69 +378,57 @@ $ cat dspace.log.2019-05-02 | grep -E 'session_id=[A-Z0-9]{32}' | sort | uniq |
26996 26996
$ cat dspace.log.2019-05-01 | grep -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l $ cat dspace.log.2019-05-01 | grep -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
61866 61866
</code></pre></li> </code></pre><ul>
<li>The usage statistics seem to agree that yesterday was crazy:</li>
<li><p>The usage statistics seem to agree that yesterday was crazy:</p></li>
</ul> </ul>
<p><img src="/cgspace-notes/2019/05/2019-05-07-atmire-usage-week.png" alt="Atmire Usage statistics spike 2019-05-06"></p>
<p><img src="/cgspace-notes/2019/05/2019-05-07-atmire-usage-week.png" alt="Atmire Usage statistics spike 2019-05-06" /></p>
<ul> <ul>
<li>Sarah from RTB asked me about the RSS / XML link for the CGIAR.org website again <li>Sarah from RTB asked me about the RSS / XML link for the CGIAR.org website again
<ul> <ul>
<li>Apparently Sam Stacey is trying to add an RSS feed so the items get automatically syndicated to the CGIAR website</li> <li>Apparently Sam Stacey is trying to add an RSS feed so the items get automatically syndicated to the CGIAR website</li>
<li>I sent her the link to the collection RSS feed</li> <li>I sent her the link to the collection RSS feed</li>
</ul></li> </ul>
</li>
<li>Add requests cache to <code>resolve-addresses.py</code> script</li> <li>Add requests cache to <code>resolve-addresses.py</code> script</li>
</ul> </ul>
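<p>The session counts above all come from the same <code>grep -o | sort | uniq | wc -l</code> pipeline. A minimal Python sketch of that counting step follows, assuming log lines that contain <code>session_id=</code> followed by 32 uppercase alphanumeric characters; the function name and the sample lines are hypothetical and only there for illustration:</p>

```python
import re

# Same pattern as the grep commands in these notes: a 32-character session ID
SESSION_RE = re.compile(r"session_id=[A-Z0-9]{32}")


def count_unique_sessions(lines):
    """Count distinct session IDs, like grep -o ... | sort | uniq | wc -l."""
    sessions = set()
    for line in lines:
        # findall keeps only the matched text (like grep -o), so several log
        # lines that belong to the same session are counted once
        sessions.update(SESSION_RE.findall(line))
    return len(sessions)


if __name__ == "__main__":
    # Hypothetical log lines, not taken from the real dspace.log
    sample = [
        "2019-05-06 02:00:01 INFO session_id=" + "A" * 32 + ":ip_addr=192.0.2.1",
        "2019-05-06 02:00:02 INFO session_id=" + "A" * 32 + ":ip_addr=192.0.2.1",
        "2019-05-06 02:00:03 INFO session_id=" + "B" * 32 + ":ip_addr=192.0.2.2",
    ]
    print(count_unique_sessions(sample))
```

<p>One thing to keep in mind when comparing numbers: <code>grep</code> without <code>-o</code> prints whole log lines, so piping that into <code>sort | uniq</code> counts distinct lines rather than distinct session IDs, and the result will usually be higher.</p>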
<h2 id="20190508">2019-05-08</h2>
<h2 id="2019-05-08">2019-05-08</h2>
<ul> <ul>
<li><p>A user said that CGSpace emails have stopped sending again</p> <li>A user said that CGSpace emails have stopped sending again
<ul> <ul>
<li><p>Indeed, the <code>dspace test-email</code> script is showing an authentication failure:</p> <li>Indeed, the <code>dspace test-email</code> script is showing an authentication failure:</li>
</ul>
</li>
</ul>
<pre><code>$ dspace test-email <pre><code>$ dspace test-email
About to send test email: About to send test email:
- To: wooooo@cgiar.org - To: wooooo@cgiar.org
- Subject: DSpace test email - Subject: DSpace test email
- Server: smtp.office365.com - Server: smtp.office365.com
Error sending email: Error sending email:
- Error: javax.mail.AuthenticationFailedException - Error: javax.mail.AuthenticationFailedException
Please see the DSpace documentation for assistance. Please see the DSpace documentation for assistance.
</code></pre></li> </code></pre><ul>
</ul></li> <li>I checked the settings and apparently I had updated it incorrectly last week after ICT reset the password</li>
<li>Help Moayad with certbot-auto for Let's Encrypt scripts on the new AReS server (linode20)</li>
<li><p>I checked the settings and apparently I had updated it incorrectly last week after ICT reset the password</p></li> <li>Normalize all <code>text_lang</code> values for metadata on CGSpace and DSpace Test (as I had tested last month):</li>
</ul>
<li><p>Help Moayad with certbot-auto for Let&rsquo;s Encrypt scripts on the new AReS server (linode20)</p></li>
<li><p>Normalize all <code>text_lang</code> values for metadata on CGSpace and DSpace Test (as I had tested last month):</p>
<pre><code>UPDATE metadatavalue SET text_lang='en_US' WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IN ('ethnob', 'en', '*', 'E.', ''); <pre><code>UPDATE metadatavalue SET text_lang='en_US' WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IN ('ethnob', 'en', '*', 'E.', '');
UPDATE metadatavalue SET text_lang='en_US' WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IS NULL; UPDATE metadatavalue SET text_lang='en_US' WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IS NULL;
UPDATE metadatavalue SET text_lang='es_ES' WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IN ('es', 'spa'); UPDATE metadatavalue SET text_lang='es_ES' WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IN ('es', 'spa');
</code></pre></li> </code></pre><ul>
<li>Send Francesca Giampieri from Bioversity a CSV export of all their items issued in 2018
<li><p>Send Francesca Giampieri from Bioversity a CSV export of all their items issued in 2018</p>
<ul> <ul>
<li>They will be doing a migration of 1500 items from their TYPO3 database into CGSpace soon and want an example CSV with all required metadata columns</li> <li>They will be doing a migration of 1500 items from their TYPO3 database into CGSpace soon and want an example CSV with all required metadata columns</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-05-10">2019-05-10</h2> </ul>
<h2 id="20190510">2019-05-10</h2>
<ul> <ul>
<li>I finally had time to analyze the 7,000 IPs from the major traffic spike on 2019-05-06 after several runs of my <code>resolve-addresses.py</code> script (ipapi.co has a limit of 1,000 requests per day)</li> <li>I finally had time to analyze the 7,000 IPs from the major traffic spike on 2019-05-06 after several runs of my <code>resolve-addresses.py</code> script (ipapi.co has a limit of 1,000 requests per day)</li>
<li>Resolving the unique IP addresses to organization and AS names reveals some pretty big abusers: <li>Resolving the unique IP addresses to organization and AS names reveals some pretty big abusers:
<ul> <ul>
<li>1213 from Region40 LLC (AS200557)</li> <li>1213 from Region40 LLC (AS200557)</li>
<li>697 from Trusov Ilya Igorevych (AS50896)</li> <li>697 from Trusov Ilya Igorevych (AS50896)</li>
@ -500,142 +442,109 @@ UPDATE metadatavalue SET text_lang='es_ES' WHERE resource_type_id=2 AND metadata
<li>196 from Cogent Communications (AS174)</li> <li>196 from Cogent Communications (AS174)</li>
<li>125 from Blockchain Network Solutions Ltd (AS43444)</li> <li>125 from Blockchain Network Solutions Ltd (AS43444)</li>
<li>118 from Silverstar Invest Limited (AS35624)</li> <li>118 from Silverstar Invest Limited (AS35624)</li>
</ul></li>
<li><p>All of the IPs from these networks use generic user agents like this one, along with many other variations that change frequently:</p>
<pre><code>&quot;Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2703.0 Safari/537.36&quot;
</code></pre></li>
<li><p>I found a <a href="https://www.qurium.org/alerts/azerbaijan/azerbaijan-and-the-region40-ddos-service/">blog post from 2018 detailing an attack from a DDoS service</a> that matches our pattern exactly</p></li>
<li><p>They specifically mention:</p></li>
</ul> </ul>
</li>
<pre>The attack that targeted the "Search" functionality of the website, aimed to bypass our mitigation by performing slow but simultaneous searches from 5500 IP addresses.</pre> <li>All of the IPs from these networks use generic user agents like this one, along with many other variations that change frequently:</li>
</ul>
<pre><code>&quot;Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2703.0 Safari/537.36&quot;
</code></pre><ul>
<li>I found a <a href="https://www.qurium.org/alerts/azerbaijan/azerbaijan-and-the-region40-ddos-service/">blog post from 2018 detailing an attack from a DDoS service</a> that matches our pattern exactly</li>
<li>They specifically mention:</li>
</ul>
<!-- raw HTML omitted -->
<ul> <ul>
<li>So this was definitely an attack of some sort&hellip; only God knows why</li> <li>So this was definitely an attack of some sort&hellip; only God knows why</li>
<li>I noticed a few new bots that don&rsquo;t use the word &ldquo;bot&rdquo; in their user agent and therefore don&rsquo;t match Tomcat&rsquo;s Crawler Session Manager Valve: <li>I noticed a few new bots that don't use the word &ldquo;bot&rdquo; in their user agent and therefore don't match Tomcat's Crawler Session Manager Valve:
<ul> <ul>
<li><code>Blackboard Safeassign</code></li> <li><code>Blackboard Safeassign</code></li>
<li><code>Unpaywall</code></li> <li><code>Unpaywall</code></li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-05-12">2019-05-12</h2> </ul>
<h2 id="20190512">2019-05-12</h2>
<ul> <ul>
<li><p>I see that the Unpaywall bot is responsible for a few thousand XMLUI sessions every day (IP addresses come from nginx access.log):</p> <li>I see that the Unpaywall bot is responsible for a few thousand XMLUI sessions every day (IP addresses come from nginx access.log):</li>
</ul>
<pre><code>$ cat dspace.log.2019-05-11 | grep -E 'ip_addr=(100.26.206.188|100.27.19.233|107.22.98.199|174.129.156.41|18.205.243.110|18.205.245.200|18.207.176.164|18.207.209.186|18.212.126.89|18.212.5.59|18.213.4.150|18.232.120.6|18.234.180.224|18.234.81.13|3.208.23.222|34.201.121.183|34.201.241.214|34.201.39.122|34.203.188.39|34.207.197.154|34.207.232.63|34.207.91.147|34.224.86.47|34.227.205.181|34.228.220.218|34.229.223.120|35.171.160.166|35.175.175.202|3.80.201.39|3.81.120.70|3.81.43.53|3.84.152.19|3.85.113.253|3.85.237.139|3.85.56.100|3.87.23.95|3.87.248.240|3.87.250.3|3.87.62.129|3.88.13.9|3.88.57.237|3.89.71.15|3.90.17.242|3.90.68.247|3.91.44.91|3.92.138.47|3.94.250.180|52.200.78.128|52.201.223.200|52.90.114.186|52.90.48.73|54.145.91.243|54.160.246.228|54.165.66.180|54.166.219.216|54.166.238.172|54.167.89.152|54.174.94.223|54.196.18.211|54.198.234.175|54.208.8.172|54.224.146.147|54.234.169.91|54.235.29.216|54.237.196.147|54.242.68.231|54.82.6.96|54.87.12.181|54.89.217.141|54.89.234.182|54.90.81.216|54.91.104.162)' | grep -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l <pre><code>$ cat dspace.log.2019-05-11 | grep -E 
'ip_addr=(100.26.206.188|100.27.19.233|107.22.98.199|174.129.156.41|18.205.243.110|18.205.245.200|18.207.176.164|18.207.209.186|18.212.126.89|18.212.5.59|18.213.4.150|18.232.120.6|18.234.180.224|18.234.81.13|3.208.23.222|34.201.121.183|34.201.241.214|34.201.39.122|34.203.188.39|34.207.197.154|34.207.232.63|34.207.91.147|34.224.86.47|34.227.205.181|34.228.220.218|34.229.223.120|35.171.160.166|35.175.175.202|3.80.201.39|3.81.120.70|3.81.43.53|3.84.152.19|3.85.113.253|3.85.237.139|3.85.56.100|3.87.23.95|3.87.248.240|3.87.250.3|3.87.62.129|3.88.13.9|3.88.57.237|3.89.71.15|3.90.17.242|3.90.68.247|3.91.44.91|3.92.138.47|3.94.250.180|52.200.78.128|52.201.223.200|52.90.114.186|52.90.48.73|54.145.91.243|54.160.246.228|54.165.66.180|54.166.219.216|54.166.238.172|54.167.89.152|54.174.94.223|54.196.18.211|54.198.234.175|54.208.8.172|54.224.146.147|54.234.169.91|54.235.29.216|54.237.196.147|54.242.68.231|54.82.6.96|54.87.12.181|54.89.217.141|54.89.234.182|54.90.81.216|54.91.104.162)' | grep -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
2206 2206
</code></pre></li> </code></pre><ul>
<li>I added &ldquo;Unpaywall&rdquo; to the list of bots in the Tomcat Crawler Session Manager Valve</li>
<li><p>I added &ldquo;Unpaywall&rdquo; to the list of bots in the Tomcat Crawler Session Manager Valve</p></li> <li>Set up nginx to use TLS and proxy pass to NodeJS on the AReS development server (linode20)</li>
<li>Run all system updates on linode20 and reboot it</li>
<li><p>Set up nginx to use TLS and proxy pass to NodeJS on the AReS development server (linode20)</p></li> <li>Also, there is 10 to 20% CPU steal on that VM, so I will ask Linode to move it to another host</li>
<li>Commit changes to the <code>resolve-addresses.py</code> script to add proper CSV output support</li>
<li><p>Run all system updates on linode20 and reboot it</p></li>
<li><p>Also, there is 10 to 20% CPU steal on that VM, so I will ask Linode to move it to another host</p></li>
<li><p>Commit changes to the <code>resolve-addresses.py</code> script to add proper CSV output support</p></li>
</ul> </ul>
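<p>To illustrate why an agent like Unpaywall slips past the Crawler Session Manager Valve until it is added to the list, here is a rough Python sketch. The default pattern is an assumption modelled on what I believe the valve's <code>crawlerUserAgents</code> default looks like, so check the real value in Tomcat's <code>server.xml</code>; the user agent strings are hypothetical examples:</p>

```python
import re

# Assumed pattern, modelled on the valve's default crawlerUserAgents value;
# verify against the actual Tomcat configuration before relying on it
DEFAULT_PATTERN = r".*[bB]ot.*|.*Yahoo! Slurp.*|.*Feedfetcher-Google.*"

# The same pattern with "Unpaywall" appended, as described in the notes
EXTENDED_PATTERN = DEFAULT_PATTERN + r"|.*Unpaywall.*"


def is_crawler(user_agent, pattern):
    """Return True if the whole user agent string matches the pattern."""
    return re.fullmatch(pattern, user_agent) is not None


if __name__ == "__main__":
    # Hypothetical user agent strings, only for illustration
    agents = ["Googlebot/2.1", "Unpaywall (example agent string)"]
    for agent in agents:
        print(agent, is_crawler(agent, DEFAULT_PATTERN), is_crawler(agent, EXTENDED_PATTERN))
```

<p>With the assumed default, an agent string that contains neither "bot" nor "Bot" is treated as a normal user, which is how such a client can end up with a new session on every request.</p>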
<h2 id="20190514">2019-05-14</h2>
<h2 id="2019-05-14">2019-05-14</h2>
<ul> <ul>
<li>Skype with Peter and AgroKnow about the CTA storytelling modification they want to make to the CTA ICT Update collection on CGSpace <li>Skype with Peter and AgroKnow about the CTA storytelling modification they want to make to the CTA ICT Update collection on CGSpace
<ul> <ul>
<li>I told them they should aim for modifying the collection theme and insert some custom HTML / JS</li> <li>I told them they should aim for modifying the collection theme and insert some custom HTML / JS</li>
<li>I need to send Panagis some documentation about Mirage 2 and the DSpace build process, as well as the Maven settings for build</li> <li>I need to send Panagis some documentation about Mirage 2 and the DSpace build process, as well as the Maven settings for build</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-05-15">2019-05-15</h2> </ul>
<h2 id="20190515">2019-05-15</h2>
<ul> <ul>
<li>Tezira says she&rsquo;s having issues with email reports for approved submissions, but I received an email about collection subscriptions this morning, and I tested with <code>dspace test-email</code> and it&rsquo;s also working&hellip;</li> <li>Tezira says she's having issues with email reports for approved submissions, but I received an email about collection subscriptions this morning, and I tested with <code>dspace test-email</code> and it's also working&hellip;</li>
<li>Send a list of DSpace build tips to Panagis from AgroKnow</li> <li>Send a list of DSpace build tips to Panagis from AgroKnow</li>
<li>Finally fix the AReS v2 to work via DSpace Test and send it to Peter et al to give their feedback <li>Finally fix the AReS v2 to work via DSpace Test and send it to Peter et al to give their feedback
<ul> <ul>
<li>We had issues with CORS due to Moayad using a hard-coded domain name rather than a relative URL</li> <li>We had issues with CORS due to Moayad using a hard-coded domain name rather than a relative URL</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-05-16">2019-05-16</h2> </ul>
<h2 id="20190516">2019-05-16</h2>
<ul> <ul>
<li><p>Export a list of all investors (<code>dc.description.sponsorship</code>) for Peter to look through and correct:</p> <li>Export a list of all investors (<code>dc.description.sponsorship</code>) for Peter to look through and correct:</li>
</ul>
<pre><code>dspace=# \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE resource_type_id = 2 AND metadata_field_id = 29 GROUP BY text_value ORDER BY count DESC LIMIT 1500) to /tmp/2019-05-16-investors.csv WITH CSV HEADER; <pre><code>dspace=# \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE resource_type_id = 2 AND metadata_field_id = 29 GROUP BY text_value ORDER BY count DESC LIMIT 1500) to /tmp/2019-05-16-investors.csv WITH CSV HEADER;
COPY 995 COPY 995
</code></pre></li> </code></pre><ul>
<li>Fork the <a href="https://github.com/icarda-git/AReS">ICARDA AReS v1 repository</a> to <a href="https://github.com/ilri/AReS">ILRI's GitHub</a> and give access to CodeObia guys
<li><p>Fork the <a href="https://github.com/icarda-git/AReS">ICARDA AReS v1 repository</a> to <a href="https://github.com/ilri/AReS">ILRI&rsquo;s GitHub</a> and give access to CodeObia guys</p>
<ul> <ul>
<li>The plan is that we develop the v2 code here</li> <li>The plan is that we develop the v2 code here</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-05-17">2019-05-17</h2> </ul>
<h2 id="20190517">2019-05-17</h2>
<ul> <ul>
<li>Peter sent me a bunch of fixes for investors from yesterday</li> <li>Peter sent me a bunch of fixes for investors from yesterday</li>
<li>I did a quick check in Open Refine (trim and collapse whitespace, clean smart quotes, etc) and then applied them on CGSpace:</li>
<li><p>I did a quick check in Open Refine (trim and collapse whitespace, clean smart quotes, etc) and then applied them on CGSpace:</p> </ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/2019-05-16-fix-306-Investors.csv -db dspace -u dspace -p 'fuuu' -f dc.description.sponsorship -m 29 -t correct -d <pre><code>$ ./fix-metadata-values.py -i /tmp/2019-05-16-fix-306-Investors.csv -db dspace -u dspace -p 'fuuu' -f dc.description.sponsorship -m 29 -t correct -d
$ ./delete-metadata-values.py -i /tmp/2019-05-16-delete-297-Investors.csv -db dspace -u dspace -p 'fuuu' -m 29 -f dc.description.sponsorship -d $ ./delete-metadata-values.py -i /tmp/2019-05-16-delete-297-Investors.csv -db dspace -u dspace -p 'fuuu' -m 29 -f dc.description.sponsorship -d
</code></pre></li> </code></pre><ul>
<li>Then I started a full Discovery re-indexing:</li>
<li><p>Then I started a full Discovery re-indexing:</p> </ul>
<pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx1024m&quot; <pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx1024m&quot;
$ time schedtool -B -e ionice -c2 -n7 nice -n19 dspace index-discovery -b $ time schedtool -B -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
</code></pre></li> </code></pre><ul>
<li>I was going to make a new controlled vocabulary of the top 100 terms after these corrections, but I noticed a bunch of duplicates and variations when I sorted them alphabetically</li>
<li><p>I was going to make a new controlled vocabulary of the top 100 terms after these corrections, but I noticed a bunch of duplicates and variations when I sorted them alphabetically</p></li> <li>Instead, I exported a new list and asked Peter to look at it again</li>
<li>Apply Peter's new corrections on DSpace Test and CGSpace:</li>
<li><p>Instead, I exported a new list and asked Peter to look at it again</p></li> </ul>
<li><p>Apply Peter&rsquo;s new corrections on DSpace Test and CGSpace:</p>
<pre><code>$ ./fix-metadata-values.py -i /tmp/2019-05-17-fix-25-Investors.csv -db dspace -u dspace -p 'fuuu' -f dc.description.sponsorship -m 29 -t correct -d <pre><code>$ ./fix-metadata-values.py -i /tmp/2019-05-17-fix-25-Investors.csv -db dspace -u dspace -p 'fuuu' -f dc.description.sponsorship -m 29 -t correct -d
$ ./delete-metadata-values.py -i /tmp/2019-05-17-delete-14-Investors.csv -db dspace -u dspace -p 'fuuu' -m 29 -f dc.description.sponsorship -d $ ./delete-metadata-values.py -i /tmp/2019-05-17-delete-14-Investors.csv -db dspace -u dspace -p 'fuuu' -m 29 -f dc.description.sponsorship -d
</code></pre></li> </code></pre><ul>
<li>Then I re-exported the sponsors and took the top 100 to update the existing controlled vocabulary (<a href="https://github.com/ilri/DSpace/pull/423">#423</a>)
<li><p>Then I re-exported the sponsors and took the top 100 to update the existing controlled vocabulary (<a href="https://github.com/ilri/DSpace/pull/423">#423</a>)</p>
<ul> <ul>
<li>I will deploy the changes on CGSpace the next time we re-deploy</li> <li>I will deploy the changes on CGSpace the next time we re-deploy</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-05-19">2019-05-19</h2> </ul>
<h2 id="20190519">2019-05-19</h2>
<ul> <ul>
<li>Add &ldquo;ISI journal&rdquo; to item view sidebar at the request of Maria Garruccio</li> <li>Add &ldquo;ISI journal&rdquo; to item view sidebar at the request of Maria Garruccio</li>
<li>Update <code>fix-metadata-values.py</code> and <code>delete-metadata-values.py</code> scripts to add some basic checking of CSV fields and colorize shell output using Colorama</li> <li>Update <code>fix-metadata-values.py</code> and <code>delete-metadata-values.py</code> scripts to add some basic checking of CSV fields and colorize shell output using Colorama</li>
</ul> </ul>
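<p>A minimal sketch of what "basic checking of CSV fields" could look like, assuming the corrections CSV carries one column named after the metadata field and one named after the <code>-t</code> argument (here <code>correct</code>). The function name <code>missing_columns</code> and the sample row are hypothetical and are not taken from the actual scripts.</p>

```python
import csv
import io


def missing_columns(csv_text, required):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # fieldnames is None for an empty file, so fall back to an empty list.
    header = reader.fieldnames or []
    return [name for name in required if name not in header]


# Hypothetical sample: a corrections file with the original value and its fix.
sample = "dc.description.sponsorship,correct\nBMGF,Bill and Melinda Gates Foundation\n"

# An empty list means every expected column is present.
print(missing_columns(sample, ["dc.description.sponsorship", "correct"]))
# A non-empty list names the columns the script should complain about.
print(missing_columns(sample, ["dc.contributor.author", "correct"]))
```

<p>Checking the header before touching the database lets a script of this kind fail early on a mistyped field name instead of silently matching nothing.</p>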
<h2 id="20190524">2019-05-24</h2>
<h2 id="2019-05-24">2019-05-24</h2>
<ul> <ul>
<li>Update AReS README.md on GitHub repository to add a proper introduction, credits, requirements, installation instructions, and legal information</li> <li>Update AReS README.md on GitHub repository to add a proper introduction, credits, requirements, installation instructions, and legal information</li>
<li>Update CIP subjects in input forms on CGSpace (<a href="https://github.com/ilri/DSpace/pull/424">#424</a>)</li> <li>Update CIP subjects in input forms on CGSpace (<a href="https://github.com/ilri/DSpace/pull/424">#424</a>)</li>
</ul> </ul>
<h2 id="20190525">2019-05-25</h2>
<h2 id="2019-05-25">2019-05-25</h2>
<ul> <ul>
<li>Help Abenet proof ten Africa Rice publications <li>Help Abenet proof ten Africa Rice publications
<ul> <ul>
<li>Convert some dates to string (from number in Excel)</li> <li>Convert some dates to string (from number in Excel)</li>
<li>Trim whitespace on all fields</li> <li>Trim whitespace on all fields</li>
@ -643,73 +552,57 @@ $ ./delete-metadata-values.py -i /tmp/2019-05-17-delete-14-Investors.csv -db dsp
<li>Validate subject terms against AGROVOC</li> <li>Validate subject terms against AGROVOC</li>
<li>Add rights information to all items</li> <li>Add rights information to all items</li>
<li>Correct and standardize sponsors</li> <li>Correct and standardize sponsors</li>
</ul></li>
<li><p>Generate Simple Archive Format bundle with SAFBuilder and import into the <a href="https://cgspace.cgiar.org/handle/10568/101106">AfricaRice Articles in Journals</a> collection on CGSpace:</p>
<pre><code>$ dspace import -a -e me@cgiar.org -m 2019-05-25-AfricaRice.map -s /tmp/SimpleArchiveFormat
</code></pre></li>
</ul> </ul>
</li>
<h2 id="2019-05-27">2019-05-27</h2> <li>Generate Simple Archive Format bundle with SAFBuilder and import into the <a href="https://cgspace.cgiar.org/handle/10568/101106">AfricaRice Articles in Journals</a> collection on CGSpace:</li>
</ul>
<pre><code>$ dspace import -a -e me@cgiar.org -m 2019-05-25-AfricaRice.map -s /tmp/SimpleArchiveFormat
</code></pre><h2 id="20190527">2019-05-27</h2>
<ul> <ul>
<li><p>Peter sent me over two thousand corrections for the authors on CGSpace that I had dumped last month</p> <li>Peter sent me over two thousand corrections for the authors on CGSpace that I had dumped last month
<ul> <ul>
<li><p>I proofed them for whitespace and invalid special characters in OpenRefine and then applied them on CGSpace and DSpace Test:</p> <li>I proofed them for whitespace and invalid special characters in OpenRefine and then applied them on CGSpace and DSpace Test:</li>
</ul>
</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/2019-05-27-fix-2472-Authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -m 3 -t corrections -d <pre><code>$ ./fix-metadata-values.py -i /tmp/2019-05-27-fix-2472-Authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -m 3 -t corrections -d
</code></pre></li> </code></pre><ul>
</ul></li> <li>Then start a full Discovery re-indexing on each server:</li>
</ul>
<li><p>Then start a full Discovery re-indexing on each server:</p>
<pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx1024m&quot; <pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx1024m&quot;
$ time schedtool -B -e ionice -c2 -n7 nice -n19 dspace index-discovery -b $ time schedtool -B -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
</code></pre></li> </code></pre><ul>
<li>Export new list of all authors from CGSpace database to send to Peter:</li>
<li><p>Export new list of all authors from CGSpace database to send to Peter:</p> </ul>
<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/2019-05-27-all-authors.csv with csv header; <pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/2019-05-27-all-authors.csv with csv header;
COPY 64871 COPY 64871
</code></pre></li> </code></pre><ul>
<li>Run all system updates on DSpace Test (linode19) and reboot it</li>
<li><p>Run all system updates on DSpace Test (linode19) and reboot it</p></li> <li>Paola from CIAT asked for a way to generate a report of the top keywords for each year of their articles and journals
<li><p>Paola from CIAT asked for a way to generate a report of the top keywords for each year of their articles and journals</p>
<ul> <ul>
<li>I told them that the best way (even though it&rsquo;s low tech) is to work on a CSV dump of the collection</li> <li>I told them that the best way (even though it's low tech) is to work on a CSV dump of the collection</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-05-29">2019-05-29</h2> </ul>
<h2 id="20190529">2019-05-29</h2>
<ul> <ul>
<li>A CIMMYT user was having problems registering or logging into CGSpace <li>A CIMMYT user was having problems registering or logging into CGSpace
<ul> <ul>
<li>I tried to register her and it gave an error, then I remembered for CGIAR LDAP users we actually need to just log in and it will automatically create an eperson</li> <li>I tried to register her and it gave an error, then I remembered for CGIAR LDAP users we actually need to just log in and it will automatically create an eperson</li>
<li>I told her to try to log in with the LDAP login method and let me know what happens (then I can look in the logs too)</li> <li>I told her to try to log in with the LDAP login method and let me know what happens (then I can look in the logs too)</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-05-30">2019-05-30</h2> </ul>
<h2 id="20190530">2019-05-30</h2>
<ul> <ul>
<li><p>I see the following error in the DSpace log when the user tries to log in with her CGIAR email and password on the LDAP login:</p> <li>I see the following error in the DSpace log when the user tries to log in with her CGIAR email and password on the LDAP login:</li>
<pre><code>2019-05-30 07:19:35,166 INFO org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=A5E0C836AF8F3ABB769FE47107AE1CFF:ip_addr=185.71.4.34:failed_login:no DN found for user sa.saini@cgiar.org
</code></pre></li>
<li><p>For now I just created an eperson with her personal email address until I have time to check LDAP to see what&rsquo;s up with her CGIAR account:</p>
<pre><code>$ dspace user -a -m blah@blah.com -g Sakshi -s Saini -p 'sknflksnfksnfdls'
</code></pre></li>
</ul> </ul>
<pre><code>2019-05-30 07:19:35,166 INFO org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=A5E0C836AF8F3ABB769FE47107AE1CFF:ip_addr=185.71.4.34:failed_login:no DN found for user sa.saini@cgiar.org
<!-- vim: set sw=2 ts=2: --> </code></pre><ul>
<li>For now I just created an eperson with her personal email address until I have time to check LDAP to see what's up with her CGIAR account:</li>
</ul>
<pre><code>$ dspace user -a -m blah@blah.com -g Sakshi -s Saini -p 'sknflksnfksnfdls'
</code></pre><!-- raw HTML omitted -->


@ -8,14 +8,11 @@
<meta property="og:title" content="June, 2019" /> <meta property="og:title" content="June, 2019" />
<meta property="og:description" content="2019-06-02 <meta property="og:description" content="2019-06-02
Merge the Solr filterCache and XMLUI ISI journal changes to the 5_x-prod branch and deploy on CGSpace Merge the Solr filterCache and XMLUI ISI journal changes to the 5_x-prod branch and deploy on CGSpace
Run system updates on CGSpace (linode18) and reboot it Run system updates on CGSpace (linode18) and reboot it
2019-06-03 2019-06-03
Skype with Marie-Angélique and Abenet about CG Core v2 Skype with Marie-Angélique and Abenet about CG Core v2
" /> " />
<meta property="og:type" content="article" /> <meta property="og:type" content="article" />
@ -27,17 +24,14 @@ Skype with Marie-Angélique and Abenet about CG Core v2
<meta name="twitter:title" content="June, 2019"/> <meta name="twitter:title" content="June, 2019"/>
<meta name="twitter:description" content="2019-06-02 <meta name="twitter:description" content="2019-06-02
Merge the Solr filterCache and XMLUI ISI journal changes to the 5_x-prod branch and deploy on CGSpace Merge the Solr filterCache and XMLUI ISI journal changes to the 5_x-prod branch and deploy on CGSpace
Run system updates on CGSpace (linode18) and reboot it Run system updates on CGSpace (linode18) and reboot it
2019-06-03 2019-06-03
Skype with Marie-Angélique and Abenet about CG Core v2 Skype with Marie-Angélique and Abenet about CG Core v2
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -118,22 +112,17 @@ Skype with Marie-Angélique and Abenet about CG Core v2
</p> </p>
</header> </header>
<h2 id="2019-06-02">2019-06-02</h2> <h2 id="20190602">2019-06-02</h2>
<ul> <ul>
<li>Merge the <a href="https://github.com/ilri/DSpace/pull/425">Solr filterCache</a> and <a href="https://github.com/ilri/DSpace/pull/426">XMLUI ISI journal</a> changes to the <code>5_x-prod</code> branch and deploy on CGSpace</li> <li>Merge the <a href="https://github.com/ilri/DSpace/pull/425">Solr filterCache</a> and <a href="https://github.com/ilri/DSpace/pull/426">XMLUI ISI journal</a> changes to the <code>5_x-prod</code> branch and deploy on CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it</li> <li>Run system updates on CGSpace (linode18) and reboot it</li>
</ul> </ul>
<h2 id="20190603">2019-06-03</h2>
<h2 id="2019-06-03">2019-06-03</h2>
<ul> <ul>
<li>Skype with Marie-Angélique and Abenet about <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2</a></li> <li>Skype with Marie-Angélique and Abenet about <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2</a></li>
</ul> </ul>
<ul> <ul>
<li>Here is a list of proposed metadata migrations for CGSpace <li>Here is a list of proposed metadata migrations for CGSpace
<ul> <ul>
<li>dc.language.iso→DCTERMS.language (and switch to ISO 639-2 Alpha 3)</li> <li>dc.language.iso→DCTERMS.language (and switch to ISO 639-2 Alpha 3)</li>
<li>dc.description.abstract→DCTERMS.abstract</li> <li>dc.description.abstract→DCTERMS.abstract</li>
@ -145,184 +134,161 @@ Skype with Marie-Angélique and Abenet about CG Core v2
<li>cg.creator.id→cg.creator.identifier?</li> <li>cg.creator.id→cg.creator.identifier?</li>
<li>dc.relation.ispartofseries→DCTERMS.isPartOf</li> <li>dc.relation.ispartofseries→DCTERMS.isPartOf</li>
<li>cg.link.relation→DCTERMS.relation</li> <li>cg.link.relation→DCTERMS.relation</li>
</ul></li> </ul>
</li>
<li>Marie agreed that we need to adopt some controlled lists for our values, and pointed out that the MARLO team maintains a list of CRPs and Centers at <a href="https://clarisa.cgiar.org/">CLARISA</a> <li>Marie agreed that we need to adopt some controlled lists for our values, and pointed out that the MARLO team maintains a list of CRPs and Centers at <a href="https://clarisa.cgiar.org/">CLARISA</a>
<ul> <ul>
<li>There is an API there but it needs a password for access&hellip;</li> <li>There is an API there but it needs a password for access&hellip;</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-06-04">2019-06-04</h2> </ul>
<h2 id="20190604">2019-06-04</h2>
<ul> <ul>
<li>The MARLO team responded and said they will give us access to the CLARISA API</li> <li>The MARLO team responded and said they will give us access to the CLARISA API</li>
<li>Marie-Angélique <a href="https://github.com/AgriculturalSemantics/cg-core/pull/1">proposed</a> to integrate <code>dcterms.isPartOf</code>, <code>dcterms.abstract</code>, and <code>dcterms.bibliographicCitation</code> into the CG Core v2 schema <li>Marie-Angélique <a href="https://github.com/AgriculturalSemantics/cg-core/pull/1">proposed</a> to integrate <code>dcterms.isPartOf</code>, <code>dcterms.abstract</code>, and <code>dcterms.bibliographicCitation</code> into the CG Core v2 schema
<ul> <ul>
<li>I told her I would attempt to integrate those and the others above into DSpace Test soon and report back</li> <li>I told her I would attempt to integrate those and the others above into DSpace Test soon and report back</li>
<li>We also need to discuss with the ILRI Data Portal, MEL/MELSpace, and users who consume the CGSpace API</li> <li>We also need to discuss with the ILRI Data Portal, MEL/MELSpace, and users who consume the CGSpace API</li>
</ul></li> </ul>
</li>
<li>Add Arabic language to input-forms.xml (<a href="https://github.com/ilri/DSpace/pull/427">#427</a>), as Bioversity is adding some Arabic items and noticed it missing</li> <li>Add Arabic language to input-forms.xml (<a href="https://github.com/ilri/DSpace/pull/427">#427</a>), as Bioversity is adding some Arabic items and noticed it missing</li>
</ul> </ul>
<h2 id="20190605">2019-06-05</h2>
<h2 id="2019-06-05">2019-06-05</h2>
<ul> <ul>
<li>Send mail to CGSpace and MELSpace people to let them know about the proposed metadata field migrations after the discussion with Marie-Angélique</li> <li>Send mail to CGSpace and MELSpace people to let them know about the proposed metadata field migrations after the discussion with Marie-Angélique</li>
</ul> </ul>
<h2 id="20190607">2019-06-07</h2>
<h2 id="2019-06-07">2019-06-07</h2>
<ul> <ul>
<li><p>Thierry noticed that the CUA statistics were missing previous years again, and I see that the Solr admin UI has the following message:</p> <li>Thierry noticed that the CUA statistics were missing previous years again, and I see that the Solr admin UI has the following message:</li>
<pre><code>statistics-2018: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Error opening new searcher
</code></pre></li>
<li><p>I had to restart Tomcat a few times for all the stats cores to get loaded with no issue</p></li>
</ul> </ul>
<pre><code>statistics-2018: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Error opening new searcher
<h2 id="2019-06-10">2019-06-10</h2> </code></pre><ul>
<li>I had to restart Tomcat a few times for all the stats cores to get loaded with no issue</li>
</ul>
<h2 id="20190610">2019-06-10</h2>
<ul> <ul>
<li>Rename the AReS repository on GitHub to OpenRXV: <a href="https://github.com/ilri/OpenRXV">https://github.com/ilri/OpenRXV</a></li> <li>Rename the AReS repository on GitHub to OpenRXV: <a href="https://github.com/ilri/OpenRXV">https://github.com/ilri/OpenRXV</a></li>
<li>Create a new AReS repository: <a href="https://github.com/ilri/AReS">https://github.com/ilri/AReS</a></li> <li>Create a new AReS repository: <a href="https://github.com/ilri/AReS">https://github.com/ilri/AReS</a></li>
<li>Start looking at the 203 IITA records on DSpace Test from last month (<a href="https://dspacetest.cgiar.org/handle/10568/102032">IITA_May_16</a> aka &ldquo;20194th.xls&rdquo;) using OpenRefine <li>Start looking at the 203 IITA records on DSpace Test from last month (<a href="https://dspacetest.cgiar.org/handle/10568/102032">IITA_May_16</a> aka &ldquo;20194th.xls&rdquo;) using OpenRefine
<ul> <ul>
<li>Trim leading, trailing, and consecutive whitespace on all columns, but I didn&rsquo;t notice very many issues</li> <li>Trim leading, trailing, and consecutive whitespace on all columns, but I didn't notice very many issues</li>
<li>Validate affiliations against latest list of top 1500 terms using reconcile-csv, correcting and standardizing about twenty-seven</li> <li>Validate affiliations against latest list of top 1500 terms using reconcile-csv, correcting and standardizing about twenty-seven</li>
<li>Validate countries against latest list of countries using reconcile-csv, correcting three</li> <li>Validate countries against latest list of countries using reconcile-csv, correcting three</li>
<li>Convert all DOIs to &ldquo;<a href="https://dx.doi.org&quot;">https://dx.doi.org&quot;</a> format</li> <li>Convert all DOIs to &ldquo;<a href="https://dx.doi.org">https://dx.doi.org</a>&rdquo; format</li>
<li>Normalize all <code>cg.identifier.url</code> Google book fields to &ldquo;books.google.com&rdquo;</li> <li>Normalize all <code>cg.identifier.url</code> Google book fields to &ldquo;books.google.com&rdquo;</li>
<li>Correct some inconsistencies in IITA subjects</li> <li>Correct some inconsistencies in IITA subjects</li>
<li>Correct two incorrect &ldquo;Peer Review&rdquo; in <code>dc.description.version</code></li> <li>Correct two incorrect &ldquo;Peer Review&rdquo; in <code>dc.description.version</code></li>
<li>About fifteen items have incorrect ISBNs (looks like an Excel error because the values are in scientific notation)</li> <li>About fifteen items have incorrect ISBNs (looks like an Excel error because the values are in scientific notation)</li>
<li>Delete one blank item</li> <li>Delete one blank item</li>
<li>I managed to get to subjects, so I&rsquo;ll continue from there when I start working next</li> <li>I managed to get to subjects, so I'll continue from there when I start working next</li>
</ul></li> </ul>
</li>
<li><p>Generate a new list of countries from the database for use with reconcile-csv</p> <li>Generate a new list of countries from the database for use with reconcile-csv
<ul> <ul>
<li><p>After dumping, use csvcut to add line numbers, then change the csv header to match those you use in reconcile-csv, for example <code>id</code> and <code>name</code>:</p> <li>After dumping, use csvcut to add line numbers, then change the csv header to match those you use in reconcile-csv, for example <code>id</code> and <code>name</code>:</li>
</ul>
</li>
</ul>
<pre><code>dspace=# \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE metadata_field_id = 228 AND resource_type_id = 2 GROUP BY text_value ORDER BY count DESC) to /tmp/countries.csv WITH CSV HEADER <pre><code>dspace=# \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE metadata_field_id = 228 AND resource_type_id = 2 GROUP BY text_value ORDER BY count DESC) to /tmp/countries.csv WITH CSV HEADER
COPY 192 COPY 192
$ csvcut -l -c 0 /tmp/countries.csv &gt; 2019-06-10-countries.csv $ csvcut -l -c 0 /tmp/countries.csv &gt; 2019-06-10-countries.csv
</code></pre></li> </code></pre><ul>
</ul></li> <li>Get a list of all the unique AGROVOC subject terms in IITA's data and export it to a text file so I can validate them with my <code>agrovoc-lookup.py</code> script:</li>
</ul>
<li><p>Get a list of all the unique AGROVOC subject terms in IITA&rsquo;s data and export it to a text file so I can validate them with my <code>agrovoc-lookup.py</code> script:</p>
<pre><code>$ csvcut -c dc.subject ~/Downloads/2019-06-10-IITA-20194th-Round-2.csv| sed 's/||/\n/g' | grep -v dc.subject | sort -u &gt; iita-agrovoc.txt <pre><code>$ csvcut -c dc.subject ~/Downloads/2019-06-10-IITA-20194th-Round-2.csv| sed 's/||/\n/g' | grep -v dc.subject | sort -u &gt; iita-agrovoc.txt
$ ./agrovoc-lookup.py -i iita-agrovoc.txt -om iita-agrovoc-matches.txt -or iita-agrovoc-rejects.txt $ ./agrovoc-lookup.py -i iita-agrovoc.txt -om iita-agrovoc-matches.txt -or iita-agrovoc-rejects.txt
$ wc -l iita-agrovoc* $ wc -l iita-agrovoc*
402 iita-agrovoc-matches.txt 402 iita-agrovoc-matches.txt
29 iita-agrovoc-rejects.txt 29 iita-agrovoc-rejects.txt
431 iita-agrovoc.txt 431 iita-agrovoc.txt
</code></pre></li> </code></pre><ul>
<li>Combine these IITA matches with the subjects I matched a few months ago:</li>
<li><p>Combine these IITA matches with the subjects I matched a few months ago:</p>
<pre><code>$ csvcut -c name 2019-03-18-subjects-matched.csv | grep -v name | cat - iita-agrovoc-matches.txt | sort -u &gt; 2019-06-10-subjects-matched.txt
</code></pre></li>
<li><p>Then make a new list to use with reconcile-csv by adding line numbers with csvcut and changing the line number header to <code>id</code>:</p>
<pre><code>$ csvcut -c name -l 2019-06-10-subjects-matched.txt | sed 's/line_number/id/' &gt; 2019-06-10-subjects-matched.csv
</code></pre></li>
</ul> </ul>
<pre><code>$ csvcut -c name 2019-03-18-subjects-matched.csv | grep -v name | cat - iita-agrovoc-matches.txt | sort -u &gt; 2019-06-10-subjects-matched.txt
<h2 id="2019-06-20">2019-06-20</h2> </code></pre><ul>
<li>Then make a new list to use with reconcile-csv by adding line numbers with csvcut and changing the line number header to <code>id</code>:</li>
</ul>
<pre><code>$ csvcut -c name -l 2019-06-10-subjects-matched.txt | sed 's/line_number/id/' &gt; 2019-06-10-subjects-matched.csv
</code></pre><h2 id="20190620">2019-06-20</h2>
<ul> <ul>
<li>Share some feedback about AReS v2 with the colleagues and encourage them to do the same</li> <li>Share some feedback about AReS v2 with the colleagues and encourage them to do the same</li>
</ul> </ul>
<h2 id="20190623">2019-06-23</h2>
<h2 id="2019-06-23">2019-06-23</h2>
<ul> <ul>
<li>Continue work on reviewing the CG Core v2 standard and its implications for CGSpace and DSpace platforms in general <li>Continue work on reviewing the CG Core v2 standard and its implications for CGSpace and DSpace platforms in general
<ul> <ul>
<li>Update my <a href="https://gist.github.com/alanorth/2db39e91f48d116e00a4edffd6ba6409">list of fields to migrate</a></li> <li>Update my <a href="https://gist.github.com/alanorth/2db39e91f48d116e00a4edffd6ba6409">list of fields to migrate</a></li>
<li>Submit an <a href="https://github.com/AgriculturalSemantics/cg-core/issues/2">issue with my feedback to the CG Core project</a></li> <li>Submit an <a href="https://github.com/AgriculturalSemantics/cg-core/issues/2">issue with my feedback to the CG Core project</a></li>
</ul></li> </ul>
</li>
<li><p>Update my local PostgreSQL container:</p> <li>Update my local PostgreSQL container:</li>
</ul>
<pre><code>$ podman pull docker.io/library/postgres:9.6-alpine <pre><code>$ podman pull docker.io/library/postgres:9.6-alpine
$ podman rm dspacedb $ podman rm dspacedb
$ podman run --name dspacedb -v dspacedb_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.6-alpine $ podman run --name dspacedb -v dspacedb_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.6-alpine
</code></pre></li> </code></pre><h2 id="20190625">2019-06-25</h2>
</ul>
<h2 id="2019-06-25">2019-06-25</h2>
<ul> <ul>
<li><p>Normalize <code>text_lang</code> values for metadata on DSpace Test and CGSpace:</p> <li>Normalize <code>text_lang</code> values for metadata on DSpace Test and CGSpace:</li>
</ul>
<pre><code>dspace=# UPDATE metadatavalue SET text_lang='en_US' WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IN ('ethnob', 'en', '*', 'E.', ''); <pre><code>dspace=# UPDATE metadatavalue SET text_lang='en_US' WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IN ('ethnob', 'en', '*', 'E.', '');
UPDATE 1551 UPDATE 1551
dspace=# UPDATE metadatavalue SET text_lang='en_US' WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IS NULL; dspace=# UPDATE metadatavalue SET text_lang='en_US' WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IS NULL;
UPDATE 2070 UPDATE 2070
dspace=# UPDATE metadatavalue SET text_lang='es_ES' WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IN ('es', 'spa'); dspace=# UPDATE metadatavalue SET text_lang='es_ES' WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IN ('es', 'spa');
UPDATE 2 UPDATE 2
</code></pre></li> </code></pre><ul>
<li>Upload 202 IITA records from earlier this month (20194th.xls) to CGSpace</li>
<li><p>Upload 202 IITA records from earlier this month (20194th.xls) to CGSpace</p></li> <li>Communicate with Bioversity contractor in charge of their migration from Typo3 to CGSpace</li>
<li><p>Communicate with Bioversity contractor in charge of their migration from Typo3 to CGSpace</p></li>
</ul> </ul>
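<p>The <code>text_lang</code> normalization in the 2019-06-25 entry amounts to a small lookup. The sketch below restates the three UPDATE statements as a Python function. The name <code>normalize_text_lang</code> is hypothetical, and the sketch does not model the <code>resource_type_id</code> and <code>metadata_field_id</code> filters from the SQL.</p>

```python
# Values that the UPDATE statements fold into each locale.
ENGLISH_VARIANTS = {"ethnob", "en", "*", "E.", ""}
SPANISH_VARIANTS = {"es", "spa"}


def normalize_text_lang(value):
    """Map a raw text_lang value to en_US or es_ES, else return it unchanged."""
    # NULL (None) and the odd English variants all become en_US.
    if value is None or value in ENGLISH_VARIANTS:
        return "en_US"
    if value in SPANISH_VARIANTS:
        return "es_ES"
    # Anything else, such as an already normalized value, is left alone.
    return value
```

<p>For example, <code>normalize_text_lang("spa")</code> gives <code>"es_ES"</code>, while a value such as <code>"fr"</code> passes through untouched, which matches the SQL in that only the listed variants are rewritten.</p>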
<h2 id="20190628">2019-06-28</h2>
<h2 id="2019-06-28">2019-06-28</h2>
<ul> <ul>
<li>Start looking at the fifty-seven AfricaRice records sent by Ibnou earlier this month <li>Start looking at the fifty-seven AfricaRice records sent by Ibnou earlier this month
<ul> <ul>
<li>First, I see there are several items with type &ldquo;Book&rdquo; and &ldquo;Book Chapter&rdquo; that should go in an &ldquo;AfricaRice books and book chapters&rdquo; collection, but none exists in the AfricaRice community</li> <li>First, I see there are several items with type &ldquo;Book&rdquo; and &ldquo;Book Chapter&rdquo; that should go in an &ldquo;AfricaRice books and book chapters&rdquo; collection, but none exists in the AfricaRice community</li>
<li>Trim and collapse consecutive whitespace on author, affiliation, authorship types, title, subjects, doi, issn, source, citation, country, sponsors</li> <li>Trim and collapse consecutive whitespace on author, affiliation, authorship types, title, subjects, doi, issn, source, citation, country, sponsors</li>
<li>Standardize and correct affiliations like &ldquo;Africa Rice Cente&rdquo; and &ldquo;Africa Rice Centre&rdquo;, including syntax errors with multi-value separators</li> <li>Standardize and correct affiliations like &ldquo;Africa Rice Cente&rdquo; and &ldquo;Africa Rice Centre&rdquo;, including syntax errors with multi-value separators</li>
<li>Lots of variation in affiliations, for example:</li> <li>Lots of variation in affiliations, for example:
<ul>
<li>Université Abomey-Calavi</li> <li>Université Abomey-Calavi</li>
<li>Université d&rsquo;Abomey</li> <li>Université d'Abomey</li>
<li>Université d&rsquo;Abomey Calavi</li> <li>Université d'Abomey Calavi</li>
<li>Université d&rsquo;Abomey-Calavi</li> <li>Université d'Abomey-Calavi</li>
<li>University of Abomey-Calavi</li> <li>University of Abomey-Calavi</li>
<li>Validate and normalize affiliations against our 2019-04 list using reconcile-csv and OpenRefine:</li> </ul>
</li>
<li>Validate and normalize affiliations against our 2019-04 list using reconcile-csv and OpenRefine:
<ul>
<li><code>$ lein run ~/src/git/DSpace/2019-04-08-affiliations.csv name id</code></li> <li><code>$ lein run ~/src/git/DSpace/2019-04-08-affiliations.csv name id</code></li>
<li>I always forget how to copy the reconciled values in OpenRefine, but you need to make a new column and populate it using this GREL: <code>if(cell.recon.matched, cell.recon.match.name, value)</code></li> <li>I always forget how to copy the reconciled values in OpenRefine, but you need to make a new column and populate it using this GREL: <code>if(cell.recon.matched, cell.recon.match.name, value)</code></li>
</ul>
</li>
<li>Replace smart quotes with standard ASCII ones</li> <li>Replace smart quotes with standard ASCII ones</li>
<li>Fix typos in authorship types</li> <li>Fix typos in authorship types</li>
<li>Validate and normalize subjects against our 2019-06 list using reconcile-csv and OpenRefine:</li> <li>Validate and normalize subjects against our 2019-06 list using reconcile-csv and OpenRefine:
<ul>
<li><code>$ lein run ~/src/git/DSpace/2019-06-10-subjects-matched.csv name id</code></li> <li><code>$ lein run ~/src/git/DSpace/2019-06-10-subjects-matched.csv name id</code></li>
<li>Also add about 30 new AGROVOC subjects to our list that I verified manually</li> <li>Also add about 30 new AGROVOC subjects to our list that I verified manually</li>
</ul>
</li>
<li>There is one duplicate, both have the same DOI: <a href="https://doi.org/10.1016/j.agwat.2018.06.018">https://doi.org/10.1016/j.agwat.2018.06.018</a></li> <li>There is one duplicate, both have the same DOI: <a href="https://doi.org/10.1016/j.agwat.2018.06.018">https://doi.org/10.1016/j.agwat.2018.06.018</a></li>
<li>Fix four ISBNs that were in the ISSN field</li> <li>Fix four ISBNs that were in the ISSN field</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-06-30">2019-06-30</h2> </ul>
<h2 id="20190630">2019-06-30</h2>
<ul> <ul>
<li><p>Upload fifty-seven AfricaRice records to <a href="https://dspacetest.cgiar.org/handle/10568/102274">DSpace Test</a></p> <li>Upload fifty-seven AfricaRice records to <a href="https://dspacetest.cgiar.org/handle/10568/102274">DSpace Test</a>
<ul> <ul>
<li><p>I created the SAF bundle with SAFBuilder and then imported via the CLI:</p> <li>I created the SAF bundle with SAFBuilder and then imported via the CLI:</li>
</ul>
</li>
</ul>
<pre><code>$ dspace import -a -e me@cgiar.org -m 2019-06-30-AfricaRice-11to73.map -s /tmp/2019-06-30-AfricaRice-11to73 <pre><code>$ dspace import -a -e me@cgiar.org -m 2019-06-30-AfricaRice-11to73.map -s /tmp/2019-06-30-AfricaRice-11to73
</code></pre></li> </code></pre><ul>
</ul></li> <li>I sent feedback about a few missing PDFs and one duplicate to Ibnou to check</li>
<li>Run all system updates on DSpace Test (linode19) and reboot it</li>
<li><p>I sent feedback about a few missing PDFs and one duplicate to Ibnou to check</p></li>
<li><p>Run all system updates on DSpace Test (linode19) and reboot it</p></li>
</ul> </ul>
<!-- raw HTML omitted -->
<!-- vim: set sw=2 ts=2: -->

View File

@ -8,14 +8,13 @@
<meta property="og:title" content="July, 2019" /> <meta property="og:title" content="July, 2019" />
<meta property="og:description" content="2019-07-01 <meta property="og:description" content="2019-07-01
Create an &ldquo;AfricaRice books and book chapters&rdquo; collection on CGSpace for AfricaRice Create an &ldquo;AfricaRice books and book chapters&rdquo; collection on CGSpace for AfricaRice
Last month Sisay asked why the following &ldquo;most popular&rdquo; statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace: Last month Sisay asked why the following &ldquo;most popular&rdquo; statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace:
DSpace Test DSpace Test
CGSpace CGSpace
Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community
" /> " />
<meta property="og:type" content="article" /> <meta property="og:type" content="article" />
@ -27,17 +26,16 @@ Abenet had another similar issue a few days ago when trying to find the stats fo
<meta name="twitter:title" content="July, 2019"/> <meta name="twitter:title" content="July, 2019"/>
<meta name="twitter:description" content="2019-07-01 <meta name="twitter:description" content="2019-07-01
Create an &ldquo;AfricaRice books and book chapters&rdquo; collection on CGSpace for AfricaRice Create an &ldquo;AfricaRice books and book chapters&rdquo; collection on CGSpace for AfricaRice
Last month Sisay asked why the following &ldquo;most popular&rdquo; statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace: Last month Sisay asked why the following &ldquo;most popular&rdquo; statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace:
DSpace Test DSpace Test
CGSpace CGSpace
Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -118,48 +116,40 @@ Abenet had another similar issue a few days ago when trying to find the stats fo
</p> </p>
</header> </header>
<h2 id="2019-07-01">2019-07-01</h2> <h2 id="20190701">2019-07-01</h2>
<ul> <ul>
<li>Create an &ldquo;AfricaRice books and book chapters&rdquo; collection on CGSpace for AfricaRice</li> <li>Create an &ldquo;AfricaRice books and book chapters&rdquo; collection on CGSpace for AfricaRice</li>
<li>Last month Sisay asked why the following &ldquo;most popular&rdquo; statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace: <li>Last month Sisay asked why the following &ldquo;most popular&rdquo; statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace:
<ul> <ul>
<li><a href="https://dspacetest.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&amp;time_filter_end_date=01%2F12%2F2018">DSpace Test</a></li> <li><a href="https://dspacetest.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&amp;time_filter_end_date=01%2F12%2F2018">DSpace Test</a></li>
<li><a href="https://cgspace.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&amp;time_filter_end_date=01%2F12%2F2018">CGSpace</a></li> <li><a href="https://cgspace.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&amp;time_filter_end_date=01%2F12%2F2018">CGSpace</a></li>
</ul></li> </ul>
</li>
<li>Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community</li> <li>Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community</li>
</ul> </ul>
<ul> <ul>
<li>If I change the parameters to 2019 I see stats, so I&rsquo;m really thinking it has something to do with the sharded yearly Solr statistics cores <li>If I change the parameters to 2019 I see stats, so I'm really thinking it has something to do with the sharded yearly Solr statistics cores
<ul> <ul>
<li>I checked the Solr admin UI and I see all Solr cores loaded, so I don&rsquo;t know what it could be</li> <li>I checked the Solr admin UI and I see all Solr cores loaded, so I don't know what it could be</li>
<li>When I check the Atmire content and usage module it seems obvious that there is a problem with the old cores because I don&rsquo;t have anything before 2019-01</li> <li>When I check the Atmire content and usage module it seems obvious that there is a problem with the old cores because I don't have anything before 2019-01</li>
</ul></li>
</ul> </ul>
</li>
<p><img src="/cgspace-notes/2019/07/atmire-cua-2018-missing.png" alt="Atmire CUA 2018 stats missing" /></p> </ul>
<p><img src="/cgspace-notes/2019/07/atmire-cua-2018-missing.png" alt="Atmire CUA 2018 stats missing"></p>
<ul> <ul>
<li>I don&rsquo;t see anyone logged in right now so I&rsquo;m going to try to restart Tomcat and see if the stats are accessible after Solr comes back up</li> <li>I don't see anyone logged in right now so I'm going to try to restart Tomcat and see if the stats are accessible after Solr comes back up</li>
<li>I decided to run all system updates on the server (linode18) and reboot it
<li><p>I decided to run all system updates on the server (linode18) and reboot it</p>
<ul> <ul>
<li>After rebooting, Tomcat came back up, but the Solr statistics cores were not all loaded</li> <li>After rebooting, Tomcat came back up, but the Solr statistics cores were not all loaded</li>
<li>The error is always (with a different core):</li>
<li><p>The error is always (with a different core):</p> </ul>
</li>
</ul>
<pre><code>org.apache.solr.common.SolrException: Error CREATEing SolrCore 'statistics-2010': Unable to create core [statistics-2010] Caused by: Lock obtain timed out: NativeFSLock@/home/cgspace.cgiar.org/solr/statistics-2010/data/index/write.lock <pre><code>org.apache.solr.common.SolrException: Error CREATEing SolrCore 'statistics-2010': Unable to create core [statistics-2010] Caused by: Lock obtain timed out: NativeFSLock@/home/cgspace.cgiar.org/solr/statistics-2010/data/index/write.lock
</code></pre></li> </code></pre><ul>
</ul></li> <li>I restarted Tomcat <em>ten times</em> and it never worked&hellip;</li>
<li>I tried to stop Tomcat and delete the write locks:</li>
<li><p>I restarted Tomcat <em>ten times</em> and it never worked&hellip;</p></li> </ul>
<li><p>I tried to stop Tomcat and delete the write locks:</p>
<pre><code># systemctl stop tomcat7 <pre><code># systemctl stop tomcat7
# find /dspace/solr/statistics* -iname &quot;*.lock&quot; -print -delete # find /dspace/solr/statistics* -iname &quot;*.lock&quot; -print -delete
/dspace/solr/statistics/data/index/write.lock /dspace/solr/statistics/data/index/write.lock
@ -174,163 +164,131 @@ Abenet had another similar issue a few days ago when trying to find the stats fo
/dspace/solr/statistics-2018/data/index/write.lock /dspace/solr/statistics-2018/data/index/write.lock
# find /dspace/solr/statistics* -iname &quot;*.lock&quot; -print -delete # find /dspace/solr/statistics* -iname &quot;*.lock&quot; -print -delete
# systemctl start tomcat7 # systemctl start tomcat7
</code></pre></li> </code></pre><ul>
<li>But it still didn't work!</li>
<li><p>But it still didn&rsquo;t work!</p></li> <li>I stopped Tomcat, deleted the old locks, and will try to use the &ldquo;simple&rdquo; lock file type in <code>solr/statistics/conf/solrconfig.xml</code>:</li>
<li><p>I stopped Tomcat, deleted the old locks, and will try to use the &ldquo;simple&rdquo; lock file type in <code>solr/statistics/conf/solrconfig.xml</code>:</p>
<pre><code>&lt;lockType&gt;${solr.lock.type:simple}&lt;/lockType&gt;
</code></pre></li>
<li><p>And after restarting Tomcat it still doesn&rsquo;t work</p></li>
<li><p>Now I&rsquo;ll try going back to &ldquo;native&rdquo; locking with <code>unlockOnStartup</code>:</p>
<pre><code>&lt;unlockOnStartup&gt;true&lt;/unlockOnStartup&gt;
</code></pre></li>
<li><p>Now the cores seem to load, but I still see an error in the Solr Admin UI and I still can&rsquo;t access any stats before 2018</p></li>
<li><p>I filed an <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=685">issue with Atmire</a>, so let&rsquo;s see if they can help</p></li>
<li><p>And since I&rsquo;m annoyed and it&rsquo;s been a few months, I&rsquo;m going to move the JVM heap settings that I&rsquo;ve been testing on DSpace Test to CGSpace</p></li>
<li><p>The old ones were:</p>
<pre><code>-Djava.awt.headless=true -Xms8192m -Xmx8192m -XX:+UseConcMarkSweepGC -Dfile.encoding=UTF-8 -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=5400 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false
</code></pre></li>
<li><p>And the new ones come from Solr 4.10.x&rsquo;s startup scripts:</p>
<pre><code>-Djava.awt.headless=true
-Xms8192m -Xmx8192m
-Dfile.encoding=UTF-8
-XX:NewRatio=3
-XX:SurvivorRatio=4
-XX:TargetSurvivorRatio=90
-XX:MaxTenuringThreshold=8
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:ConcGCThreads=4 -XX:ParallelGCThreads=4
-XX:+CMSScavengeBeforeRemark
-XX:PretenureSizeThreshold=64m
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=50
-XX:CMSMaxAbortablePrecleanTime=6000
-XX:+CMSParallelRemarkEnabled
-XX:+ParallelRefProcEnabled
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=1337
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
</code></pre></li>
</ul> </ul>
<pre><code>&lt;lockType&gt;${solr.lock.type:simple}&lt;/lockType&gt;
<h2 id="2019-07-02">2019-07-02</h2> </code></pre><ul>
<li>And after restarting Tomcat it still doesn't work</li>
<li>Now I'll try going back to &ldquo;native&rdquo; locking with <code>unlockOnStartup</code>:</li>
</ul>
<pre><code>&lt;unlockOnStartup&gt;true&lt;/unlockOnStartup&gt;
</code></pre><ul>
<li>Now the cores seem to load, but I still see an error in the Solr Admin UI and I still can't access any stats before 2018</li>
<li>I filed an <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=685">issue with Atmire</a>, so let's see if they can help</li>
<li>And since I'm annoyed and it's been a few months, I'm going to move the JVM heap settings that I've been testing on DSpace Test to CGSpace</li>
<li>The old ones were:</li>
</ul>
<pre><code>-Djava.awt.headless=true -Xms8192m -Xmx8192m -XX:+UseConcMarkSweepGC -Dfile.encoding=UTF-8 -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=5400 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false
</code></pre><ul>
<li>And the new ones come from Solr 4.10.x's startup scripts:</li>
</ul>
<pre><code> -Djava.awt.headless=true
-Xms8192m -Xmx8192m
-Dfile.encoding=UTF-8
-XX:NewRatio=3
-XX:SurvivorRatio=4
-XX:TargetSurvivorRatio=90
-XX:MaxTenuringThreshold=8
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:ConcGCThreads=4 -XX:ParallelGCThreads=4
-XX:+CMSScavengeBeforeRemark
-XX:PretenureSizeThreshold=64m
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=50
-XX:CMSMaxAbortablePrecleanTime=6000
-XX:+CMSParallelRemarkEnabled
-XX:+ParallelRefProcEnabled
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=1337
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
</code></pre><h2 id="20190702">2019-07-02</h2>
<ul> <ul>
<li><p>Help upload twenty-seven posters from the 2019-05 Sharefair to CGSpace</p> <li>Help upload twenty-seven posters from the 2019-05 Sharefair to CGSpace
<ul> <ul>
<li><p>Sisay had already done the SAFBundle so I made some minor corrections and uploaded them to a temporary collection so I could check them in OpenRefine:</p> <li>Sisay had already done the SAFBundle so I made some minor corrections and uploaded them to a temporary collection so I could check them in OpenRefine:</li>
</ul>
</li>
</ul>
<pre><code>$ sed -i 's/CC-BY 4.0/CC-BY-4.0/' item_*/dublin_core.xml <pre><code>$ sed -i 's/CC-BY 4.0/CC-BY-4.0/' item_*/dublin_core.xml
$ echo &quot;10568/101992&quot; &gt;&gt; item_*/collections $ echo &quot;10568/101992&quot; &gt;&gt; item_*/collections
$ dspace import -a -e me@cgiar.org -m 2019-07-02-Sharefair.map -s /tmp/Sharefair_mapped $ dspace import -a -e me@cgiar.org -m 2019-07-02-Sharefair.map -s /tmp/Sharefair_mapped
</code></pre></li> </code></pre><ul>
</ul></li> <li>I noticed that all twenty-seven items had double dates like &ldquo;2019-05||2019-05&rdquo; so I fixed those, but the rest of the metadata looked good so I unmapped them from the temporary collection</li>
<li>Finish looking at the fifty-six AfricaRice items and upload them to CGSpace:</li>
<li><p>I noticed that all twenty-seven items had double dates like &ldquo;2019-05||2019-05&rdquo; so I fixed those, but the rest of the metadata looked good so I unmapped them from the temporary collection</p></li> </ul>
<li><p>Finish looking at the fifty-six AfricaRice items and upload them to CGSpace:</p>
<pre><code>$ dspace import -a -e me@cgiar.org -m 2019-07-02-AfricaRice-11to73.map -s /tmp/SimpleArchiveFormat <pre><code>$ dspace import -a -e me@cgiar.org -m 2019-07-02-AfricaRice-11to73.map -s /tmp/SimpleArchiveFormat
</code></pre></li> </code></pre><ul>
<li>Peter pointed out that the Sharefair dates I fixed were not actually fixed
<li><p>Peter pointed out that the Sharefair dates I fixed were not actually fixed</p>
<ul> <ul>
<li>It seems there is a bug that causes DSpace to not detect changes if the values are the same like &ldquo;2019-05||2019-05&rdquo; and you try to remove one</li> <li>It seems there is a bug that causes DSpace to not detect changes if the values are the same like &ldquo;2019-05||2019-05&rdquo; and you try to remove one</li>
<li>To get it to work I had to change some of them to 2019-01, then remove them</li> <li>To get it to work I had to change some of them to 2019-01, then remove them</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-07-03">2019-07-03</h2> </ul>
<h2 id="20190703">2019-07-03</h2>
<ul> <ul>
<li>Atmire responded about the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=685">Solr issue</a> and said they would be willing to help</li> <li>Atmire responded about the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=685">Solr issue</a> and said they would be willing to help</li>
</ul> </ul>
<h2 id="20190704">2019-07-04</h2>
<h2 id="2019-07-04">2019-07-04</h2>
<ul> <ul>
<li><p>Maria Garruccio sent me some new ORCID identifiers for Bioversity authors</p> <li>Maria Garruccio sent me some new ORCID identifiers for Bioversity authors
<ul> <ul>
<li><p>I combined them with our existing list and then used my <code>resolve-orcids.py</code> script to update the names from ORCID.org:</p> <li>I combined them with our existing list and then used my <code>resolve-orcids.py</code> script to update the names from ORCID.org:</li>
</ul>
</li>
</ul>
<pre><code>$ cat dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/new-bioversity-orcids.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort -u &gt; /tmp/2019-07-04-orcid-ids.txt <pre><code>$ cat dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/new-bioversity-orcids.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort -u &gt; /tmp/2019-07-04-orcid-ids.txt
$ ./resolve-orcids.py -i /tmp/2019-07-04-orcid-ids.txt -o 2019-07-04-orcid-names.txt -d $ ./resolve-orcids.py -i /tmp/2019-07-04-orcid-ids.txt -o 2019-07-04-orcid-names.txt -d
</code></pre></li> </code></pre><ul>
</ul></li> <li>Send and merge a pull request for the new ORCID identifiers (<a href="https://github.com/ilri/DSpace/pull/428">#428</a>)</li>
<li>I created a CSV with some ORCID identifiers that I had seen change so I could update any existing ones in the database:</li>
<li><p>Send and merge a pull request for the new ORCID identifiers (<a href="https://github.com/ilri/DSpace/pull/428">#428</a>)</p></li> </ul>
<li><p>I created a CSV with some ORCID identifiers that I had seen change so I could update any existing ones in the database:</p>
<pre><code>cg.creator.id,correct <pre><code>cg.creator.id,correct
&quot;Marius Ekué: 0000-0002-5829-6321&quot;,&quot;Marius R.M. Ekué: 0000-0002-5829-6321&quot; &quot;Marius Ekué: 0000-0002-5829-6321&quot;,&quot;Marius R.M. Ekué: 0000-0002-5829-6321&quot;
&quot;Mwungu: 0000-0001-6181-8445&quot;,&quot;Chris Miyinzi Mwungu: 0000-0001-6181-8445&quot; &quot;Mwungu: 0000-0001-6181-8445&quot;,&quot;Chris Miyinzi Mwungu: 0000-0001-6181-8445&quot;
&quot;Mwungu: 0000-0003-1658-287X&quot;,&quot;Chris Miyinzi Mwungu: 0000-0003-1658-287X&quot; &quot;Mwungu: 0000-0003-1658-287X&quot;,&quot;Chris Miyinzi Mwungu: 0000-0003-1658-287X&quot;
</code></pre></li> </code></pre><ul>
<li>But when I ran <code>fix-metadata-values.py</code> I didn't see any changes:</li>
<li><p>But when I ran <code>fix-metadata-values.py</code> I didn&rsquo;t see any changes:</p>
<pre><code>$ ./fix-metadata-values.py -i 2019-07-04-update-orcids.csv -db dspace -u dspace -p 'fuuu' -f cg.creator.id -m 240 -t correct -d
</code></pre></li>
</ul> </ul>
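<p>The ORCID extraction step above relies on a loose pattern (<code>[A-Z0-9]{4}</code> groups). As a minimal sketch, and not part of <code>resolve-orcids.py</code>, the same extraction could be done in Python with a stricter pattern plus the ISO 7064 MOD 11-2 check digit that ORCID iDs carry. The function names here are hypothetical and chosen only for illustration:</p>

```python
import re
from typing import List

# Stricter than the grep pattern in the notes: digits only, with an optional
# trailing X as the check character.
ORCID_PATTERN = re.compile(r"\b\d{4}-\d{4}-\d{4}-\d{3}[\dX]\b")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the identifier has the right shape and check character."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


def extract_orcids(text: str) -> List[str]:
    """Return the unique ORCID-shaped identifiers in text, sorted (like sort -u)."""
    return sorted(set(ORCID_PATTERN.findall(text)))
```

<p>Running <code>is_valid_orcid</code> over the extracted identifiers before resolving names would catch mistyped digits early, though it cannot tell whether a well-formed identifier is actually registered.</p>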
<pre><code>$ ./fix-metadata-values.py -i 2019-07-04-update-orcids.csv -db dspace -u dspace -p 'fuuu' -f cg.creator.id -m 240 -t correct -d
<h2 id="2019-07-06">2019-07-06</h2> </code></pre><h2 id="20190706">2019-07-06</h2>
<ul> <ul>
<li>Send a reminder to Marie about my notes on the <a href="https://github.com/AgriculturalSemantics/cg-core/issues/2">CG Core v2 issue I created two weeks ago</a></li> <li>Send a reminder to Marie about my notes on the <a href="https://github.com/AgriculturalSemantics/cg-core/issues/2">CG Core v2 issue I created two weeks ago</a></li>
</ul> </ul>
<h2 id="20190708">2019-07-08</h2>
<h2 id="2019-07-08">2019-07-08</h2>
<ul> <ul>
<li>Communicate with Atmire about the Solr statistics cores issue <li>Communicate with Atmire about the Solr statistics cores issue
<ul> <ul>
<li>I suspect we might need to get more disk space on DSpace Test so we can try to replicate the production environment more closely</li> <li>I suspect we might need to get more disk space on DSpace Test so we can try to replicate the production environment more closely</li>
</ul></li> </ul>
</li>
<li>Meeting with AgroKnow and CTA about their new ICT Update story telling thing <li>Meeting with AgroKnow and CTA about their new ICT Update story telling thing
<ul> <ul>
<li>AgroKnow has developed a React application to display tag clouds based on harvesting metadata and full text from CGSpace items</li> <li>AgroKnow has developed a React application to display tag clouds based on harvesting metadata and full text from CGSpace items</li>
<li>We discussed how to host it technically, perhaps we purchase a server to run it on and just give AgroKnow guys access</li> <li>We discussed how to host it technically, perhaps we purchase a server to run it on and just give AgroKnow guys access</li>
</ul></li> </ul>
</li>
<li><p>Playing with the idea of using <a href="https://github.com/BurntSushi/xsv">xsv</a> to do some basic batch quality checks on CSVs, for example to find items that might be duplicates if they have the same DOI or title:</p> <li>Playing with the idea of using <a href="https://github.com/BurntSushi/xsv">xsv</a> to do some basic batch quality checks on CSVs, for example to find items that might be duplicates if they have the same DOI or title:</li>
</ul>
<pre><code>$ xsv frequency --select cg.identifier.doi --no-nulls cgspace_metadata_africaRice-11to73_ay_id.csv | grep -v -E ',1' <pre><code>$ xsv frequency --select cg.identifier.doi --no-nulls cgspace_metadata_africaRice-11to73_ay_id.csv | grep -v -E ',1'
field,value,count field,value,count
cg.identifier.doi,https://doi.org/10.1016/j.agwat.2018.06.018,2 cg.identifier.doi,https://doi.org/10.1016/j.agwat.2018.06.018,2
$ xsv frequency --select dc.title --no-nulls cgspace_metadata_africaRice-11to73_ay_id.csv | grep -v -E ',1' $ xsv frequency --select dc.title --no-nulls cgspace_metadata_africaRice-11to73_ay_id.csv | grep -v -E ',1'
field,value,count field,value,count
dc.title,Reference evapotranspiration prediction using hybridized fuzzy model with firefly algorithm: Regional case study in Burkina Faso,2 dc.title,Reference evapotranspiration prediction using hybridized fuzzy model with firefly algorithm: Regional case study in Burkina Faso,2
</code></pre></li> </code></pre><ul>
<li>Or perhaps if DOIs are valid or not (having doi.org in the URL):</li>
<li><p>Or perhaps if DOIs are valid or not (having doi.org in the URL):</p> </ul>
<pre><code>$ xsv frequency --select cg.identifier.doi --no-nulls cgspace_metadata_africaRice-11to73_ay_id.csv | grep -v -E 'doi.org' <pre><code>$ xsv frequency --select cg.identifier.doi --no-nulls cgspace_metadata_africaRice-11to73_ay_id.csv | grep -v -E 'doi.org'
field,value,count field,value,count
cg.identifier.doi,https://hdl.handle.net/10520/EJC-1236ac700f,1 cg.identifier.doi,https://hdl.handle.net/10520/EJC-1236ac700f,1
</code></pre></li> </code></pre><ul>
<li>Or perhaps items with invalid ISSNs (according to the <a href="https://en.wikipedia.org/wiki/International_Standard_Serial_Number#Code_format">ISSN code format</a>):</li>
<li><p>Or perhaps items with invalid ISSNs (according to the <a href="https://en.wikipedia.org/wiki/International_Standard_Serial_Number#Code_format">ISSN code format</a>):</p> </ul>
<pre><code>$ xsv select dc.identifier.issn cgspace_metadata_africaRice-11to73_ay_id.csv | grep -v '&quot;' | grep -v -E '^[0-9]{4}-[0-9]{3}[0-9xX]$' <pre><code>$ xsv select dc.identifier.issn cgspace_metadata_africaRice-11to73_ay_id.csv | grep -v '&quot;' | grep -v -E '^[0-9]{4}-[0-9]{3}[0-9xX]$'
dc.identifier.issn dc.identifier.issn
978-3-319-71997-9 978-3-319-71997-9
@ -338,86 +296,69 @@ dc.identifier.issn
978-3-319-71997-9 978-3-319-71997-9
978-3-319-58789-9 978-3-319-58789-9
2320-7035 2320-7035
2593-9173 2593-9173
</code></pre></li> </code></pre><h2 id="20190709">2019-07-09</h2>
</ul>
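<p>The xsv checks above could also be sketched in Python, in the spirit of the data cleaning automation idea. This is only an illustration under some assumptions: the helper names are hypothetical, rows are taken to be dictionaries as produced by <code>csv.DictReader</code>, and multiple values in one cell are assumed to be separated by <code>||</code> as in DSpace metadata exports. The ISSN check adds the standard check digit test on top of the format test used with grep:</p>

```python
import csv
import re
from collections import Counter

ISSN_PATTERN = re.compile(r"^\d{4}-\d{3}[\dxX]$")


def load_rows(path):
    """Read a metadata CSV export into a list of dictionaries."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def duplicate_values(rows, field):
    """Return {value: count} for non-empty values of field seen more than once."""
    counts = Counter()
    for row in rows:
        value = (row.get(field) or "").strip()
        if value:
            counts[value] += 1
    return {value: n for value, n in counts.items() if n > 1}


def issn_is_valid(issn):
    """Check the NNNN-NNNC format and the ISSN check digit (weights 8 to 2, mod 11)."""
    if not ISSN_PATTERN.match(issn):
        return False
    digits = issn.replace("-", "").upper()
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return digits[7] == ("X" if check == 10 else str(check))


def invalid_issns(rows, field="dc.identifier.issn"):
    """Return the sorted, unique ISSN values that fail validation."""
    bad = set()
    for row in rows:
        for value in (row.get(field) or "").split("||"):
            value = value.strip()
            if value and not issn_is_valid(value):
                bad.add(value)
    return sorted(bad)
```

<p>Counting exact values avoids one pitfall of filtering with <code>grep -v -E ',1'</code>, which also hides any row whose count merely starts with 1 (for example 10 or 12).</p>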
<h2 id="2019-07-09">2019-07-09</h2>
<ul> <ul>
<li>Thinking about data cleaning automation again and found some resources about Python and Pandas: <li>Thinking about data cleaning automation again and found some resources about Python and Pandas:
<ul> <ul>
<li><a href="https://realpython.com/python-data-cleaning-numpy-pandas/">https://realpython.com/python-data-cleaning-numpy-pandas/</a></li> <li><a href="https://realpython.com/python-data-cleaning-numpy-pandas/">https://realpython.com/python-data-cleaning-numpy-pandas/</a></li>
<li><a href="https://mode.com/blog/python-data-cleaning-libraries">https://mode.com/blog/python-data-cleaning-libraries</a></li> <li><a href="https://mode.com/blog/python-data-cleaning-libraries">https://mode.com/blog/python-data-cleaning-libraries</a></li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-07-11">2019-07-11</h2> </ul>
<h2 id="20190711">2019-07-11</h2>
<ul> <ul>
<li>Skype call with Marie Angelique about CG Core v2 <li>Skype call with Marie Angelique about CG Core v2
<ul> <ul>
<li>We discussed my comments and suggestions from last week</li> <li>We discussed my comments and suggestions from last week</li>
<li>One comment she had was that we should try to move our center-specific subjects into <code>DCTERMS.subject</code> and normalize them against AGROVOC</li> <li>One comment she had was that we should try to move our center-specific subjects into <code>DCTERMS.subject</code> and normalize them against AGROVOC</li>
<li>I updated my <a href="https://gist.github.com/alanorth/2db39e91f48d116e00a4edffd6ba6409">gist about CGSpace metadata changes</a></li> <li>I updated my <a href="https://gist.github.com/alanorth/2db39e91f48d116e00a4edffd6ba6409">gist about CGSpace metadata changes</a></li>
</ul></li> </ul>
</li>
<li>Skype call with Jane Poole to discuss OpenRXV/AReS Phase II TORs <li>Skype call with Jane Poole to discuss OpenRXV/AReS Phase II TORs
<ul> <ul>
<li>I need to follow up with Moayad about the reporting functionality</li> <li>I need to follow up with Moayad about the reporting functionality</li>
<li>Also, I need to email Harrison my notes on the CG Core v2 stuff</li> <li>Also, I need to email Harrison my notes on the CG Core v2 stuff</li>
<li>Also, Jane asked me to check the Data Portal to see which email address requests for confidential data are going</li> <li>Also, Jane asked me to check the Data Portal to see which email address requests for confidential data are going</li>

</ul></li>
<li>Yesterday Theirry from CTA asked me about an error he was getting while submitting an item on CGSpace: &ldquo;Unable to load Submission Information, since WorkspaceID (ID:S106658) is not a valid in-process submission.&rdquo;</li>
<li><p>I looked in the DSpace logs and found this right around the time of the screenshot he sent me:</p>
<pre><code>2019-07-10 11:50:27,433 INFO org.dspace.submit.step.CompleteStep @ lewyllie@cta.int:session_id=A920730003BCAECE8A3B31DCDE11A97E:submission_complete:Completed submission with id=106658
</code></pre></li>
<li><p>I&rsquo;m assuming something happened in his browser (like a refresh) after the item was submitted&hellip;</p></li>
</ul> </ul>
</li>
<h2 id="2019-07-12">2019-07-12</h2> <li>Yesterday Theirry from CTA asked me about an error he was getting while submitting an item on CGSpace: &ldquo;Unable to load Submission Information, since WorkspaceID (ID:S106658) is not a valid in-process submission.&rdquo;</li>
<li>I looked in the DSpace logs and found this right around the time of the screenshot he sent me:</li>
</ul>
<pre><code>2019-07-10 11:50:27,433 INFO org.dspace.submit.step.CompleteStep @ lewyllie@cta.int:session_id=A920730003BCAECE8A3B31DCDE11A97E:submission_complete:Completed submission with id=106658
</code></pre><ul>
<li>I'm assuming something happened in his browser (like a refresh) after the item was submitted&hellip;</li>
</ul>
<h2 id="20190712">2019-07-12</h2>
<ul> <ul>
<li>Atmire responded with some initial feedback about our Tomcat configuration related to the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=685">Solr issue I raised recently</a> <li>Atmire responded with some initial feedback about our Tomcat configuration related to the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=685">Solr issue I raised recently</a>
<ul> <ul>
<li>Unfortunately there is no concrete feedback yet</li> <li>Unfortunately there is no concrete feedback yet</li>
<li>I think we need to upgrade our DSpace Test server so we can fit all the Solr cores&hellip;</li> <li>I think we need to upgrade our DSpace Test server so we can fit all the Solr cores&hellip;</li>
<li>Actually, I looked and there were over 40 GB free on DSpace Test so I copied the Solr statistics cores for the years 2017 to 2010 from CGSpace to DSpace Test because they weren&rsquo;t actually very large</li> <li>Actually, I looked and there were over 40 GB free on DSpace Test so I copied the Solr statistics cores for the years 2017 to 2010 from CGSpace to DSpace Test because they weren't actually very large</li>
<li>I re-deployed DSpace for good measure, and I think all Solr cores are loading&hellip; I will do more tests later</li> <li>I re-deployed DSpace for good measure, and I think all Solr cores are loading&hellip; I will do more tests later</li>
</ul></li> </ul>
</li>
<li>Run all system updates on DSpace Test (linode19) and reboot it</li> <li>Run all system updates on DSpace Test (linode19) and reboot it</li>
<li>Tried to run <code>dspace cleanup -v</code> on CGSpace and ran into an error:</li>
<li><p>Tried to run <code>dspace cleanup -v</code> on CGSpace and ran into an error:</p> </ul>
<pre><code>Error: ERROR: update or delete on table &quot;bitstream&quot; violates foreign key constraint &quot;bundle_primary_bitstream_id_fkey&quot; on table &quot;bundle&quot; <pre><code>Error: ERROR: update or delete on table &quot;bitstream&quot; violates foreign key constraint &quot;bundle_primary_bitstream_id_fkey&quot; on table &quot;bundle&quot;
Detail: Key (bitstream_id)=(167394) is still referenced from table &quot;bundle&quot;. Detail: Key (bitstream_id)=(167394) is still referenced from table &quot;bundle&quot;.
</code></pre></li> </code></pre><ul>
<li>The solution is, as always:</li>
<li><p>The solution is, as always:</p> </ul>
<pre><code># su - postgres <pre><code># su - postgres
$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (167394);' $ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (167394);'
UPDATE 1 UPDATE 1
</code></pre></li> </code></pre><h2 id="20190716">2019-07-16</h2>
</ul>
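<p>Since this cleanup error comes up repeatedly, the fix above can be sketched as a small helper that reads the bitstream ID out of the error text and prints the matching <code>UPDATE</code> statement. This is a hypothetical convenience, not something DSpace provides; it assumes the error message has the same wording as the one shown above, and it only builds the SQL string so it can be reviewed before being run with <code>psql</code>:</p>

```python
import re

# Matches the "Detail:" line of the foreign key error shown in the notes.
KEY_PATTERN = re.compile(
    r'Key \(bitstream_id\)=\((\d+)\) is still referenced from table "bundle"'
)


def bitstream_ids_from_error(message):
    """Return the sorted, unique bitstream IDs named in the cleanup error text."""
    return sorted({int(match) for match in KEY_PATTERN.findall(message)})


def build_fix_sql(bitstream_ids):
    """Build the UPDATE that clears primary_bitstream_id for the given IDs."""
    if not bitstream_ids:
        raise ValueError("no bitstream IDs found in the error message")
    id_list = ", ".join(str(int(i)) for i in bitstream_ids)
    return (
        "update bundle set primary_bitstream_id=NULL "
        f"where primary_bitstream_id in ({id_list});"
    )
```

<p>Converting each ID with <code>int()</code> before interpolating keeps anything other than a number out of the generated statement.</p>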
<h2 id="2019-07-16">2019-07-16</h2>
<ul> <ul>
<li><p>Completely reset the Podman configuration on my laptop because there were some layers that I couldn&rsquo;t delete and it had been some time since I did a cleanup:</p> <li>Completely reset the Podman configuration on my laptop because there were some layers that I couldn't delete and it had been some time since I did a cleanup:</li>
</ul>
<pre><code>$ podman system prune -a -f --volumes <pre><code>$ podman system prune -a -f --volumes
$ sudo rm -rf ~/.local/share/containers $ sudo rm -rf ~/.local/share/containers
</code></pre></li> </code></pre><ul>
<li>Then pull a new PostgreSQL 9.6 image and load a CGSpace database dump into a new local test container:</li>
<li><p>Then pull a new PostgreSQL 9.6 image and load a CGSpace database dump into a new local test container:</p> </ul>
<pre><code>$ podman pull postgres:9.6-alpine <pre><code>$ podman pull postgres:9.6-alpine
$ podman run --name dspacedb -v dspacedb_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.6-alpine $ podman run --name dspacedb -v dspacedb_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.6-alpine
$ createuser -h localhost -U postgres --pwprompt dspacetest $ createuser -h localhost -U postgres --pwprompt dspacetest
@ -426,108 +367,91 @@ $ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest superuser;'
$ pg_restore -h localhost -U postgres -d dspacetest -O --role=dspacetest -h localhost ~/Downloads/cgspace_2019-07-16.backup $ pg_restore -h localhost -U postgres -d dspacetest -O --role=dspacetest -h localhost ~/Downloads/cgspace_2019-07-16.backup
$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser;' $ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser;'
$ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest $ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest
</code></pre></li> </code></pre><ul>
<li>Start working on implementing the <a href="https://gist.github.com/alanorth/2db39e91f48d116e00a4edffd6ba6409">CG Core v2 changes</a> on my local DSpace test environment</li>
<li><p>Start working on implementing the <a href="https://gist.github.com/alanorth/2db39e91f48d116e00a4edffd6ba6409">CG Core v2 changes</a> on my local DSpace test environment</p></li> <li>Make a pull request to CG Core v2 with some fixes for typos in the specification (<a href="https://github.com/AgriculturalSemantics/cg-core/pull/5">#5</a>)</li>
<li><p>Make a pull request to CG Core v2 with some fixes for typos in the specification (<a href="https://github.com/AgriculturalSemantics/cg-core/pull/5">#5</a>)</p></li>
</ul> </ul>
<h2 id="20190718">2019-07-18</h2>
<h2 id="2019-07-18">2019-07-18</h2>
<ul> <ul>
<li>Talk to Moayad about the remaining issues for OpenRXV / AReS <li>Talk to Moayad about the remaining issues for OpenRXV / AReS
<ul> <ul>
<li>He sent a pull request with some changes for the bar chart and documentation about configuration, and said he&rsquo;d finish the export feature next week</li> <li>He sent a pull request with some changes for the bar chart and documentation about configuration, and said he'd finish the export feature next week</li>
</ul></li> </ul>
</li>
<li><p>Sisay said a user was having problems registering on CGSpace and it looks like the email account expired again:</p> <li>Sisay said a user was having problems registering on CGSpace and it looks like the email account expired again:</li>
</ul>
<pre><code>$ dspace test-email <pre><code>$ dspace test-email
About to send test email: About to send test email:
- To: blahh@cgiar.org - To: blahh@cgiar.org
- Subject: DSpace test email - Subject: DSpace test email
- Server: smtp.office365.com - Server: smtp.office365.com
Error sending email: Error sending email:
- Error: javax.mail.AuthenticationFailedException - Error: javax.mail.AuthenticationFailedException
Please see the DSpace documentation for assistance. Please see the DSpace documentation for assistance.
</code></pre></li> </code></pre><ul>
<li>I emailed ICT to ask them to reset it and make the expiration period longer if possible</li>
<li><p>I emailed ICT to ask them to reset it and make the expiration period longer if possible</p></li>
</ul> </ul>
<h2 id="20190719">2019-07-19</h2>
<h2 id="2019-07-19">2019-07-19</h2>
<ul> <ul>
<li>ICT reset the password for the CGSpace support account and apparently removed the expiry requirement <li>ICT reset the password for the CGSpace support account and apparently removed the expiry requirement
<ul> <ul>
<li>I tested the account and it&rsquo;s working</li> <li>I tested the account and it's working</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-07-20">2019-07-20</h2> </ul>
<h2 id="20190720">2019-07-20</h2>
<ul> <ul>
<li><p>Create an account for Lionelle Samnick on CGSpace because the registration isn&rsquo;t working for some reason:</p> <li>Create an account for Lionelle Samnick on CGSpace because the registration isn't working for some reason:</li>
</ul>
<pre><code>$ dspace user --add --givenname Lionelle --surname Samnick --email blah@blah.com --password 'blah' <pre><code>$ dspace user --add --givenname Lionelle --surname Samnick --email blah@blah.com --password 'blah'
</code></pre></li> </code></pre><ul>
<li>I added her as a submitter to <a href="https://cgspace.cgiar.org/handle/10568/74536">CTA ISF Pro-Agro series</a></li>
<li><p>I added her as a submitter to <a href="https://cgspace.cgiar.org/handle/10568/74536">CTA ISF Pro-Agro series</a></p></li> <li>Start looking at 1429 records for the Bioversity batch import
<li><p>Start looking at 1429 records for the Bioversity batch import</p>
<ul> <ul>
<li>Multiple authors should be specified with multi-value separator (||) instead of ;</li> <li>Multiple authors should be specified with multi-value separator (||) instead of ;</li>
<li>We don&rsquo;t use &ldquo;(eds)&rdquo; as an author</li> <li>We don't use &ldquo;(eds)&rdquo; as an author</li>
<li>Same issue with dc.publisher using &ldquo;;&rdquo; for multiple values</li> <li>Same issue with dc.publisher using &ldquo;;&rdquo; for multiple values</li>
<li>Some invalid ISSNs in dc.identifier.issn (they look like ISBNs)</li> <li>Some invalid ISSNs in dc.identifier.issn (they look like ISBNs)</li>
<li>I see some ISSNs in the dc.identifier.isbn field</li> <li>I see some ISSNs in the dc.identifier.isbn field</li>
<li>I see some invalid ISBNs that look like Excel errors (9,78E+12)</li> <li>I see some invalid ISBNs that look like Excel errors (9,78E+12)</li>
<li>For DOI we just use the URL, not &ldquo;DOI: <a href="https://doi.org...&quot;">https://doi.org...&quot;</a></li> <li>For DOI we just use the URL, not &ldquo;DOI: <a href="https://doi.org..">https://doi.org..</a>.&rdquo;</li>
<li>I see an invalid &ldquo;LEAVE BLANK&rdquo; in the cg.contributor.crp field</li> <li>I see an invalid &ldquo;LEAVE BLANK&rdquo; in the cg.contributor.crp field</li>
<li>Country field is using &ldquo;,&rdquo; for multiple values instead of &ldquo;||&rdquo;</li> <li>Country field is using &ldquo;,&rdquo; for multiple values instead of &ldquo;||&rdquo;</li>
<li>Region field is using &ldquo;,&rdquo; for multiple values instead of &ldquo;||&rdquo;</li> <li>Region field is using &ldquo;,&rdquo; for multiple values instead of &ldquo;||&rdquo;</li>
<li>Language field should be lowercase like &ldquo;en&rdquo;, and it is using the wrong multiple value separator, and has some invalid values</li> <li>Language field should be lowercase like &ldquo;en&rdquo;, and it is using the wrong multiple value separator, and has some invalid values</li>
<li>What is the cg.identifier.url2 field? You should probably add those as cg.link.reference</li> <li>What is the cg.identifier.url2 field? You should probably add those as cg.link.reference</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-07-22">2019-07-22</h2>
<ul>
<li><p>Raise an <a href="https://github.com/AgriculturalSemantics/cg-core/issues/8">issue on CG Core v2 spec regarding country and region coverage</a></p>
<ul>
<li><p>The current standard has them implemented as a class like this:</p>
<pre><code>&lt;dct:coverage&gt;
&lt;dct:spatial&gt;
&lt;type&gt;Country&lt;/type&gt;
&lt;dct:identifier&gt;http://sws.geonames.org/192950&lt;/dct:identifier&gt;
&lt;rdfs:label&gt;Kenya&lt;/rdfs:label&gt;
&lt;/dct:spatial&gt;
&lt;/dct:coverage&gt;
</code></pre></li>
</ul></li>
<li><p>I left a note saying that DSpace is technically limited to a flat schema so we use <code>cg.coverage.country: Kenya</code></p></li>
<li><p>Do a little more work on CG Core v2 in the input forms</p></li>
</ul> </ul>
<h2 id="20190722">2019-07-22</h2>
<h2 id="2019-07-25">2019-07-25</h2>
<ul> <ul>
<li>Generate a list of the ORCID identifiers that we added to CGSpace in 2019 for Sara Jani at ICARDA</li> <li>Raise an <a href="https://github.com/AgriculturalSemantics/cg-core/issues/8">issue on CG Core v2 spec regarding country and region coverage</a>
<ul>
<li><p>Bioversity sent a new file for their migration to CGSpace</p> <li>The current standard has them implemented as a class like this:</li>
</ul>
</li>
</ul>
<pre><code> &lt;dct:coverage&gt;
&lt;dct:spatial&gt;
&lt;type&gt;Country&lt;/type&gt;
&lt;dct:identifier&gt;http://sws.geonames.org/192950&lt;/dct:identifier&gt;
&lt;rdfs:label&gt;Kenya&lt;/rdfs:label&gt;
&lt;/dct:spatial&gt;
&lt;/dct:coverage&gt;
</code></pre><ul>
<li>I left a note saying that DSpace is technically limited to a flat schema so we use <code>cg.coverage.country: Kenya</code></li>
<li>Do a little more work on CG Core v2 in the input forms</li>
</ul>
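<p>A minimal sketch of the flattening described above, assuming the nested coverage block is wrapped in a root element that declares the <code>dct</code> and <code>rdfs</code> prefixes. The wrapper element, the <code>flatten_coverage</code> helper and the rule "lowercase the type and append it to <code>cg.coverage.</code>" are illustrative assumptions, not part of CG Core v2 or DSpace:</p>

```python
import xml.etree.ElementTree as ET

# Namespace URIs for the prefixes used in the snippet (Dublin Core terms, RDF Schema).
NS = {
    "dct": "http://purl.org/dc/terms/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

# The <record> wrapper is hypothetical; it only exists to declare the prefixes.
SAMPLE = """<record xmlns:dct="http://purl.org/dc/terms/"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <dct:coverage>
    <dct:spatial>
      <type>Country</type>
      <dct:identifier>http://sws.geonames.org/192950</dct:identifier>
      <rdfs:label>Kenya</rdfs:label>
    </dct:spatial>
  </dct:coverage>
</record>"""


def flatten_coverage(xml_text):
    """Return (field, value) pairs such as ("cg.coverage.country", "Kenya")."""
    root = ET.fromstring(xml_text)
    flat = []
    for spatial in root.iterfind(".//dct:coverage/dct:spatial", NS):
        kind = spatial.findtext("type", default="").strip().lower()
        label = spatial.findtext("rdfs:label", default="", namespaces=NS).strip()
        if kind and label:
            # Assumed naming rule: the type becomes the last element of the flat field name.
            flat.append((f"cg.coverage.{kind}", label))
    return flat


if __name__ == "__main__":
    for field, value in flatten_coverage(SAMPLE):
        print(f"{field}: {value}")
```

<p>The GeoNames identifier is dropped in this sketch because a flat field holds a single text value; keeping it would need a second flat field.</p>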
<h2 id="20190725">2019-07-25</h2>
<ul>
<li>
<p>Generate a list of the ORCID identifiers that we added to CGSpace in 2019 for Sara Jani at ICARDA</p>
</li>
<li>
<p>Bioversity sent a new file for their migration to CGSpace</p>
<ul> <ul>
<li>There is always a blank row and blank column at the end</li> <li>There is always a blank row and blank column at the end</li>
<li>One invalid type (Brie)</li> <li>One invalid type (Brie)</li>
@ -537,56 +461,54 @@ Please see the DSpace documentation for assistance.
<li>A few strange publishers after splitting multi-value cells, like &ldquo;(Belgium)&rdquo;</li> <li>A few strange publishers after splitting multi-value cells, like &ldquo;(Belgium)&rdquo;</li>
<li>Deleted four ISSNs that are actually ISBNs and are already present in the ISBN field</li> <li>Deleted four ISSNs that are actually ISBNs and are already present in the ISBN field</li>
<li>Eight invalid ISBNs</li> <li>Eight invalid ISBNs</li>
<li>Convert all DOIs to &ldquo;<a href="https://doi.org&quot;">https://doi.org&quot;</a> format and fix one invalid DOI</li> <li>Convert all DOIs to &ldquo;<a href="https://doi.org">https://doi.org</a>&rdquo; format and fix one invalid DOI</li>
<li>Fix a handful of incorrect CRPs that seem to have been split on comma &ldquo;,&rdquo;</li> <li>Fix a handful of incorrect CRPs that seem to have been split on comma &ldquo;,&rdquo;</li>
<li>Lots of strange values in cg.link.reference, and I normalized all DOIs to <a href="https://doi.org">https://doi.org</a> format</li> <li>Lots of strange values in cg.link.reference, and I normalized all DOIs to <a href="https://doi.org">https://doi.org</a> format
<ul>
<li>There are lots of invalid links here, like &ldquo;36&rdquo; and &ldquo;recordlink:publications:2606&rdquo; and &ldquo;t3://record?identifier=publications&amp;uid=2606&rdquo;</li> <li>There are lots of invalid links here, like &ldquo;36&rdquo; and &ldquo;recordlink:publications:2606&rdquo; and &ldquo;t3://record?identifier=publications&amp;uid=2606&rdquo;</li>
<li>Also there are hundreds of items that use the same value for cg.link.reference AND cg.link.dataurl</li> <li>Also there are hundreds of items that use the same value for cg.link.reference AND cg.link.dataurl</li>
</ul>
</li>
<li>Use https:// for all Bioversity links (reference, data url, permalink)</li> <li>Use https:// for all Bioversity links (reference, data url, permalink)</li>
</ul></li> </ul>
</li>
<li><p>I might be able to use <a href="https://pypi.org/project/isbnlib/">isbnlib</a> to validate ISBNs in Python:</p> <li>
<p>I might be able to use <a href="https://pypi.org/project/isbnlib/">isbnlib</a> to validate ISBNs in Python:</p>
</li>
</ul>
<pre><code>if isbnlib.is_isbn10('9966-955-07-0') or isbnlib.is_isbn13('9966-955-07-0'): <pre><code>if isbnlib.is_isbn10('9966-955-07-0') or isbnlib.is_isbn13('9966-955-07-0'):
print(&quot;Yes&quot;) print(&quot;Yes&quot;)
else: else:
print(&quot;No&quot;) print(&quot;No&quot;)
</code></pre></li> </code></pre><ul>
<li>Or with <a href="https://github.com/arthurdejong/python-stdnum">python-stdnum</a>:</li>
<li><p>Or with <a href="https://github.com/arthurdejong/python-stdnum">python-stdnum</a>:</p> </ul>
<pre><code>from stdnum import isbn <pre><code>from stdnum import isbn
from stdnum import issn from stdnum import issn
isbn.validate('978-92-9043-389-7') isbn.validate('978-92-9043-389-7')
issn.validate('1020-3362') issn.validate('1020-3362')
</code></pre></li> </code></pre><h2 id="20190726">2019-07-26</h2>
</ul>
<h2 id="2019-07-26">2019-07-26</h2>
<ul> <ul>
<li><p>Bioversity sent me an updated CSV file that fixes some of the issues I pointed out yesterday</p> <li>
<p>Bioversity sent me an updated CSV file that fixes some of the issues I pointed out yesterday</p>
<ul> <ul>
<li>There are still 1429 records</li> <li>There are still 1429 records</li>
<li>There is still one extra row and one extra column</li> <li>There is still one extra row and one extra column</li>
<li>There are still eight invalid ISBNs (according to my <code>validate.py</code> script)</li> <li>There are still eight invalid ISBNs (according to my <code>validate.py</code> script)</li>
</ul></li>
<li><p>I figured out a GREL to trim spaces in multi-value cells without splitting them:</p>
<pre><code>value.replace(/\s+\|\|/,&quot;||&quot;).replace(/\|\|\s+/,&quot;||&quot;)
</code></pre></li>
<li><p>I whipped up a quick script using Python Pandas to do whitespace cleanup</p></li>
</ul> </ul>
</li>
<h2 id="2019-07-29">2019-07-29</h2> <li>
<p>I figured out a GREL to trim spaces in multi-value cells without splitting them:</p>
</li>
</ul>
<pre><code>value.replace(/\s+\|\|/,&quot;||&quot;).replace(/\|\|\s+/,&quot;||&quot;)
</code></pre><ul>
<li>I whipped up a quick script using Python Pandas to do whitespace cleanup</li>
</ul>
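<p>The whitespace cleanup can be sketched with the Python standard library alone. This is not the Pandas script mentioned above, only an illustration of the same idea as the GREL expression; the helper names <code>clean_multivalue</code> and <code>clean_row</code> are hypothetical. Unlike the GREL, it also strips leading and trailing whitespace from the whole cell:</p>

```python
import re

# Matches the "||" multi-value separator together with any whitespace around it.
SEPARATOR_WITH_SPACES = re.compile(r"\s*\|\|\s*")


def clean_multivalue(value):
    """Trim whitespace around "||" separators and at both ends of the cell."""
    return SEPARATOR_WITH_SPACES.sub("||", value).strip()


def clean_row(row):
    """Apply the cleanup to every string cell of a row given as a dict."""
    return {
        column: clean_multivalue(cell) if isinstance(cell, str) else cell
        for column, cell in row.items()
    }


if __name__ == "__main__":
    print(clean_multivalue(" Kenya || Uganda||  Tanzania "))
```

<p>With <code>csv.DictReader</code>, each row is already a dict, so <code>clean_row</code> can be applied row by row before writing the file back out.</p>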
<h2 id="20190729">2019-07-29</h2>
<ul> <ul>
<li>I turned the Pandas script into a proper Python package called <a href="https://git.sr.ht/~alanorth/csv-metadata-quality">csv-metadata-quality</a> <li>I turned the Pandas script into a proper Python package called <a href="https://git.sr.ht/~alanorth/csv-metadata-quality">csv-metadata-quality</a>
<ul> <ul>
<li>It supports CSV and Excel files</li> <li>It supports CSV and Excel files</li>
<li>It fixes whitespace errors and erroneous multi-value separators (&ldquo;|&rdquo;) and validates ISSNs, ISBNs, and dates</li> <li>It fixes whitespace errors and erroneous multi-value separators (&ldquo;|&rdquo;) and validates ISSNs, ISBNs, and dates</li>
@ -594,18 +516,16 @@ issn.validate('1020-3362')
<li>I added fixes to drop duplicate metadata values</li> <li>I added fixes to drop duplicate metadata values</li>
<li>And lastly, I added validation of ISO 639-2 and ISO 639-3 languages</li> <li>And lastly, I added validation of ISO 639-2 and ISO 639-3 languages</li>
<li>And lastly lastly, I added AGROVOC validation of subject terms</li> <li>And lastly lastly, I added AGROVOC validation of subject terms</li>
</ul></li> </ul>
</li>
<li>Inform Bioversity that there is an error in their CSV, seemingly caused by quotes in the citation field</li> <li>Inform Bioversity that there is an error in their CSV, seemingly caused by quotes in the citation field</li>
</ul> </ul>
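<p>The ISBN and ISSN validation mentioned above comes down to check-digit arithmetic. The sketch below is hand-rolled for illustration and is not how isbnlib, python-stdnum or csv-metadata-quality implement it; all function names are hypothetical. It checks only the checksum, not whether the number is actually assigned:</p>

```python
import re


def _compact(value):
    """Remove hyphens and spaces and uppercase a possible trailing X."""
    return value.replace("-", "").replace(" ", "").upper()


def _weighted_mod11_ok(number):
    """Weights count down to 1 on the check digit; X stands for 10."""
    digits = [10 if char == "X" else int(char) for char in number]
    weights = range(len(digits), 0, -1)
    return sum(w * d for w, d in zip(weights, digits)) % 11 == 0


def is_valid_isbn10(value):
    number = _compact(value)
    return bool(re.fullmatch(r"[0-9]{9}[0-9X]", number)) and _weighted_mod11_ok(number)


def is_valid_isbn13(value):
    number = _compact(value)
    if not re.fullmatch(r"[0-9]{13}", number):
        return False
    # Weights alternate 1, 3, 1, 3, ... and the total must be a multiple of 10.
    total = sum(int(char) * (1 if i % 2 == 0 else 3) for i, char in enumerate(number))
    return total % 10 == 0


def is_valid_issn(value):
    number = _compact(value)
    return bool(re.fullmatch(r"[0-9]{7}[0-9X]", number)) and _weighted_mod11_ok(number)


if __name__ == "__main__":
    print(is_valid_isbn13("978-92-9043-389-7"))
    print(is_valid_issn("1020-3362"))
    print(is_valid_isbn13("9,78E+12"))
```

<p>A value mangled by a spreadsheet into scientific notation fails the format check before any arithmetic is attempted.</p>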
<h2 id="20190730">2019-07-30</h2>
<h2 id="2019-07-30">2019-07-30</h2>
<ul> <ul>
<li>Add support for removing newlines (line feeds) to <a href="https://git.sr.ht/~alanorth/csv-metadata-quality">csv-metadata-quality</a></li> <li>Add support for removing newlines (line feeds) to <a href="https://git.sr.ht/~alanorth/csv-metadata-quality">csv-metadata-quality</a></li>
<li>On the subject of validating some of our fields like countries and regions, Abenet pointed out that these should all be valid AGROVOC terms, so we can actually try to validate against that!</li> <li>On the subject of validating some of our fields like countries and regions, Abenet pointed out that these should all be valid AGROVOC terms, so we can actually try to validate against that!</li>
</ul> </ul>
<!-- raw HTML omitted -->
<!-- vim: set sw=2 ts=2: -->

View File

@ -8,19 +8,16 @@
<meta property="og:title" content="August, 2019" /> <meta property="og:title" content="August, 2019" />
<meta property="og:description" content="2019-08-03 <meta property="og:description" content="2019-08-03
Look at Bioversity&#39;s latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name&hellip;
Look at Bioversity&rsquo;s latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name&hellip;
2019-08-04 2019-08-04
Deploy ORCID identifier updates requested by Bioversity to CGSpace Deploy ORCID identifier updates requested by Bioversity to CGSpace
Run system updates on CGSpace (linode18) and reboot it Run system updates on CGSpace (linode18) and reboot it
Before updating it I checked Solr and verified that all statistics cores were loaded properly&hellip; Before updating it I checked Solr and verified that all statistics cores were loaded properly&hellip;
After rebooting, all statistics cores were loaded&hellip; wow, that&rsquo;s lucky. After rebooting, all statistics cores were loaded&hellip; wow, that&#39;s lucky.
Run system updates on DSpace Test (linode19) and reboot it Run system updates on DSpace Test (linode19) and reboot it
" /> " />
@ -33,23 +30,20 @@ Run system updates on DSpace Test (linode19) and reboot it
<meta name="twitter:title" content="August, 2019"/> <meta name="twitter:title" content="August, 2019"/>
<meta name="twitter:description" content="2019-08-03 <meta name="twitter:description" content="2019-08-03
Look at Bioversity&#39;s latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name&hellip;
Look at Bioversity&rsquo;s latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name&hellip;
2019-08-04 2019-08-04
Deploy ORCID identifier updates requested by Bioversity to CGSpace Deploy ORCID identifier updates requested by Bioversity to CGSpace
Run system updates on CGSpace (linode18) and reboot it Run system updates on CGSpace (linode18) and reboot it
Before updating it I checked Solr and verified that all statistics cores were loaded properly&hellip; Before updating it I checked Solr and verified that all statistics cores were loaded properly&hellip;
After rebooting, all statistics cores were loaded&hellip; wow, that&rsquo;s lucky. After rebooting, all statistics cores were loaded&hellip; wow, that&#39;s lucky.
Run system updates on DSpace Test (linode19) and reboot it Run system updates on DSpace Test (linode19) and reboot it
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -130,473 +124,416 @@ Run system updates on DSpace Test (linode19) and reboot it
</p> </p>
</header> </header>
<h2 id="2019-08-03">2019-08-03</h2> <h2 id="20190803">2019-08-03</h2>
<ul> <ul>
<li>Look at Bioversity&rsquo;s latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name&hellip;</li> <li>Look at Bioversity's latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name&hellip;</li>
</ul> </ul>
<h2 id="20190804">2019-08-04</h2>
<h2 id="2019-08-04">2019-08-04</h2>
<ul> <ul>
<li>Deploy ORCID identifier updates requested by Bioversity to CGSpace</li> <li>Deploy ORCID identifier updates requested by Bioversity to CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it <li>Run system updates on CGSpace (linode18) and reboot it
<ul> <ul>
<li>Before updating it I checked Solr and verified that all statistics cores were loaded properly&hellip;</li> <li>Before updating it I checked Solr and verified that all statistics cores were loaded properly&hellip;</li>
<li>After rebooting, all statistics cores were loaded&hellip; wow, that&rsquo;s lucky.</li> <li>After rebooting, all statistics cores were loaded&hellip; wow, that's lucky.</li>
</ul></li> </ul>
</li>
<li>Run system updates on DSpace Test (linode19) and reboot it</li> <li>Run system updates on DSpace Test (linode19) and reboot it</li>
</ul> </ul>
<h2 id="20190805">2019-08-05</h2>
<h2 id="2019-08-05">2019-08-05</h2>
<ul> <ul>
<li>Update Tomcat to 7.0.96 in the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a></li> <li>Update Tomcat to 7.0.96 in the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a></li>
<li>Update PostgreSQL JDBC driver to 42.2.6 in the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a></li> <li>Update PostgreSQL JDBC driver to 42.2.6 in the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a></li>
<li>Deploy both on DSpace Test (linode19)</li> <li>Deploy both on DSpace Test (linode19)</li>
<li>Looking at the 1429 records for Bioversity migration again
<li><p>Looking at the 1429 records for Bioversity migration again</p> <ul>
<li>The following items use the same exact PDF and seem to be duplicates:
<ul>
<li><a href="https://www.bioversityinternational.org/index.php?id=244&amp;tx_news_pi1">https://www.bioversityinternational.org/index.php?id=244&amp;tx_news_pi1</a>[news]=10191</li>
<li><a href="https://www.bioversityinternational.org/index.php?id=244&amp;tx_news_pi1">https://www.bioversityinternational.org/index.php?id=244&amp;tx_news_pi1</a>[news]=342</li>
</ul>
</li>
<li>The following items use the same exact PDF, but one seems to be incorrect:
<ul>
<li><a href="https://www.bioversityinternational.org/index.php?id=244&amp;tx_news_pi1">https://www.bioversityinternational.org/index.php?id=244&amp;tx_news_pi1</a>[news]=5347</li>
<li><a href="https://www.bioversityinternational.org/index.php?id=244&amp;tx_news_pi1">https://www.bioversityinternational.org/index.php?id=244&amp;tx_news_pi1</a>[news]=5340</li>
</ul>
</li>
<li>The following PDFs are used by several items incorrectly:
<ul> <ul>
<li>The following items use the same exact PDF and seem to be duplicates:</li>
<li><a href="https://www.bioversityinternational.org/index.php?id=244&amp;tx_news_pi1[news]=10191">https://www.bioversityinternational.org/index.php?id=244&amp;tx_news_pi1[news]=10191</a></li>
<li><a href="https://www.bioversityinternational.org/index.php?id=244&amp;tx_news_pi1[news]=342">https://www.bioversityinternational.org/index.php?id=244&amp;tx_news_pi1[news]=342</a></li>
<li>The following items use the same exact PDF, but one seems to be incorrect:</li>
<li><a href="https://www.bioversityinternational.org/index.php?id=244&amp;tx_news_pi1[news]=5347">https://www.bioversityinternational.org/index.php?id=244&amp;tx_news_pi1[news]=5347</a></li>
<li><a href="https://www.bioversityinternational.org/index.php?id=244&amp;tx_news_pi1[news]=5340">https://www.bioversityinternational.org/index.php?id=244&amp;tx_news_pi1[news]=5340</a></li>
<li>The following PDFs are used by several items incorrectly:</li>
<li><code>Report_of_a_Working_Group_on_Allium_7.pdf</code></li> <li><code>Report_of_a_Working_Group_on_Allium_7.pdf</code></li>
<li><code>Report_of_a_Working_Group_on_Allium_Fourth_meeting_1696.pdf</code></li> <li><code>Report_of_a_Working_Group_on_Allium_Fourth_meeting_1696.pdf</code></li>
<li>I checked the SHA1 hashes of each PDF and found that some appear more than once&hellip;</li>
<li>The following items use the same PDF with a different name, but seem to be duplicates (pick one?):</li>
<li><a href="https://www.bioversityinternational.org/index.php?id=244&amp;tx_news_pi1[news]=433">https://www.bioversityinternational.org/index.php?id=244&amp;tx_news_pi1[news]=433</a></li>
<li><a href="https://www.bioversityinternational.org/index.php?id=244&amp;tx_news_pi1[news]=10189">https://www.bioversityinternational.org/index.php?id=244&amp;tx_news_pi1[news]=10189</a></li>
<li>The following items use the same PDF with a different name, but seem to be duplicates (pick one?):</li>
<li><a href="https://www.bioversityinternational.org/index.php?id=244&amp;tx_news_pi1[news]=332">https://www.bioversityinternational.org/index.php?id=244&amp;tx_news_pi1[news]=332</a></li>
<li><a href="https://www.bioversityinternational.org/index.php?id=244&amp;tx_news_pi1[news]=10187">https://www.bioversityinternational.org/index.php?id=244&amp;tx_news_pi1[news]=10187</a></li>
<li>There are about thirty PDFs that have French or Spanish filenames and there seems to be an encoding issue</li>
<li>I asked Francesco if he can give me a PDF URL column instead of a &ldquo;filename&rdquo; column so I can download the files myself</li>
<li><p>At <em>least</em> the ~50 filenames identified by the following GREL will have issues:</p>
<pre><code>or(
isNotNull(value.match(/^.*.*$/)),
isNotNull(value.match(/^.*é.*$/)),
isNotNull(value.match(/^.*á.*$/)),
isNotNull(value.match(/^.*è.*$/)),
isNotNull(value.match(/^.*í.*$/)),
isNotNull(value.match(/^.*ó.*$/)),
isNotNull(value.match(/^.*ú.*$/)),
isNotNull(value.match(/^.*à.*$/)),
isNotNull(value.match(/^.*û.*$/))
).toString()
</code></pre></li>
</ul></li>
<li><p>I tried to extract the filenames and construct a URL to download the PDFs with my <code>generate-thumbnails.py</code> script, but there seem to be several paths for PDFs so I can&rsquo;t guess it properly</p></li>
<li><p>I will have to wait for Francesco to respond about the PDFs, or perhaps proceed with a metadata-only upload so we can do other checks on DSpace Test</p></li>
</ul> </ul>
</li>
<h2 id="2019-08-06">2019-08-06</h2> <li>I checked the SHA1 hashes of each PDF and found that some appear more than once&hellip;</li>
<li>The following items use the same PDF with a different name, but seem to be duplicates (pick one?):
<ul>
<li><a href="https://www.bioversityinternational.org/index.php?id=244&amp;tx_news_pi1">https://www.bioversityinternational.org/index.php?id=244&amp;tx_news_pi1</a>[news]=433</li>
<li><a href="https://www.bioversityinternational.org/index.php?id=244&amp;tx_news_pi1">https://www.bioversityinternational.org/index.php?id=244&amp;tx_news_pi1</a>[news]=10189</li>
</ul>
</li>
<li>The following items use the same PDF with a different name, but seem to be duplicates (pick one?):
<ul>
<li><a href="https://www.bioversityinternational.org/index.php?id=244&amp;tx_news_pi1">https://www.bioversityinternational.org/index.php?id=244&amp;tx_news_pi1</a>[news]=332</li>
<li><a href="https://www.bioversityinternational.org/index.php?id=244&amp;tx_news_pi1">https://www.bioversityinternational.org/index.php?id=244&amp;tx_news_pi1</a>[news]=10187</li>
</ul>
</li>
<li>There are about thirty PDFs that have French or Spanish filenames and there seems to be an encoding issue
<ul>
<li>I asked Francesco if he can give me a PDF URL column instead of a &ldquo;filename&rdquo; column so I can download the files myself</li>
<li>At <em>least</em> the ~50 filenames identified by the following GREL will have issues:</li>
</ul>
</li>
</ul>
</li>
</ul>
<pre><code>or(
isNotNull(value.match(/^.*.*$/)),
isNotNull(value.match(/^.*é.*$/)),
isNotNull(value.match(/^.*á.*$/)),
isNotNull(value.match(/^.*è.*$/)),
isNotNull(value.match(/^.*í.*$/)),
isNotNull(value.match(/^.*ó.*$/)),
isNotNull(value.match(/^.*ú.*$/)),
isNotNull(value.match(/^.*à.*$/)),
isNotNull(value.match(/^.*û.*$/))
).toString()
</code></pre><ul>
<li>I tried to extract the filenames and construct a URL to download the PDFs with my <code>generate-thumbnails.py</code> script, but there seem to be several paths for PDFs so I can't guess it properly</li>
<li>I will have to wait for Francesco to respond about the PDFs, or perhaps proceed with a metadata-only upload so we can do other checks on DSpace Test</li>
</ul>
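<p>The SHA1 comparison of the PDFs mentioned above can be sketched as follows. This assumes the files have already been downloaded into one local directory; the function names and the <code>*.pdf</code> pattern are illustrative, not taken from <code>generate-thumbnails.py</code>:</p>

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha1_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large PDFs are not read into memory at once."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(directory, pattern="*.pdf"):
    """Map each SHA1 digest shared by two or more files to their sorted names."""
    by_hash = defaultdict(list)
    for path in sorted(Path(directory).glob(pattern)):
        if path.is_file():
            by_hash[sha1_of_file(path)].append(path.name)
    return {digest: names for digest, names in by_hash.items() if len(names) > 1}


if __name__ == "__main__":
    for digest, names in find_duplicates(".").items():
        print(digest, ", ".join(names))
```

<p>Identical digests only show that the file contents match; deciding which record is the duplicate still needs a manual look at the metadata.</p>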
<h2 id="20190806">2019-08-06</h2>
<ul> <ul>
<li>Francesca responded to address my feedback yesterday <li>Francesca responded to address my feedback yesterday
<ul> <ul>
<li>I made some changes to the CSV based on her feedback (remove two duplicates, change one PDF file name, change two titles)</li> <li>I made some changes to the CSV based on her feedback (remove two duplicates, change one PDF file name, change two titles)</li>
<li>Then I found some items that have PDFs in multiple languages that only list one language in <code>dc.language.iso</code> so I changed them</li> <li>Then I found some items that have PDFs in multiple languages that only list one language in <code>dc.language.iso</code> so I changed them</li>
<li>Strangely, one item was referring to a 7zip file&hellip;</li> <li>Strangely, one item was referring to a 7zip file&hellip;</li>
<li>After removing the two duplicates there are now 1427 records</li> <li>After removing the two duplicates there are now 1427 records</li>
<li>Fix one invalid ISSN: 1020-2002→1020-3362</li> <li>Fix one invalid ISSN: 1020-2002→1020-3362</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-08-07">2019-08-07</h2> </ul>
<h2 id="20190807">2019-08-07</h2>
<ul> <ul>
<li>Daniel Haile-Michael asked about using a logical OR with the DSpace OpenSearch, but I looked in the DSpace manual and it does not seem to be possible</li> <li>Daniel Haile-Michael asked about using a logical OR with the DSpace OpenSearch, but I looked in the DSpace manual and it does not seem to be possible</li>
</ul> </ul>
<h2 id="20190808">2019-08-08</h2>
<h2 id="2019-08-08">2019-08-08</h2>
<ul> <ul>
<li><p>Moayad noticed that the HTTPS certificate expired on the AReS dev server (linode20)</p> <li>Moayad noticed that the HTTPS certificate expired on the AReS dev server (linode20)
<ul> <ul>
<li>The first problem was that there is a Docker container listening on port 80, so it conflicts with the ACME http-01 validation</li> <li>The first problem was that there is a Docker container listening on port 80, so it conflicts with the ACME http-01 validation</li>
<li>The second problem was that we only allow access to port 80 from localhost</li> <li>The second problem was that we only allow access to port 80 from localhost</li>
<li>I adjusted the <code>renew-letsencrypt</code> systemd service so it stops/starts the Docker container and firewall:</li>
<li><p>I adjusted the <code>renew-letsencrypt</code> systemd service so it stops/starts the Docker container and firewall:</p> </ul>
</li>
</ul>
<pre><code># /opt/certbot-auto renew --standalone --pre-hook &quot;/usr/bin/docker stop angular_nginx; /bin/systemctl stop firewalld&quot; --post-hook &quot;/bin/systemctl start firewalld; /usr/bin/docker start angular_nginx&quot; <pre><code># /opt/certbot-auto renew --standalone --pre-hook &quot;/usr/bin/docker stop angular_nginx; /bin/systemctl stop firewalld&quot; --post-hook &quot;/bin/systemctl start firewalld; /usr/bin/docker start angular_nginx&quot;
</code></pre></li> </code></pre><ul>
</ul></li> <li>It is important that the firewall starts back up before the Docker container or else Docker will complain about missing iptables chains</li>
<li>Also, I updated to the latest TLS Intermediate settings as appropriate for Ubuntu 18.04's <a href="https://ssl-config.mozilla.org/#server=nginx&amp;server-version=1.16.0&amp;config=intermediate&amp;openssl-version=1.1.0g&amp;hsts=false&amp;ocsp=false">OpenSSL 1.1.0g with nginx 1.16.0</a></li>
<li><p>It is important that the firewall starts back up before the Docker container or else Docker will complain about missing iptables chains</p></li> <li>Run all system updates on AReS dev server (linode20) and reboot it</li>
<li>Get a list of all PDFs from the Bioversity migration that fail to download and save them so I can try again with a different path in the URL:</li>
<li><p>Also, I updated to the latest TLS Intermediate settings as appropriate for Ubuntu 18.04&rsquo;s <a href="https://ssl-config.mozilla.org/#server=nginx&amp;server-version=1.16.0&amp;config=intermediate&amp;openssl-version=1.1.0g&amp;hsts=false&amp;ocsp=false">OpenSSL 1.1.0g with nginx 1.16.0</a></p></li> </ul>
<li><p>Run all system updates on AReS dev server (linode20) and reboot it</p></li>
<li><p>Get a list of all PDFs from the Bioversity migration that fail to download and save them so I can try again with a different path in the URL:</p>
<pre><code>$ ./generate-thumbnails.py -i /tmp/2019-08-05-Bioversity-Migration.csv -w --url-field-name url -d | tee /tmp/2019-08-08-download-pdfs.txt <pre><code>$ ./generate-thumbnails.py -i /tmp/2019-08-05-Bioversity-Migration.csv -w --url-field-name url -d | tee /tmp/2019-08-08-download-pdfs.txt
$ grep -B1 &quot;Download failed&quot; /tmp/2019-08-08-download-pdfs.txt | grep &quot;Downloading&quot; | sed -e 's/&gt; Downloading //' -e 's/\.\.\.//' | sed -r 's/\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[mGK]//g' | csvcut -H -c 1,1 &gt; /tmp/user-upload.csv $ grep -B1 &quot;Download failed&quot; /tmp/2019-08-08-download-pdfs.txt | grep &quot;Downloading&quot; | sed -e 's/&gt; Downloading //' -e 's/\.\.\.//' | sed -r 's/\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[mGK]//g' | csvcut -H -c 1,1 &gt; /tmp/user-upload.csv
$ ./generate-thumbnails.py -i /tmp/user-upload.csv -w --url-field-name url -d | tee /tmp/2019-08-08-download-pdfs2.txt $ ./generate-thumbnails.py -i /tmp/user-upload.csv -w --url-field-name url -d | tee /tmp/2019-08-08-download-pdfs2.txt
$ grep -B1 &quot;Download failed&quot; /tmp/2019-08-08-download-pdfs2.txt | grep &quot;Downloading&quot; | sed -e 's/&gt; Downloading //' -e 's/\.\.\.//' | sed -r 's/\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[mGK]//g' | csvcut -H -c 1,1 &gt; /tmp/user-upload2.csv $ grep -B1 &quot;Download failed&quot; /tmp/2019-08-08-download-pdfs2.txt | grep &quot;Downloading&quot; | sed -e 's/&gt; Downloading //' -e 's/\.\.\.//' | sed -r 's/\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[mGK]//g' | csvcut -H -c 1,1 &gt; /tmp/user-upload2.csv
$ ./generate-thumbnails.py -i /tmp/user-upload2.csv -w --url-field-name url -d | tee /tmp/2019-08-08-download-pdfs3.txt $ ./generate-thumbnails.py -i /tmp/user-upload2.csv -w --url-field-name url -d | tee /tmp/2019-08-08-download-pdfs3.txt
</code></pre></li> </code></pre><ul>
<li>
<li><p>(the weird sed regex removes color codes, because my generate-thumbnails script prints pretty colors)</p></li> <p>(the weird sed regex removes color codes, because my generate-thumbnails script prints pretty colors)</p>
</li>
<li><p>Some PDFs are uploaded in different paths so I have to try a few times to get them all:</p> <li>
<p>Some PDFs are uploaded in different paths so I have to try a few times to get them all:</p>
<ul> <ul>
<li><code>/fileadmin/_migrated/uploads/tx_news/</code></li> <li><code>/fileadmin/_migrated/uploads/tx_news/</code></li>
<li><code>/fileadmin/user_upload/online_library/publications/pdfs/</code></li> <li><code>/fileadmin/user_upload/online_library/publications/pdfs/</code></li>
<li><code>/fileadmin/user_upload/</code></li> <li><code>/fileadmin/user_upload/</code></li>
</ul></li> </ul>
</li>
<li><p>Even so, there are still 52 items with incorrect filenames, so I can&rsquo;t derive their PDF URLs&hellip;</p> <li>
<p>Even so, there are still 52 items with incorrect filenames, so I can't derive their PDF URLs&hellip;</p>
<ul> <ul>
<li>For example, <code>Wild_cherry_Prunus_avium_859.pdf</code> is here (with double underscore): <a href="https://www.bioversityinternational.org/fileadmin/_migrated/uploads/tx_news/Wild_cherry__Prunus_avium__859.pdf">https://www.bioversityinternational.org/fileadmin/_migrated/uploads/tx_news/Wild_cherry__Prunus_avium__859.pdf</a></li> <li>For example, <code>Wild_cherry_Prunus_avium_859.pdf</code> is here (with double underscore): <a href="https://www.bioversityinternational.org/fileadmin/_migrated/uploads/tx_news/Wild_cherry__Prunus_avium__859.pdf">https://www.bioversityinternational.org/fileadmin/_migrated/uploads/tx_news/Wild_cherry__Prunus_avium__859.pdf</a></li>
</ul></li> </ul>
</li>
<li><p>I will proceed with a metadata-only upload first and then let them know about the missing PDFs</p></li> <li>
<p>I will proceed with a metadata-only upload first and then let them know about the missing PDFs</p>
<li><p>Troubleshoot an issue we had with proxying to the new development version of AReS from DSpace Test (linode19)</p> </li>
<li>
<p>Troubleshoot an issue we had with proxying to the new development version of AReS from DSpace Test (linode19)</p>
<ul> <ul>
<li>For some reason the host header in the proxy pass is not set so nginx on DSpace Test makes a request to the upstream nginx on an IP-based virtual host</li> <li>For some reason the host header in the proxy pass is not set so nginx on DSpace Test makes a request to the upstream nginx on an IP-based virtual host</li>
<li>The upstream nginx returns HTTP 444 because we configured it to not answer when a request does not send a valid hostname</li> <li>The upstream nginx returns HTTP 444 because we configured it to not answer when a request does not send a valid hostname</li>
<li>The solution is to set the host header when proxy passing:</li>
<li><p>The solution is to set the host header when proxy passing:</p>
<pre><code>proxy_set_header Host dev.ares.codeobia.com;
</code></pre></li>
</ul></li>
<li><p>Though I am really wondering why this happened now, because the configuration has been working for months&hellip;</p></li>
<li><p>Improve the output of the suspicious characters check in <a href="https://github.com/alanorth/csv-metadata-quality">csv-metadata-quality</a> script and tag version 0.2.0</p></li>
</ul> </ul>
</li>
<h2 id="2019-08-09">2019-08-09</h2> </ul>
<pre><code>proxy_set_header Host dev.ares.codeobia.com;
</code></pre><ul>
<li>Though I am really wondering why this happened now, because the configuration has been working for months&hellip;</li>
<li>Improve the output of the suspicious characters check in <a href="https://github.com/alanorth/csv-metadata-quality">csv-metadata-quality</a> script and tag version 0.2.0</li>
</ul>
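<p>The grep and sed pipeline used above for the failed Bioversity downloads can also be sketched in Python. This is only an illustration: it reuses the same color-code regex as the sed command, and it assumes the log looks the way the pipeline implies (a "&gt; Downloading URL..." line followed by a line containing "Download failed"). The function names are hypothetical and are not part of <code>generate-thumbnails.py</code>.</p>

```python
import re

# Same pattern as the sed expression in the notes: strips ANSI color codes.
ANSI_RE = re.compile(r"\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[mGK]")


def strip_ansi(text):
    """Remove ANSI color/erase sequences from one line of log output."""
    return ANSI_RE.sub("", text)


def failed_downloads(lines):
    """Return the URLs whose 'Downloading' line is followed by a failure line.

    Assumes (hypothetically) that each attempt is logged as
    '> Downloading <url>...' and a failure as a line containing
    'Download failed' directly after it.
    """
    failed = []
    previous = ""
    for raw in lines:
        line = strip_ansi(raw.rstrip("\n"))
        if "Download failed" in line and "Downloading" in previous:
            url = previous.replace("> Downloading ", "", 1).replace("...", "", 1)
            failed.append(url)
        previous = line
    return failed
```

<p>Passing an open log file to <code>failed_downloads()</code> would give the list of URLs to retry with a different upload path.</p>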
<h2 id="20190809">2019-08-09</h2>
<ul> <ul>
<li>Looking at the 128 IITA records (20195TH.xls) that Sisay uploaded to DSpace Test last month: <a href="https://dspacetest.cgiar.org/handle/10568/102361">IITA_July_29</a> <li>Looking at the 128 IITA records (20195TH.xls) that Sisay uploaded to DSpace Test last month: <a href="https://dspacetest.cgiar.org/handle/10568/102361">IITA_July_29</a>
<ul> <ul>
<li>The records are pretty clean because Sisay ran them through the csv-metadata-quality tool</li> <li>The records are pretty clean because Sisay ran them through the csv-metadata-quality tool</li>
<li>I fixed one incorrect country (MELBOURNE)</li> <li>I fixed one incorrect country (MELBOURNE)</li>
<li>I normalized all DOIs to be <a href="https://doi.org">https://doi.org</a> format</li> <li>I normalized all DOIs to be <a href="https://doi.org">https://doi.org</a> format</li>
<li>This item is using the wrong Google Books link: <a href="https://dspacetest.cgiar.org/handle/10568/102593">https://dspacetest.cgiar.org/handle/10568/102593</a></li> <li>This item is using the wrong Google Books link: <a href="https://dspacetest.cgiar.org/handle/10568/102593">https://dspacetest.cgiar.org/handle/10568/102593</a></li>
<li>The French abstract here has copy/paste errors: <a href="https://dspacetest.cgiar.org/handle/10568/102491">https://dspacetest.cgiar.org/handle/10568/102491</a></li> <li>The French abstract here has copy/paste errors: <a href="https://dspacetest.cgiar.org/handle/10568/102491">https://dspacetest.cgiar.org/handle/10568/102491</a></li>
<li>Validate and normalize affiliations against our 2019-04 list using reconcile-csv and OpenRefine:</li> <li>Validate and normalize affiliations against our 2019-04 list using reconcile-csv and OpenRefine:
<ul>
<li><code>$ lein run ~/src/git/DSpace/2019-04-08-affiliations.csv name id</code></li> <li><code>$ lein run ~/src/git/DSpace/2019-04-08-affiliations.csv name id</code></li>
<li>I always forget how to copy the reconciled values in OpenRefine, but you need to make a new column and populate it using this GREL: <code>if(cell.recon.matched, cell.recon.match.name, value)</code></li> <li>I always forget how to copy the reconciled values in OpenRefine, but you need to make a new column and populate it using this GREL: <code>if(cell.recon.matched, cell.recon.match.name, value)</code></li>
</ul>
</li>
<li>I asked Bosede to check about twenty-five invalid AGROVOC subjects identified by csv-metadata-quality script</li> <li>I asked Bosede to check about twenty-five invalid AGROVOC subjects identified by csv-metadata-quality script</li>
<li>I still need to check the sponsors and then check for duplicates</li> <li>I still need to check the sponsors and then check for duplicates</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-08-10">2019-08-10</h2> </ul>
<h2 id="20190810">2019-08-10</h2>
<ul> <ul>
<li>Add checks for uncommon filename extensions and replacements for unnecessary Unicode to the csv-metadata-quality script</li> <li>Add checks for uncommon filename extensions and replacements for unnecessary Unicode to the csv-metadata-quality script</li>
</ul> </ul>
<h2 id="20190812">2019-08-12</h2>
<h2 id="2019-08-12">2019-08-12</h2>
<ul> <ul>
<li>Looking at the 128 IITA records again: <li>Looking at the 128 IITA records again:
<ul> <ul>
<li>Validate and normalize sponsors against our 2019-02 list using reconcile-csv and OpenRefine:</li> <li>Validate and normalize sponsors against our 2019-02 list using reconcile-csv and OpenRefine:
<ul>
<li><code>$ lein run ~/src/git/DSpace/2019-02-22-sponsorships.csv name id</code></li> <li><code>$ lein run ~/src/git/DSpace/2019-02-22-sponsorships.csv name id</code></li>
<li>I always forget how to copy the reconciled values in OpenRefine, but you need to make a new column and populate it using this GREL: <code>if(cell.recon.matched, cell.recon.match.name, value)</code></li> <li>I always forget how to copy the reconciled values in OpenRefine, but you need to make a new column and populate it using this GREL: <code>if(cell.recon.matched, cell.recon.match.name, value)</code></li>
<li>I checked the collection for duplicates and found a few:</li> </ul>
</li>
<li>I checked the collection for duplicates and found a few:
<ul>
<li><a href="https://dspacetest.cgiar.org/handle/10568/102513">https://dspacetest.cgiar.org/handle/10568/102513</a> is a duplicate of CIAT item: <a href="https://cgspace.cgiar.org/handle/10568/44158">https://cgspace.cgiar.org/handle/10568/44158</a></li> <li><a href="https://dspacetest.cgiar.org/handle/10568/102513">https://dspacetest.cgiar.org/handle/10568/102513</a> is a duplicate of CIAT item: <a href="https://cgspace.cgiar.org/handle/10568/44158">https://cgspace.cgiar.org/handle/10568/44158</a></li>
<li><a href="https://dspacetest.cgiar.org/handle/10568/102512">https://dspacetest.cgiar.org/handle/10568/102512</a> is a duplicate of CIAT item: <a href="https://cgspace.cgiar.org/handle/10568/43557">https://cgspace.cgiar.org/handle/10568/43557</a></li> <li><a href="https://dspacetest.cgiar.org/handle/10568/102512">https://dspacetest.cgiar.org/handle/10568/102512</a> is a duplicate of CIAT item: <a href="https://cgspace.cgiar.org/handle/10568/43557">https://cgspace.cgiar.org/handle/10568/43557</a></li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-08-13">2019-08-13</h2> </ul>
</li>
</ul>
<h2 id="20190813">2019-08-13</h2>
<ul> <ul>
<li><p>Create a test user on DSpace Test for Mohammad Salem to attempt depositing:</p> <li>Create a test user on DSpace Test for Mohammad Salem to attempt depositing:</li>
</ul>
<pre><code>$ dspace user -a -m blah@blah.com -g Mohammad -s Salem -p 'domoamaaa' <pre><code>$ dspace user -a -m blah@blah.com -g Mohammad -s Salem -p 'domoamaaa'
</code></pre></li> </code></pre><ul>
<li>Create and merge a pull request (<a href="https://github.com/ilri/DSpace/pull/429">#429</a>) to add eleven new CCAFS Phase II Project Tags to CGSpace</li>
<li><p>Create and merge a pull request (<a href="https://github.com/ilri/DSpace/pull/429">#429</a>) to add eleven new CCAFS Phase II Project Tags to CGSpace</p></li> <li>Atmire responded to the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=685">Solr cores issue</a> last week, but they could not reproduce the issue
<li><p>Atmire responded to the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=685">Solr cores issue</a> last week, but they could not reproduce the issue</p>
<ul> <ul>
<li>I told them not to continue, and that we would keep an eye on it and keep troubleshooting it (if necessary) in the public eye on dspace-tech and Solr mailing lists</li> <li>I told them not to continue, and that we would keep an eye on it and keep troubleshooting it (if necessary) in the public eye on dspace-tech and Solr mailing lists</li>
</ul></li> </ul>
</li>
<li><p>Testing an import of 1,429 Bioversity items (metadata only) on my local development machine and got an error with Java memory after about 1,000 items:</p> <li>Testing an import of 1,429 Bioversity items (metadata only) on my local development machine and got an error with Java memory after about 1,000 items:</li>
</ul>
<pre><code>$ ~/dspace/bin/dspace metadata-import -f /tmp/bioversity.csv -e blah@blah.com <pre><code>$ ~/dspace/bin/dspace metadata-import -f /tmp/bioversity.csv -e blah@blah.com
... ...
java.lang.OutOfMemoryError: GC overhead limit exceeded java.lang.OutOfMemoryError: GC overhead limit exceeded
</code></pre></li> </code></pre><ul>
<li>I increased the heap size to 1536m and tried again:</li>
<li><p>I increased the heap size to 1536m and tried again:</p> </ul>
<pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx1536m&quot; <pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx1536m&quot;
$ ~/dspace/bin/dspace metadata-import -f /tmp/bioversity.csv -e blah@blah.com $ ~/dspace/bin/dspace metadata-import -f /tmp/bioversity.csv -e blah@blah.com
</code></pre></li> </code></pre><ul>
<li>This time it succeeded, and using VisualVM I noticed that the import process used a maximum of 620MB of RAM</li>
<li><p>This time it succeeded, and using VisualVM I noticed that the import process used a maximum of 620MB of RAM</p></li> <li>(oops, I realize that actually I forgot to delete items I had flagged as duplicates, so the total should be 1,427 items)</li>
<li><p>(oops, I realize that actually I forgot to delete items I had flagged as duplicates, so the total should be 1,427 items)</p></li>
</ul> </ul>
<h2 id="20190814">2019-08-14</h2>
<h2 id="2019-08-14">2019-08-14</h2>
<ul> <ul>
<li><p>I imported the 1,427 Bioversity records into DSpace Test</p> <li>I imported the 1,427 Bioversity records into DSpace Test
<ul> <ul>
<li>To make sure we didn&rsquo;t have memory issues I reduced Tomcat&rsquo;s JVM heap by 512m, increased the import process&rsquo;s heap to 512m, and split the input file into two parts with about 700 items each</li> <li>To make sure we didn't have memory issues I reduced Tomcat's JVM heap by 512m, increased the import process's heap to 512m, and split the input file into two parts with about 700 items each</li>
<li>Then I had to create a few new temporary collections on DSpace Test that had been created on CGSpace after our last sync</li> <li>Then I had to create a few new temporary collections on DSpace Test that had been created on CGSpace after our last sync</li>
<li>After that the import succeeded:</li>
<li><p>After that the import succeeded:</p> </ul>
</li>
</ul>
<pre><code>$ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx512m' <pre><code>$ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx512m'
$ dspace metadata-import -f /tmp/bioversity1.csv -e blah@blah.com $ dspace metadata-import -f /tmp/bioversity1.csv -e blah@blah.com
$ dspace metadata-import -f /tmp/bioversity2.csv -e blah@blah.com $ dspace metadata-import -f /tmp/bioversity2.csv -e blah@blah.com
</code></pre></li> </code></pre><ul>
</ul></li> <li>The next step is to check these items for duplicates</li>
<li><p>The next step is to check these items for duplicates</p></li>
</ul> </ul>
<h2 id="20190816">2019-08-16</h2>
<h2 id="2019-08-16">2019-08-16</h2>
<ul> <ul>
<li>Email Bioversity to let them know that the 1,427 records are on DSpace Test and that Abenet should look over them</li> <li>Email Bioversity to let them know that the 1,427 records are on DSpace Test and that Abenet should look over them</li>
</ul> </ul>
<h2 id="20190818">2019-08-18</h2>
<h2 id="2019-08-18">2019-08-18</h2>
<ul> <ul>
<li>Deploy latest <code>5_x-prod</code> branch on CGSpace (linode18), including the <a href="https://github.com/ilri/DSpace/pull/429">new CCAFS project tags</a></li> <li>Deploy latest <code>5_x-prod</code> branch on CGSpace (linode18), including the <a href="https://github.com/ilri/DSpace/pull/429">new CCAFS project tags</a></li>
<li>Deploy Tomcat 7.0.96 and PostgreSQL JDBC 42.2.6 driver on CGSpace (linode18)</li> <li>Deploy Tomcat 7.0.96 and PostgreSQL JDBC 42.2.6 driver on CGSpace (linode18)</li>
<li>After restarting Tomcat one of the Solr statistics cores failed to start up:</li>
<li><p>After restarting Tomcat one of the Solr statistics cores failed to start up:</p>
<pre><code>statistics-2015: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Error opening new searcher
</code></pre></li>
<li><p>I decided to run all system updates on the server and reboot it</p></li>
<li><p>After reboot the statistics-2018 core failed to load so I restarted <code>tomcat7</code> again</p></li>
<li><p>After this last restart all Solr cores seem to be up and running</p></li>
</ul> </ul>
<pre><code>statistics-2015: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Error opening new searcher
<h2 id="2019-08-20">2019-08-20</h2> </code></pre><ul>
<li>I decided to run all system updates on the server and reboot it</li>
<li>After reboot the statistics-2018 core failed to load so I restarted <code>tomcat7</code> again</li>
<li>After this last restart all Solr cores seem to be up and running</li>
</ul>
<h2 id="20190820">2019-08-20</h2>
<ul> <ul>
<li><p>Francesco sent me a new CSV with the raw filenames and paths for the Bioversity migration</p> <li>Francesco sent me a new CSV with the raw filenames and paths for the Bioversity migration
<ul> <ul>
<li>All file paths are relative to the Typo3 upload path of <code>/fileadmin</code> on the Bioversity website</li> <li>All file paths are relative to the Typo3 upload path of <code>/fileadmin</code> on the Bioversity website</li>
<li>I create a new column with the derived URL that I can use to download the PDFs with my <code>generate-thumbnails.py</code> script</li> <li>I create a new column with the derived URL that I can use to download the PDFs with my <code>generate-thumbnails.py</code> script</li>
<li>Unfortunately now the filename column has paths too, so I have to use a simple Python/Jython script in OpenRefine to get the basename of the files in the filename column:</li>
<li><p>Unfortunately now the filename column has paths too, so I have to use a simple Python/Jython script in OpenRefine to get the basename of the files in the filename column:</p> </ul>
</li>
</ul>
<pre><code>import os <pre><code>import os
return os.path.basename(value) return os.path.basename(value)
</code></pre></li> </code></pre><ul>
</ul></li> <li>Then I can try to download all the files again with the script</li>
<li>I also asked Francesco about the strange filenames (.LCK, .zip, and .7z)</li>
<li><p>Then I can try to download all the files again with the script</p></li>
<li><p>I also asked Francesco about the strange filenames (.LCK, .zip, and .7z)</p></li>
</ul> </ul>
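<p>The two steps above (deriving a download URL from a path relative to <code>/fileadmin</code>, then reducing the filename column to a basename) can be sketched outside OpenRefine as well. This is a minimal sketch: the base URL is taken from the example link earlier in these notes, and the function names are hypothetical.</p>

```python
import posixpath
from urllib.parse import quote

# Assumed from the example URL earlier in the notes; paths in the CSV are
# relative to the Typo3 upload path /fileadmin on the Bioversity website.
BASE_URL = "https://www.bioversityinternational.org/fileadmin"


def derive_url(relative_path):
    """Build a download URL from a path relative to /fileadmin."""
    # quote() leaves "/" intact and percent-encodes spaces and similar
    return BASE_URL + "/" + quote(relative_path.lstrip("/"))


def basename(relative_path):
    """Return only the file name, like os.path.basename in the Jython cell."""
    return posixpath.basename(relative_path)
```

<p>Using <code>posixpath</code> rather than <code>os.path</code> keeps the behavior the same on any platform, since the paths in the CSV use forward slashes.</p>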
<h2 id="20190821">2019-08-21</h2>
<h2 id="2019-08-21">2019-08-21</h2>
<ul> <ul>
<li>Upload <a href="https://github.com/ilri/csv-metadata-quality">csv-metadata-quality repository to ILRI&rsquo;s GitHub organization</a></li> <li>Upload <a href="https://github.com/ilri/csv-metadata-quality">csv-metadata-quality repository to ILRI's GitHub organization</a></li>
<li>Fix a few invalid countries in IITA&rsquo;s <a href="https://dspacetest.cgiar.org/handle/10568/102361">July 29</a> records (aka &ldquo;20195TH.xls&rdquo;) <li>Fix a few invalid countries in IITA's <a href="https://dspacetest.cgiar.org/handle/10568/102361">July 29</a> records (aka &ldquo;20195TH.xls&rdquo;)
<ul> <ul>
<li>These were not caught by my csv-metadata-quality check script because of a logic error</li> <li>These were not caught by my csv-metadata-quality check script because of a logic error</li>
<li>Remove <code>dc.identifier.uri</code> fields from test data, set <code>id</code> values to &ldquo;-1&rdquo;, add collection mappings according to <code>dc.type</code>, and upload 126 IITA records to CGSpace</li> <li>Remove <code>dc.identifier.uri</code> fields from test data, set <code>id</code> values to &ldquo;-1&rdquo;, add collection mappings according to <code>dc.type</code>, and upload 126 IITA records to CGSpace</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-08-22">2019-08-22</h2> </ul>
<h2 id="20190822">2019-08-22</h2>
<ul> <ul>
<li>Transfer original <a href="https://github.com/ilri/csv-metadata-quality">csv-metadata-quality</a> repository to ILRI organization on GitHub</li> <li>Transfer original <a href="https://github.com/ilri/csv-metadata-quality">csv-metadata-quality</a> repository to ILRI organization on GitHub</li>
</ul> </ul>
<h2 id="20190823">2019-08-23</h2>
<h2 id="2019-08-23">2019-08-23</h2>
<ul> <ul>
<li>Run system updates on AReS / OpenRXV dev server (linode20) and reboot it</li> <li>Run system updates on AReS / OpenRXV dev server (linode20) and reboot it</li>
<li>Fix AReS exports on DSpace Test by adding a new nginx proxy pass</li> <li>Fix AReS exports on DSpace Test by adding a new nginx proxy pass</li>
</ul> </ul>
<h2 id="20190826">2019-08-26</h2>
<h2 id="2019-08-26">2019-08-26</h2>
<ul> <ul>
<li><p>Peter sent 2,943 corrections to the author dump I had originally sent him on 2019-05-27</p> <li>Peter sent 2,943 corrections to the author dump I had originally sent him on 2019-05-27
<ul> <ul>
<li>I noticed that one correction had a missing space after the comma, ie &ldquo;Adamou,A.&rdquo; so I corrected it</li> <li>I noticed that one correction had a missing space after the comma, ie &ldquo;Adamou,A.&rdquo; so I corrected it</li>
<li>Also, I should add that as a check to the csv-metadata-quality pipeline</li> <li>Also, I should add that as a check to the csv-metadata-quality pipeline</li>
<li>Apply the corrections to my local dev machine in preparation for the CGSpace:</li>
<li><p>Apply the corrections to my local dev machine in preparation for the CGSpace:</p> </ul>
</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i ~/Downloads/2019-08-26-Peter-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -m 3 -t correct <pre><code>$ ./fix-metadata-values.py -i ~/Downloads/2019-08-26-Peter-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -m 3 -t correct
</code></pre></li> </code></pre><ul>
</ul></li> <li>Apply the corrections on CGSpace and DSpace Test
<li><p>Apply the corrections on CGSpace and DSpace Test</p>
<ul> <ul>
<li><p>After that I started a full Discovery re-indexing on both servers:</p> <li>After that I started a full Discovery re-indexing on both servers:</li>
</ul>
</li>
</ul>
<pre><code>$ time schedtool -B -e ionice -c2 -n7 nice -n19 dspace index-discovery -b <pre><code>$ time schedtool -B -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
real 81m47.057s real 81m47.057s
user 8m5.265s user 8m5.265s
sys 2m24.715s sys 2m24.715s
</code></pre></li> </code></pre><ul>
</ul></li> <li>
<p>Peter asked me to add related citation aka <code>cg.link.citation</code> to the item view</p>
<li><p>Peter asked me to add related citation aka <code>cg.link.citation</code> to the item view</p>
<ul> <ul>
<li>I created a <a href="https://github.com/ilri/DSpace/pull/430">pull request</a> with a draft implementation and asked for Peter&rsquo;s feedback</li> <li>I created a <a href="https://github.com/ilri/DSpace/pull/430">pull request</a> with a draft implementation and asked for Peter's feedback</li>
</ul></li>
<li><p>Add the ability to skip certain fields from the csv-metadata-quality script using <code>--exclude-fields</code></p>
<ul>
<li>For example, when I&rsquo;m working on the author corrections I want to run the basic checks on the corrected fields but not on the original fields, so I would use <code>--exclude-fields dc.contributor.author</code></li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-08-27">2019-08-27</h2> <li>
<p>Add the ability to skip certain fields from the csv-metadata-quality script using <code>--exclude-fields</code></p>
<ul>
<li>For example, when I'm working on the author corrections I want to run the basic checks on the corrected fields but not on the original fields, so I would use <code>--exclude-fields dc.contributor.author</code></li>
</ul>
</li>
</ul>
<h2 id="20190827">2019-08-27</h2>
<ul> <ul>
<li>File <a href="https://github.com/ilri/OpenRXV/issues/11">an issue on OpenRXV</a> for the bug when selecting communities</li> <li>File <a href="https://github.com/ilri/OpenRXV/issues/11">an issue on OpenRXV</a> for the bug when selecting communities</li>
<li>Peter approved the related citation changes so I merged the <a href="https://github.com/ilri/DSpace/pull/430">pull request on GitHub</a> and will deploy it to CGSpace this weekend</li> <li>Peter approved the related citation changes so I merged the <a href="https://github.com/ilri/DSpace/pull/430">pull request on GitHub</a> and will deploy it to CGSpace this weekend</li>
<li>Add a safety feature to <code>fix-metadata-values.py</code> that skips correction values that contain the &lsquo;|&rsquo; character</li> <li>Add a safety feature to <code>fix-metadata-values.py</code> that skips correction values that contain the &lsquo;|&rsquo; character</li>
<li>Help Francesco from Bioversity with the REST and OAI APIs on CGSpace <li>Help Francesco from Bioversity with the REST and OAI APIs on CGSpace
<ul> <ul>
<li>He is contracted by Bioversity to work on the migration from Typo3</li> <li>He is contracted by Bioversity to work on the migration from Typo3</li>
<li>I told him that the OAI interface only exposes Dublin Core fields in its default configuration and that he might want to use OAI to get the latest-changed items, then use REST API to get their metadata</li> <li>I told him that the OAI interface only exposes Dublin Core fields in its default configuration and that he might want to use OAI to get the latest-changed items, then use REST API to get their metadata</li>
</ul></li> </ul>
</li>
<li>Add a fix for missing space after commas to my <a href="https://github.com/ilri/csv-metadata-quality">csv-metadata-quality</a> script and tag version 0.2.2</li> <li>Add a fix for missing space after commas to my <a href="https://github.com/ilri/csv-metadata-quality">csv-metadata-quality</a> script and tag version 0.2.2</li>
</ul> </ul>
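<p>The missing-space-after-comma fix mentioned above (for values like "Adamou,A.") could look roughly like this. It is a sketch of the idea only and is not copied from csv-metadata-quality; the function name and regex are assumptions.</p>

```python
import re


def fix_missing_space_after_comma(value):
    """Insert a space after any comma that is directly followed by text.

    Intended for name-like fields such as authors; it would also change
    values like "1,000", so it should not be applied to numeric fields.
    """
    return re.sub(r",(?=\S)", ", ", value)
```

<p>Values that already have a space after the comma are left unchanged, so the fix is safe to run repeatedly on the same column.</p>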
<h2 id="20190828">2019-08-28</h2>
<h2 id="2019-08-28">2019-08-28</h2>
<ul> <ul>
<li>Skype with Jane about AReS Phase III priorities</li> <li>Skype with Jane about AReS Phase III priorities</li>
<li>I did a test to automatically fix some authors in the database using my csv-metadata-quality script
<li><p>I did a test to automatically fix some authors in the database using my csv-metadata-quality script</p>
<ul> <ul>
<li><p>First I dumped a list of all unique authors:</p> <li>First I dumped a list of all unique authors:</li>
</ul>
</li>
</ul>
<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/2019-08-28-all-authors.csv with csv header; <pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/2019-08-28-all-authors.csv with csv header;
COPY 65597 COPY 65597
</code></pre></li> </code></pre><ul>
</ul></li> <li>Then I created a new CSV with two author columns (edit title of second column after):</li>
<li><p>Then I created a new CSV with two author columns (edit title of second column after):</p>
<pre><code>$ csvcut -c dc.contributor.author,dc.contributor.author /tmp/2019-08-28-all-authors.csv &gt; /tmp/all-authors.csv
</code></pre></li>
<li><p>Then I ran my script on the new CSV, skipping one of the author columns:</p>
<pre><code>$ csv-metadata-quality -u -i /tmp/all-authors.csv -o /tmp/authors.csv -x dc.contributor.author
</code></pre></li>
<li><p>This fixed a bunch of issues with spaces, commas, unnecessary Unicode characters, etc</p></li>
<li><p>Then I ran the corrections on my test server and there were 185 of them!</p>
<pre><code>$ ./fix-metadata-values.py -i /tmp/authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -m 3 -t correctauthor
</code></pre></li>
<li><p>I very well might run these on CGSpace soon&hellip;</p></li>
</ul> </ul>
<pre><code>$ csvcut -c dc.contributor.author,dc.contributor.author /tmp/2019-08-28-all-authors.csv &gt; /tmp/all-authors.csv
<h2 id="2019-08-29">2019-08-29</h2> </code></pre><ul>
<li>Then I ran my script on the new CSV, skipping one of the author columns:</li>
</ul>
<pre><code>$ csv-metadata-quality -u -i /tmp/all-authors.csv -o /tmp/authors.csv -x dc.contributor.author
</code></pre><ul>
<li>This fixed a bunch of issues with spaces, commas, unnecessary Unicode characters, etc</li>
<li>Then I ran the corrections on my test server and there were 185 of them!</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -m 3 -t correctauthor
</code></pre><ul>
<li>I very well might run these on CGSpace soon&hellip;</li>
</ul>
<h2 id="20190829">2019-08-29</h2>
<ul> <ul>
<li><p>Resume working on the CG Core v2 changes in the <code>5_x-cgcorev2</code> branch again</p> <li>Resume working on the CG Core v2 changes in the <code>5_x-cgcorev2</code> branch again
<ul> <ul>
<li>I notice that CG Core doesn&rsquo;t currently have a field for CGSpace&rsquo;s &ldquo;alternative title&rdquo; (<code>dc.title.alternative</code>), but DCTERMS has <code>dcterms.alternative</code> so I <a href="https://github.com/AgriculturalSemantics/cg-core/issues/9">raised an issue about adding it</a></li> <li>I notice that CG Core doesn't currently have a field for CGSpace's &ldquo;alternative title&rdquo; (<code>dc.title.alternative</code>), but DCTERMS has <code>dcterms.alternative</code> so I <a href="https://github.com/AgriculturalSemantics/cg-core/issues/9">raised an issue about adding it</a></li>
<li>Marie responded and said she would add <code>dcterms.alternative</code></li> <li>Marie responded and said she would add <code>dcterms.alternative</code></li>
<li>I created a sed script file to perform some replacements of metadata on the XMLUI XSL files:</li>
<li><p>I created a sed script file to perform some replacements of metadata on the XMLUI XSL files:</p> </ul>
</li>
</ul>
<pre><code>$ find dspace/modules/xmlui-mirage2/src/main/webapp/themes -iname &quot;*.xsl&quot; -exec ./cgcore-xsl-replacements.sed {} \; <pre><code>$ find dspace/modules/xmlui-mirage2/src/main/webapp/themes -iname &quot;*.xsl&quot; -exec ./cgcore-xsl-replacements.sed {} \;
</code></pre></li> </code></pre><ul>
</ul></li> <li>I think I got everything in the XMLUI themes, but there may be some things I should check once I get a deployment up and running:
<li><p>I think I got everything in the XMLUI themes, but there may be some things I should check once I get a deployment up and running:</p>
<ul> <ul>
<li>Need to assess the XSL changes to see if things like <code>not(@qualifier)]</code> still make sense after we move fields from DC to DCTERMS, as some fields will no longer have qualifiers</li> <li>Need to assess the XSL changes to see if things like <code>not(@qualifier)]</code> still make sense after we move fields from DC to DCTERMS, as some fields will no longer have qualifiers</li>
<li>Do I need to edit the author links to remove <code>dc.contributor.author</code> in <code>0_CGIAR/xsl/aspect/artifactbrowser/item-list-alterations.xsl</code>?</li> <li>Do I need to edit the author links to remove <code>dc.contributor.author</code> in <code>0_CGIAR/xsl/aspect/artifactbrowser/item-list-alterations.xsl</code>?</li>
<li>Do I need to edit the author links to remove <code>dc.contributor.author</code> in <code>0_CGIAR/xsl/aspect/discovery/discovery-item-list-alterations.xsl</code>?</li> <li>Do I need to edit the author links to remove <code>dc.contributor.author</code> in <code>0_CGIAR/xsl/aspect/discovery/discovery-item-list-alterations.xsl</code>?</li>
</ul></li> </ul>
</li>
<li><p>Thierry Lewadle asked why some PDFs on CGSpace open in the browser and some download</p> <li>Thierry Lewadle asked why some PDFs on CGSpace open in the browser and some download
<ul> <ul>
<li>I told him it is because of the &ldquo;content disposition&rdquo; that causes DSpace to tell the browser to open or download the file based on its file size (currently around 8 megabytes)</li> <li>I told him it is because of the &ldquo;content disposition&rdquo; that causes DSpace to tell the browser to open or download the file based on its file size (currently around 8 megabytes)</li>
</ul></li>
<li><p>Peter asked why <a href="https://hdl.handle.net/10568/97825">an item on CGSpace</a> has no Altmetric donut on the item view, but has one in our explorer</p>
<ul>
<li><p>I looked in the network requests when loading the CGSpace item view and I see the following response to the Altmetric API call:</p>
<pre><code>&quot;handles&quot;:[&quot;10986/30568&quot;,&quot;10568/97825&quot;],&quot;handle&quot;:&quot;10986/30568&quot;
</code></pre></li>
</ul></li>
<li><p>So this is the same issue we had before, where Altmetric <em>knows</em> this Handle is associated with a DOI that has a score, but the client-side JavaScript code doesn&rsquo;t show it because it seems to be a secondary handle or something</p></li>
</ul> </ul>
</li>
<h2 id="2019-08-31">2019-08-31</h2> <li>Peter asked why <a href="https://hdl.handle.net/10568/97825">an item on CGSpace</a> has no Altmetric donut on the item view, but has one in our explorer
<ul>
<li>I looked in the network requests when loading the CGSpace item view and I see the following response to the Altmetric API call:</li>
</ul>
</li>
</ul>
<pre><code>&quot;handles&quot;:[&quot;10986/30568&quot;,&quot;10568/97825&quot;],&quot;handle&quot;:&quot;10986/30568&quot;
</code></pre><ul>
<li>So this is the same issue we had before, where Altmetric <em>knows</em> this Handle is associated with a DOI that has a score, but the client-side JavaScript code doesn't show it because it seems to be a secondary handle or something</li>
</ul>
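<p>To illustrate the primary versus secondary handle distinction, here is a minimal Python sketch built on the response fragment quoted above. The two check functions are hypothetical and are not Altmetric's actual badge code; they only show how a strict comparison against the <code>handle</code> field would miss our item while a membership test on <code>handles</code> would find it.</p>

```python
import json

# Fragment of the Altmetric API response quoted in the notes,
# wrapped in braces so that it parses as a JSON object.
response = json.loads(
    '{"handles":["10986/30568","10568/97825"],"handle":"10986/30568"}'
)


def is_primary_handle(resp, handle):
    """Hypothetical strict check: only the primary "handle" field counts."""
    return resp.get("handle") == handle


def is_known_handle(resp, handle):
    """Hypothetical lenient check: any entry in "handles" counts."""
    return handle in resp.get("handles", [])


item_handle = "10568/97825"  # the CGSpace item Peter asked about
print(is_primary_handle(response, item_handle))  # False
print(is_known_handle(response, item_handle))    # True
```

<p>If the client-side code behaves like the strict check, that would explain why the donut is missing on the item view even though Altmetric has a score for the item.</p>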
<h2 id="20190831">2019-08-31</h2>
<ul> <ul>
<li>Run system updates on DSpace Test (linode19) and reboot the server</li> <li>Run system updates on DSpace Test (linode19) and reboot the server</li>
<li>Run the author fixes on DSpace Test and CGSpace and start a full Discovery re-index:</li>
<li><p>Run the author fixes on DSpace Test and CGSpace and start a full Discovery re-index:</p> </ul>
<pre><code>$ time schedtool -B -e ionice -c2 -n7 nice -n19 dspace index-discovery -b <pre><code>$ time schedtool -B -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
real 90m47.967s real 90m47.967s
user 8m12.826s user 8m12.826s
sys 2m27.496s sys 2m27.496s
</code></pre></li> </code></pre><ul>
<li>I set up a test environment for CG Core v2 on my local environment and ran all the field migrations
<li><p>I set up a test environment for CG Core v2 on my local environment and ran all the field migrations</p>
<ul> <ul>
<li>DSpace comes up and runs, but there are some graphical issues, like missing community names</li> <li>DSpace comes up and runs, but there are some graphical issues, like missing community names</li>
<li>It turns out that my sed script was replacing some XSL code that was responsible for printing community names</li> <li>It turns out that my sed script was replacing some XSL code that was responsible for printing community names</li>
@ -604,10 +541,10 @@ sys 2m27.496s
<li>After reading the code I see that XSLT is reading the community titles from the DIM representation (stored in the <code>$dim</code> variable) created from METS</li> <li>After reading the code I see that XSLT is reading the community titles from the DIM representation (stored in the <code>$dim</code> variable) created from METS</li>
<li>I modified the patterns in my sed script so that those lines are not replaced and then the community list works again</li> <li>I modified the patterns in my sed script so that those lines are not replaced and then the community list works again</li>
<li>This is actually not a problem at all because this metadata is only used in the HTML meta tags in XMLUI community lists and has nothing to do with item metadata</li> <li>This is actually not a problem at all because this metadata is only used in the HTML meta tags in XMLUI community lists and has nothing to do with item metadata</li>
</ul></li>
</ul> </ul>
</li>
<!-- vim: set sw=2 ts=2: --> </ul>
<!-- raw HTML omitted -->

View File

@ -8,34 +8,31 @@
<meta property="og:title" content="September, 2019" /> <meta property="og:title" content="September, 2019" />
<meta property="og:description" content="2019-09-01 <meta property="og:description" content="2019-09-01
Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning
Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning: Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
# zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10 # zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
440 17.58.101.255 440 17.58.101.255
441 157.55.39.101 441 157.55.39.101
485 207.46.13.43 485 207.46.13.43
728 169.60.128.125 728 169.60.128.125
730 207.46.13.108 730 207.46.13.108
758 157.55.39.9 758 157.55.39.9
808 66.160.140.179 808 66.160.140.179
814 207.46.13.212 814 207.46.13.212
2472 163.172.71.23 2472 163.172.71.23
6092 3.94.211.189 6092 3.94.211.189
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10 # zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
33 2a01:7e00::f03c:91ff:fe16:fcb 33 2a01:7e00::f03c:91ff:fe16:fcb
57 3.83.192.124 57 3.83.192.124
57 3.87.77.25 57 3.87.77.25
57 54.82.1.8 57 54.82.1.8
822 2a01:9cc0:47:1:1a:4:0:2 822 2a01:9cc0:47:1:1a:4:0:2
1223 45.5.184.72 1223 45.5.184.72
1633 172.104.229.92 1633 172.104.229.92
5112 205.186.128.185 5112 205.186.128.185
7249 2a01:7e00::f03c:91ff:fe18:7396 7249 2a01:7e00::f03c:91ff:fe18:7396
9124 45.5.186.2 9124 45.5.186.2
" /> " />
<meta property="og:type" content="article" /> <meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-09/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-09/" />
@ -46,36 +43,33 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
<meta name="twitter:title" content="September, 2019"/> <meta name="twitter:title" content="September, 2019"/>
<meta name="twitter:description" content="2019-09-01 <meta name="twitter:description" content="2019-09-01
Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning
Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning: Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
# zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10 # zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
440 17.58.101.255 440 17.58.101.255
441 157.55.39.101 441 157.55.39.101
485 207.46.13.43 485 207.46.13.43
728 169.60.128.125 728 169.60.128.125
730 207.46.13.108 730 207.46.13.108
758 157.55.39.9 758 157.55.39.9
808 66.160.140.179 808 66.160.140.179
814 207.46.13.212 814 207.46.13.212
2472 163.172.71.23 2472 163.172.71.23
6092 3.94.211.189 6092 3.94.211.189
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10 # zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
33 2a01:7e00::f03c:91ff:fe16:fcb 33 2a01:7e00::f03c:91ff:fe16:fcb
57 3.83.192.124 57 3.83.192.124
57 3.87.77.25 57 3.87.77.25
57 54.82.1.8 57 54.82.1.8
822 2a01:9cc0:47:1:1a:4:0:2 822 2a01:9cc0:47:1:1a:4:0:2
1223 45.5.184.72 1223 45.5.184.72
1633 172.104.229.92 1633 172.104.229.92
5112 205.186.128.185 5112 205.186.128.185
7249 2a01:7e00::f03c:91ff:fe18:7396 7249 2a01:7e00::f03c:91ff:fe18:7396
9124 45.5.186.2 9124 45.5.186.2
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -156,158 +150,136 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
</p> </p>
</header> </header>
<h2 id="2019-09-01">2019-09-01</h2> <h2 id="20190901">2019-09-01</h2>
<ul> <ul>
<li>Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning</li> <li>Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning</li>
<li>Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:</li>
<li><p>Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:</p> </ul>
<pre><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 <pre><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
440 17.58.101.255 440 17.58.101.255
441 157.55.39.101 441 157.55.39.101
485 207.46.13.43 485 207.46.13.43
728 169.60.128.125 728 169.60.128.125
730 207.46.13.108 730 207.46.13.108
758 157.55.39.9 758 157.55.39.9
808 66.160.140.179 808 66.160.140.179
814 207.46.13.212 814 207.46.13.212
2472 163.172.71.23 2472 163.172.71.23
6092 3.94.211.189 6092 3.94.211.189
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 # zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
33 2a01:7e00::f03c:91ff:fe16:fcb 33 2a01:7e00::f03c:91ff:fe16:fcb
57 3.83.192.124 57 3.83.192.124
57 3.87.77.25 57 3.87.77.25
57 54.82.1.8 57 54.82.1.8
822 2a01:9cc0:47:1:1a:4:0:2 822 2a01:9cc0:47:1:1a:4:0:2
1223 45.5.184.72 1223 45.5.184.72
1633 172.104.229.92 1633 172.104.229.92
5112 205.186.128.185 5112 205.186.128.185
7249 2a01:7e00::f03c:91ff:fe18:7396 7249 2a01:7e00::f03c:91ff:fe18:7396
9124 45.5.186.2 9124 45.5.186.2
</code></pre></li> </code></pre><ul>
</ul>
<ul>
<li><code>3.94.211.189</code> is MauiBot, and most of its requests are to Discovery and get rate limited with HTTP 503</li> <li><code>3.94.211.189</code> is MauiBot, and most of its requests are to Discovery and get rate limited with HTTP 503</li>
<li><code>163.172.71.23</code> is some IP on Online SAS in France and its user agent is:</li>
<li><p><code>163.172.71.23</code> is some IP on Online SAS in France and its user agent is:</p>
<pre><code>Mozilla/5.0 ((Windows; U; Windows NT 6.1; fr; rv:1.9.2) Gecko/20100115 Firefox/3.6)
</code></pre></li>
<li><p>It actually got mostly HTTP 200 responses:</p>
<pre><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &quot;01/Sep/2019:0&quot; | grep 163.172.71.23 | awk '{print $9}' | sort | uniq -c
1775 200
703 499
72 503
</code></pre></li>
<li><p>And it was mostly requesting Discover pages:</p>
<pre><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &quot;01/Sep/2019:0&quot; | grep 163.172.71.23 | grep -o -E &quot;(bitstream|discover|handle)&quot; | sort | uniq -c
2350 discover
71 handle
</code></pre></li>
<li><p>I&rsquo;m not sure why the outbound traffic rate was so high&hellip;</p></li>
</ul> </ul>
<pre><code>Mozilla/5.0 ((Windows; U; Windows NT 6.1; fr; rv:1.9.2) Gecko/20100115 Firefox/3.6)
<h2 id="2019-09-02">2019-09-02</h2> </code></pre><ul>
<li>It actually got mostly HTTP 200 responses:</li>
</ul>
<pre><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &quot;01/Sep/2019:0&quot; | grep 163.172.71.23 | awk '{print $9}' | sort | uniq -c
1775 200
703 499
72 503
</code></pre><ul>
<li>And it was mostly requesting Discover pages:</li>
</ul>
<pre><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &quot;01/Sep/2019:0&quot; | grep 163.172.71.23 | grep -o -E &quot;(bitstream|discover|handle)&quot; | sort | uniq -c
2350 discover
71 handle
</code></pre><ul>
<li>I'm not sure why the outbound traffic rate was so high&hellip;</li>
</ul>
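<p>The <code>grep | awk | sort | uniq -c | sort -n | tail</code> pipelines above can be sketched in Python as follows. This is only an illustration: the function name <code>top_clients</code> and the sample log lines (which use documentation-range IP addresses) are assumptions and not taken from the real nginx logs.</p>

```python
from collections import Counter


def top_clients(lines, date_prefix, n=10):
    """Count requests per client IP for log lines containing date_prefix."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Keep only lines matching the date filter, like grep
        if fields and date_prefix in line:
            # First whitespace-separated field, like awk '{print $1}'
            counts[fields[0]] += 1
    # Ascending by count with the busiest client last, like sort -n | tail
    return sorted(counts.most_common(n), key=lambda pair: pair[1])


# Made-up sample lines in nginx's default access log layout
sample = [
    '192.0.2.10 - - [01/Sep/2019:03:12:01 +0200] "GET /discover HTTP/1.1" 200 512',
    '192.0.2.10 - - [01/Sep/2019:03:12:02 +0200] "GET /discover HTTP/1.1" 503 0',
    '192.0.2.20 - - [01/Sep/2019:04:00:00 +0200] "GET /handle/10568/1 HTTP/1.1" 200 1024',
    '192.0.2.30 - - [01/Sep/2019:13:00:00 +0200] "GET / HTTP/1.1" 200 100',
]

for ip, count in top_clients(sample, "01/Sep/2019:0"):
    print(count, ip)
```

<p>The last sample line is excluded because its hour starts with 1, which mirrors how the <code>01/Sep/2019:0</code> pattern limits the shell pipeline to requests between 00:00 and 09:59.</p>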
<h2 id="20190902">2019-09-02</h2>
<ul> <ul>
<li>Follow up with Carol and Francesca from Bioversity as they were on holiday during mid-to-late August <li>Follow up with Carol and Francesca from Bioversity as they were on holiday during mid-to-late August
<ul> <ul>
<li>I told them to check the <a href="https://dspacetest.cgiar.org/handle/10568/103999">temporary collection on DSpace Test</a> where I uploaded the 1,427 items so they can see how it will look</li> <li>I told them to check the <a href="https://dspacetest.cgiar.org/handle/10568/103999">temporary collection on DSpace Test</a> where I uploaded the 1,427 items so they can see how it will look</li>
<li>Also, I told them to advise me about the strange file extensions (.7z, .zip, .lck)</li> <li>Also, I told them to advise me about the strange file extensions (.7z, .zip, .lck)</li>
<li>Also, I reminded Abenet to check the metadata, as the institutional authors at least will need some modification</li> <li>Also, I reminded Abenet to check the metadata, as the institutional authors at least will need some modification</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-09-10">2019-09-10</h2> </ul>
<h2 id="20190910">2019-09-10</h2>
<ul> <ul>
<li>Altmetric responded to say that they have fixed an issue with their badge code so now research outputs with multiple handles are showing badges! <li>Altmetric responded to say that they have fixed an issue with their badge code so now research outputs with multiple handles are showing badges!
<ul> <ul>
<li>See: <a href="https://hdl.handle.net/handle/10568/97825">https://hdl.handle.net/handle/10568/97825</a></li> <li>See: <a href="https://hdl.handle.net/handle/10568/97825">https://hdl.handle.net/handle/10568/97825</a></li>
</ul></li> </ul>
</li>
<li>Follow up with Bosede about the mixup with PDFs in the items uploaded in 2018-12 (aka Daniel1807.xls) <li>Follow up with Bosede about the mixup with PDFs in the items uploaded in 2018-12 (aka Daniel1807.xls)
<ul> <ul>
<li>These are the same ones that Peter noticed last week, that Bosede and I had been discussing earlier this year that we never sorted out</li> <li>These are the same ones that Peter noticed last week, that Bosede and I had been discussing earlier this year that we never sorted out</li>
<li>It looks like these items were uploaded by Sisay on 2018-12-19 so we can use the <a href="https://cgspace.cgiar.org/handle/10568/68616/discover?filtertype_1=dateAccessioned&amp;filter_relational_operator_1=contains&amp;filter_1=2018-12-19&amp;submit_apply_filter=&amp;query=">accession date as a filter</a> to narrow it down to 230 items (of which only 104 have PDFs, according to the Daniel1807.xls input file)</li> <li>It looks like these items were uploaded by Sisay on 2018-12-19 so we can use the <a href="https://cgspace.cgiar.org/handle/10568/68616/discover?filtertype_1=dateAccessioned&amp;filter_relational_operator_1=contains&amp;filter_1=2018-12-19&amp;submit_apply_filter=&amp;query=">accession date as a filter</a> to narrow it down to 230 items (of which only 104 have PDFs, according to the Daniel1807.xls input file)</li>
<li>Now I just checked a few manually and they are correct in the original input file, so something must have happened when Sisay was processing them for upload</li> <li>Now I just checked a few manually and they are correct in the original input file, so something must have happened when Sisay was processing them for upload</li>
<li>I have asked Sisay to fix them&hellip;</li> <li>I have asked Sisay to fix them&hellip;</li>
</ul></li> </ul>
</li>
<li>Continue working on CG Core v2 migration, focusing on the crosswalk mappings <li>Continue working on CG Core v2 migration, focusing on the crosswalk mappings
<ul> <ul>
<li>I think we can skip the MODS crosswalk for now because it is only used in <a href="https://wiki.duraspace.org/display/DSDOC5x/DSpace+AIP+Format#DSpaceAIPFormat-MODSSchema">AIP exports that are meant for non-DSpace systems</a></li> <li>I think we can skip the MODS crosswalk for now because it is only used in <a href="https://wiki.duraspace.org/display/DSDOC5x/DSpace+AIP+Format#DSpaceAIPFormat-MODSSchema">AIP exports that are meant for non-DSpace systems</a></li>
<li>We should probably do the QDC crosswalk as well as those in <code>xhtml-head-item.properties</code>&hellip;</li> <li>We should probably do the QDC crosswalk as well as those in <code>xhtml-head-item.properties</code>&hellip;</li>
<li>Ouch, there is potentially a lot of work in the OAI metadata formats like DIM, METS, and QDC (see <code>dspace/config/crosswalks/oai/*.xsl</code>)</li> <li>Ouch, there is potentially a lot of work in the OAI metadata formats like DIM, METS, and QDC (see <code>dspace/config/crosswalks/oai/*.xsl</code>)</li>
<li>In general I think I should only modify the left side of the crosswalk mappings (ie, where metadata is coming from) so we maintain the same exact output for search engines, etc</li> <li>In general I think I should only modify the left side of the crosswalk mappings (ie, where metadata is coming from) so we maintain the same exact output for search engines, etc</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-09-11">2019-09-11</h2> </ul>
<h2 id="20190911">2019-09-11</h2>
<ul> <ul>
<li>Maria Garruccio asked me to add two new Bioversity ORCID identifiers to CGSpace so I created a <a href="https://github.com/ilri/DSpace/pull/431">pull request</a></li> <li>Maria Garruccio asked me to add two new Bioversity ORCID identifiers to CGSpace so I created a <a href="https://github.com/ilri/DSpace/pull/431">pull request</a></li>
<li>Marissa Van Epp asked me to add new CCAFS Phase II project tags to CGSpace so I created a <a href="https://github.com/ilri/DSpace/pull/432">pull request</a> <li>Marissa Van Epp asked me to add new CCAFS Phase II project tags to CGSpace so I created a <a href="https://github.com/ilri/DSpace/pull/432">pull request</a>
<ul> <ul>
<li>I will wait until I hear from her to merge it because there is one tag that seems to be a duplicate because its name (PII-WA_agrosylvopast) is similar to one that already exists (PII-WA_AgroSylvopastoralSystems)</li> <li>I will wait until I hear from her to merge it because there is one tag that seems to be a duplicate because its name (PII-WA_agrosylvopast) is similar to one that already exists (PII-WA_AgroSylvopastoralSystems)</li>
</ul></li> </ul>
</li>
<li>More work on the CG Core v2 migrations <li>More work on the CG Core v2 migrations
<ul> <ul>
<li>I have updated my <a href="https://gist.github.com/alanorth/2db39e91f48d116e00a4edffd6ba6409">notes on the possible changes</a> and done more work on the XMLUI replacements</li> <li>I have updated my <a href="https://gist.github.com/alanorth/2db39e91f48d116e00a4edffd6ba6409">notes on the possible changes</a> and done more work on the XMLUI replacements</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-09-12">2019-09-12</h2> </ul>
<h2 id="20190912">2019-09-12</h2>
<ul> <ul>
<li>Deploy <a href="https://jdbc.postgresql.org/">PostgreSQL JDBC driver</a> version 42.2.7 on DSpace Test and update the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a></li> <li>Deploy <a href="https://jdbc.postgresql.org/">PostgreSQL JDBC driver</a> version 42.2.7 on DSpace Test and update the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a></li>
</ul> </ul>
<h2 id="20190915">2019-09-15</h2>
<h2 id="2019-09-15">2019-09-15</h2>
<ul> <ul>
<li>Deploy Bioversity ORCID identifier updates to CGSpace</li> <li>Deploy Bioversity ORCID identifier updates to CGSpace</li>
<li>Deploy PostgreSQL JDBC driver 42.2.7 on CGSpace</li> <li>Deploy PostgreSQL JDBC driver 42.2.7 on CGSpace</li>
<li>Run system updates on CGSpace (linode18) and restart the server <li>Run system updates on CGSpace (linode18) and restart the server
<ul> <ul>
<li>After restarting the system Tomcat came back up, but not all Solr statistics cores were loaded</li> <li>After restarting the system Tomcat came back up, but not all Solr statistics cores were loaded</li>
<li>I had to restart Tomcat one more time until the cores were loaded (verified in the Solr admin)</li> <li>I had to restart Tomcat one more time until the cores were loaded (verified in the Solr admin)</li>
</ul></li> </ul>
</li>
<li>Update nginx TLS cipher suite to the latest <a href="https://ssl-config.mozilla.org/#server=nginx&amp;server-version=1.16.1&amp;config=intermediate&amp;openssl-version=1.0.2g">Mozilla intermediate recommendations for nginx 1.16.0 and openssl 1.0.2</a> <li>Update nginx TLS cipher suite to the latest <a href="https://ssl-config.mozilla.org/#server=nginx&amp;server-version=1.16.1&amp;config=intermediate&amp;openssl-version=1.0.2g">Mozilla intermediate recommendations for nginx 1.16.0 and openssl 1.0.2</a>
<ul> <ul>
<li>DSpace Test (linode19) is running Ubuntu 18.04 with nginx 1.17.x and openssl 1.1.1 so it can even use TLS v1.3 if we override the nginx ssl protocol in its host vars</li> <li>DSpace Test (linode19) is running Ubuntu 18.04 with nginx 1.17.x and openssl 1.1.1 so it can even use TLS v1.3 if we override the nginx ssl protocol in its host vars</li>
</ul></li> </ul>
</li>
<li><p>XMLUI item view pages are blank on CGSpace right now</p> <li>XMLUI item view pages are blank on CGSpace right now
<ul> <ul>
<li><p>Like earlier this year, I see the following error in the Cocoon log while browsing:</p> <li>Like earlier this year, I see the following error in the Cocoon log while browsing:</li>
</ul>
</li>
</ul>
<pre><code>2019-09-15 15:32:18,137 WARN org.apache.cocoon.components.xslt.TraxErrorListener - Can not load requested doc: unknown protocol: cocoon at jndi:/localhost/themes/CIAT/xsl/../../0_CGIAR/xsl//aspect/artifactbrowser/common.xsl:141:90 <pre><code>2019-09-15 15:32:18,137 WARN org.apache.cocoon.components.xslt.TraxErrorListener - Can not load requested doc: unknown protocol: cocoon at jndi:/localhost/themes/CIAT/xsl/../../0_CGIAR/xsl//aspect/artifactbrowser/common.xsl:141:90
</code></pre></li> </code></pre><ul>
</ul></li> <li>Around the same time I see the following in the DSpace log:</li>
</ul>
<li><p>Around the same time I see the following in the DSpace log:</p>
<pre><code>2019-09-15 15:32:18,079 INFO org.dspace.usage.LoggerUsageEventListener @ aorth@blah:session_id=A11C362A7127004C24E77198AF9E4418:ip_addr=x.x.x.x:view_item:handle=10568/103644 <pre><code>2019-09-15 15:32:18,079 INFO org.dspace.usage.LoggerUsageEventListener @ aorth@blah:session_id=A11C362A7127004C24E77198AF9E4418:ip_addr=x.x.x.x:view_item:handle=10568/103644
2019-09-15 15:32:18,135 WARN org.dspace.core.PluginManager @ Cannot find named plugin for interface=org.dspace.content.crosswalk.DisseminationCrosswalk, name=&quot;METSRIGHTS&quot; 2019-09-15 15:32:18,135 WARN org.dspace.core.PluginManager @ Cannot find named plugin for interface=org.dspace.content.crosswalk.DisseminationCrosswalk, name=&quot;METSRIGHTS&quot;
</code></pre></li> </code></pre><ul>
<li>I see a lot of these errors today, but not earlier this month:</li>
<li><p>I see a lot of these errors today, but not earlier this month:</p> </ul>
<pre><code># grep -c 'Cannot find named plugin' dspace.log.2019-09-* <pre><code># grep -c 'Cannot find named plugin' dspace.log.2019-09-*
dspace.log.2019-09-01:0 dspace.log.2019-09-01:0
dspace.log.2019-09-02:0 dspace.log.2019-09-02:0
@ -324,27 +296,23 @@ dspace.log.2019-09-12:0
dspace.log.2019-09-13:0 dspace.log.2019-09-13:0
dspace.log.2019-09-14:0 dspace.log.2019-09-14:0
dspace.log.2019-09-15:808 dspace.log.2019-09-15:808
</code></pre></li> </code></pre><ul>
<li>Something must have happened when I restarted Tomcat a few hours ago, because earlier in the DSpace log I see a bunch of errors like this:</li>
<li><p>Something must have happened when I restarted Tomcat a few hours ago, because earlier in the DSpace log I see a bunch of errors like this:</p> </ul>
<pre><code>2019-09-15 13:59:24,136 ERROR org.dspace.core.PluginManager @ Name collision in named plugin, implementation class=&quot;org.dspace.content.crosswalk.METSRightsCrosswalk&quot;, name=&quot;METSRIGHTS&quot; <pre><code>2019-09-15 13:59:24,136 ERROR org.dspace.core.PluginManager @ Name collision in named plugin, implementation class=&quot;org.dspace.content.crosswalk.METSRightsCrosswalk&quot;, name=&quot;METSRIGHTS&quot;
2019-09-15 13:59:24,136 ERROR org.dspace.core.PluginManager @ Name collision in named plugin, implementation class=&quot;org.dspace.content.crosswalk.OREDisseminationCrosswalk&quot;, name=&quot;ore&quot; 2019-09-15 13:59:24,136 ERROR org.dspace.core.PluginManager @ Name collision in named plugin, implementation class=&quot;org.dspace.content.crosswalk.OREDisseminationCrosswalk&quot;, name=&quot;ore&quot;
2019-09-15 13:59:24,136 ERROR org.dspace.core.PluginManager @ Name collision in named plugin, implementation class=&quot;org.dspace.content.crosswalk.DIMDisseminationCrosswalk&quot;, name=&quot;dim&quot; 2019-09-15 13:59:24,136 ERROR org.dspace.core.PluginManager @ Name collision in named plugin, implementation class=&quot;org.dspace.content.crosswalk.DIMDisseminationCrosswalk&quot;, name=&quot;dim&quot;
</code></pre></li> </code></pre><ul>
<li>I restarted Tomcat and the item views came back, but then the Solr statistics cores didn't all load properly
<li><p>I restarted Tomcat and the item views came back, but then the Solr statistics cores didn&rsquo;t all load properly</p>
<ul> <ul>
<li>After restarting Tomcat once again, both the item views and the Solr statistics cores all came back OK</li> <li>After restarting Tomcat once again, both the item views and the Solr statistics cores all came back OK</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-09-19">2019-09-19</h2> </ul>
<h2 id="20190919">2019-09-19</h2>
<ul> <ul>
<li><p>For some reason my podman PostgreSQL container isn&rsquo;t working so I had to use Docker to re-create it for my testing work today:</p> <li>For some reason my podman PostgreSQL container isn't working so I had to use Docker to re-create it for my testing work today:</li>
</ul>
<pre><code># docker pull docker.io/library/postgres:9.6-alpine <pre><code># docker pull docker.io/library/postgres:9.6-alpine
# docker volume create dspacedb_data # docker volume create dspacedb_data
# docker run --name dspacedb -v dspacedb_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.6-alpine # docker run --name dspacedb -v dspacedb_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.6-alpine
@ -354,15 +322,14 @@ $ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest superuser;'
$ pg_restore -h localhost -U postgres -d dspacetest -O --role=dspacetest -h localhost ~/Downloads/cgspace_2019-08-31.backup $ pg_restore -h localhost -U postgres -d dspacetest -O --role=dspacetest -h localhost ~/Downloads/cgspace_2019-08-31.backup
$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser;' $ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser;'
$ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest $ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest
</code></pre></li> </code></pre><ul>
<li>Elizabeth from CIAT sent me a list of sixteen authors who need to have their ORCID identifiers tagged with their publications
<li><p>Elizabeth from CIAT sent me a list of sixteen authors who need to have their ORCID identifiers tagged with their publications</p>
<ul> <ul>
<li>I manually checked the ORCID profile links to make sure they matched the names</li> <li>I manually checked the ORCID profile links to make sure they matched the names</li>
<li>Then I created an input file to use with my <code>add-orcid-identifiers-csv.py</code> script:</li>
<li><p>Then I created an input file to use with my <code>add-orcid-identifiers-csv.py</code> script:</p> </ul>
</li>
</ul>
<pre><code>dc.contributor.author,cg.creator.id <pre><code>dc.contributor.author,cg.creator.id
&quot;Kihara, Job&quot;,&quot;Job Kihara: 0000-0002-4394-9553&quot; &quot;Kihara, Job&quot;,&quot;Job Kihara: 0000-0002-4394-9553&quot;
&quot;Twyman, Jennifer&quot;,&quot;Jennifer Twyman: 0000-0002-8581-5668&quot; &quot;Twyman, Jennifer&quot;,&quot;Jennifer Twyman: 0000-0002-8581-5668&quot;
@ -380,242 +347,212 @@ $ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-s
&quot;Tamene, Lulseged&quot;,&quot;Lulseged Tamene: 0000-0002-3806-8890&quot; &quot;Tamene, Lulseged&quot;,&quot;Lulseged Tamene: 0000-0002-3806-8890&quot;
&quot;Andrieu, Nadine&quot;,&quot;Nadine Andrieu: 0000-0001-9558-9302&quot; &quot;Andrieu, Nadine&quot;,&quot;Nadine Andrieu: 0000-0001-9558-9302&quot;
&quot;Ramírez-Villegas, Julián&quot;,&quot;Julian Ramirez-Villegas: 0000-0002-8044-583X&quot; &quot;Ramírez-Villegas, Julián&quot;,&quot;Julian Ramirez-Villegas: 0000-0002-8044-583X&quot;
</code></pre></li> </code></pre><ul>
</ul></li> <li>I tested the file on my local development machine with the following invocation:</li>
</ul>
<li><p>I tested the file on my local development machine with the following invocation:</p>
<pre><code>$ ./add-orcid-identifiers-csv.py -i 2019-09-19-ciat-orcids.csv -db dspace -u dspace -p 'fuuu' <pre><code>$ ./add-orcid-identifiers-csv.py -i 2019-09-19-ciat-orcids.csv -db dspace -u dspace -p 'fuuu'
</code></pre></li> </code></pre><ul>
<li>In my test environment this added 390 ORCID identifiers</li>
<li><p>In my test environment this added 390 ORCID identifiers</p></li> <li>I ran the same updates on CGSpace and DSpace Test and then started a Discovery re-index to force the search index to update</li>
<li>Update the PostgreSQL JDBC driver to version 42.2.8 in our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a>
<li><p>I ran the same updates on CGSpace and DSpace Test and then started a Discovery re-index to force the search index to update</p></li>
<li><p>Update the PostgreSQL JDBC driver to version 42.2.8 in our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a></p>
<ul> <ul>
<li>There is only <a href="https://github.com/pgjdbc/pgjdbc/issues/1567">one minor fix to a usecase we aren&rsquo;t using</a> so I will deploy this on the servers the next time I do updates</li> <li>There is only <a href="https://github.com/pgjdbc/pgjdbc/issues/1567">one minor fix to a usecase we aren't using</a> so I will deploy this on the servers the next time I do updates</li>
</ul></li> </ul>
</li>
<li><p>Run system updates on DSpace Test (linode19) and reboot it</p></li> <li>Run system updates on DSpace Test (linode19) and reboot it</li>
<li>Start looking at IITA's latest round of batch updates that Sisay had <a href="https://dspacetest.cgiar.org/handle/10568/105486">uploaded to DSpace Test</a> earlier this month
<li><p>Start looking at IITA&rsquo;s latest round of batch updates that Sisay had <a href="https://dspacetest.cgiar.org/handle/10568/105486">uploaded to DSpace Test</a> earlier this month</p>
<ul> <ul>
<li>For posterity, IITA&rsquo;s original input file was 20196th.xls and Sisay uploaded it as &ldquo;IITA_Sep_06&rdquo; to DSpace Test</li> <li>For posterity, IITA's original input file was 20196th.xls and Sisay uploaded it as &ldquo;IITA_Sep_06&rdquo; to DSpace Test</li>
<li>Sisay said he ran the csv-metadata-quality script on the records, but I assume he didn&rsquo;t run the unsafe fixes or AGROVOC checks because I still see unnecessary Unicode, excessive whitespace, one invalid ISBN, missing dates and a few invalid AGROVOC fields</li> <li>Sisay said he ran the csv-metadata-quality script on the records, but I assume he didn't run the unsafe fixes or AGROVOC checks because I still see unnecessary Unicode, excessive whitespace, one invalid ISBN, missing dates and a few invalid AGROVOC fields</li>
<li>In addition, a few records were missing authorship type</li> <li>In addition, a few records were missing authorship type</li>
<li>I deleted two invalid AGROVOC terms because they were ambiguous</li> <li>I deleted two invalid AGROVOC terms because they were ambiguous</li>
<li>Validate and normalize affiliations against our 2019-04 list using reconcile-csv and OpenRefine:</li> <li>Validate and normalize affiliations against our 2019-04 list using reconcile-csv and OpenRefine:
<ul>
<li><code>$ lein run ~/src/git/DSpace/2019-04-08-affiliations.csv name id</code></li> <li><code>$ lein run ~/src/git/DSpace/2019-04-08-affiliations.csv name id</code></li>
<li>I always forget how to copy the reconciled values in OpenRefine, but you need to make a new column and populate it using this GREL: <code>if(cell.recon.matched, cell.recon.match.name, value)</code></li> <li>I always forget how to copy the reconciled values in OpenRefine, but you need to make a new column and populate it using this GREL: <code>if(cell.recon.matched, cell.recon.match.name, value)</code></li>
<li>I also looked through the IITA subjects to normalize some values</li>
</ul></li>
<li><p>Follow up with Marissa again about the CCAFS phase II project tags</p></li>
<li><p>Generate a list of the top 1500 authors on CGSpace:</p>
<pre><code>dspace=# \copy (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE metadata_field_id = (SELECT metadata_field_id FROM metadatafieldregistry WHERE element = 'contributor' AND qualifier = 'author') AND resource_type_id = 2 GROUP BY text_value ORDER BY count DESC LIMIT 1500) to /tmp/2019-09-19-top-1500-authors.csv WITH CSV HEADER;
</code></pre></li>
<li><p>Then I used <code>csvcut</code> to select the column of author names, strip the header and quote characters, and saved the sorted file:</p>
<pre><code>$ csvcut -c text_value /tmp/2019-09-19-top-1500-authors.csv | grep -v text_value | sed 's/&quot;//g' | sort &gt; dspace/config/controlled-vocabularies/dc-contributor-author.xml
</code></pre></li>
<li><p>After adding the XML formatting back to the file I formatted it using XML tidy:</p>
<pre><code>$ tidy -xml -utf8 -m -iq -w 0 dspace/config/controlled-vocabularies/dc-contributor-author.xml
</code></pre></li>
<li><p>I created and merged <a href="https://github.com/ilri/DSpace/pull/433">a pull request for the updates</a></p>
<ul>
<li>This is the first time we&rsquo;ve updated this controlled vocabulary since 2018-09</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-09-20">2019-09-20</h2> <li>I also looked through the IITA subjects to normalize some values</li>
</ul>
</li>
<li>Follow up with Marissa again about the CCAFS phase II project tags</li>
<li>Generate a list of the top 1500 authors on CGSpace:</li>
</ul>
<pre><code>dspace=# \copy (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE metadata_field_id = (SELECT metadata_field_id FROM metadatafieldregistry WHERE element = 'contributor' AND qualifier = 'author') AND resource_type_id = 2 GROUP BY text_value ORDER BY count DESC LIMIT 1500) to /tmp/2019-09-19-top-1500-authors.csv WITH CSV HEADER;
</code></pre><ul>
<li>Then I used <code>csvcut</code> to select the column of author names, strip the header and quote characters, and saved the sorted file:</li>
</ul>
<pre><code>$ csvcut -c text_value /tmp/2019-09-19-top-1500-authors.csv | grep -v text_value | sed 's/&quot;//g' | sort &gt; dspace/config/controlled-vocabularies/dc-contributor-author.xml
</code></pre><ul>
<li>After adding the XML formatting back to the file I formatted it using XML tidy:</li>
</ul>
<pre><code>$ tidy -xml -utf8 -m -iq -w 0 dspace/config/controlled-vocabularies/dc-contributor-author.xml
</code></pre><ul>
<li>I created and merged <a href="https://github.com/ilri/DSpace/pull/433">a pull request for the updates</a>
<ul> <ul>
<li>Deploy a fresh snapshot of CGSpace&rsquo;s PostgreSQL database on DSpace Test so we can get more accurate duplicate checking with the upcoming Bioversity and IITA migrations</li> <li>This is the first time we've updated this controlled vocabulary since 2018-09</li>
</ul>
<li><p>Skype with Carol and Francesca to discuss the Bioversity migration to CGSpace</p> </li>
</ul>
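<p>The "adding the XML formatting back" step above could also be scripted. The following is a minimal sketch, not the method used in these notes: the function name <code>build_vocabulary</code> is hypothetical, and the <code>node</code> / <code>isComposedBy</code> layout is an assumption about the DSpace controlled vocabulary format that should be checked against the existing <code>dc-contributor-author.xml</code> before use.</p>

```python
from xml.sax.saxutils import quoteattr


def build_vocabulary(names, vocabulary_id="dc-contributor-author"):
    """Return controlled vocabulary XML for a list of author names.

    The element layout (an outer node wrapping isComposedBy) is an assumed
    DSpace convention; vocabulary_id is a hypothetical default.
    """
    lines = [
        '<?xml version="1.0" encoding="utf-8"?>',
        f"<node id={quoteattr(vocabulary_id)} label={quoteattr(vocabulary_id)}>",
        "<isComposedBy>",
    ]
    # Deduplicate and sort so the file diffs cleanly between updates.
    # Note: this sorts by code point, which can differ from locale-aware sort(1).
    for name in sorted(set(names)):
        value = quoteattr(name)  # escapes &, <, > and picks safe quote characters
        lines.append(f"<node id={value} label={value}/>")
    lines += ["</isComposedBy>", "</node>"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_vocabulary(["Orth, Alan", "Andrieu, Nadine"]), end="")
```

<p>Escaping each name with <code>quoteattr</code> avoids producing invalid XML when a name contains an ampersand or a quote, which a plain text pipeline would pass through unchanged.</p>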
<h2 id="20190920">2019-09-20</h2>
<ul>
<li>Deploy a fresh snapshot of CGSpace's PostgreSQL database on DSpace Test so we can get more accurate duplicate checking with the upcoming Bioversity and IITA migrations</li>
<li>Skype with Carol and Francesca to discuss the Bioversity migration to CGSpace
<ul> <ul>
<li>They want to do some enrichment of the metadata to add countries and regions</li> <li>They want to do some enrichment of the metadata to add countries and regions</li>
<li>Also, they noticed that some items have a blank ISSN in the citation like &ldquo;ISSN:&rdquo;</li> <li>Also, they noticed that some items have a blank ISSN in the citation like &ldquo;ISSN:&rdquo;</li>
<li>I told them it&rsquo;s probably best if we have Francesco produce a new export from TYPO3</li> <li>I told them it's probably best if we have Francesco produce a new export from TYPO3</li>
<li>But on second thought I think that I&rsquo;ve already done so much work on this file as it is that I should fix what I can here and then do a new import to DSpace Test with the PDFs</li> <li>But on second thought I think that I've already done so much work on this file as it is that I should fix what I can here and then do a new import to DSpace Test with the PDFs</li>
<li>Other corrections would be to replace &ldquo;Inst.&rdquo; and &ldquo;Instit.&rdquo; with &ldquo;Institute&rdquo; and remove those blank ISSNs from the citations</li> <li>Other corrections would be to replace &ldquo;Inst.&rdquo; and &ldquo;Instit.&rdquo; with &ldquo;Institute&rdquo; and remove those blank ISSNs from the citations</li>
<li>I will rename the files with multiple underscores so they match the filename column in the CSV using this command:</li>
<li><p>I will rename the files with multiple underscores so they match the filename column in the CSV using this command:</p> </ul>
</li>
</ul>
<pre><code>$ perl-rename -n 's/_{2,3}/_/g' *.pdf <pre><code>$ perl-rename -n 's/_{2,3}/_/g' *.pdf
</code></pre></li> </code></pre><ul>
</ul></li> <li>I was preparing to run SAFBuilder for the Bioversity migration and decided to check the list of PDFs on my local machine versus on DSpace Test (where I had downloaded them last month)
<li><p>I was preparing to run SAFBuilder for the Bioversity migration and decided to check the list of PDFs on my local machine versus on DSpace Test (where I had downloaded them last month)</p>
<ul> <ul>
<li>There are a <em>few dozen</em> that have completely fucked up names due to some encoding error</li> <li>There are a <em>few dozen</em> that have completely fucked up names due to some encoding error</li>
<li>To make matters worse, when I tried to download them, some of the links in the &ldquo;URL&rdquo; column that Francesco included are wrong, so I had to go to the permalink and get a link that worked</li> <li>To make matters worse, when I tried to download them, some of the links in the &ldquo;URL&rdquo; column that Francesco included are wrong, so I had to go to the permalink and get a link that worked</li>
<li>After downloading everything I had to use Ubuntu's version of rename to get rid of all the double and triple underscores:</li>
<li><p>After downloading everything I had to use Ubuntu&rsquo;s version of rename to get rid of all the double and triple underscores:</p> </ul>
</li>
</ul>
<pre><code>$ rename -v 's/___/_/g' *.pdf <pre><code>$ rename -v 's/___/_/g' *.pdf
$ rename -v 's/__/_/g' *.pdf $ rename -v 's/__/_/g' *.pdf
</code></pre></li> </code></pre><ul>
</ul></li> <li>I'm still waiting to hear what Carol and Francesca want to do with the <code>1195.pdf.LCK</code> file (for now I've removed it from the CSV, but for future reference it has the number 630 in its permalink)</li>
<li>I wrote two fairly long GREL expressions to clean up the institutional author names in the <code>dc.contributor.author</code> and <code>dc.identifier.citation</code> fields using OpenRefine
<li><p>I&rsquo;m still waiting to hear what Carol and Francesca want to do with the <code>1195.pdf.LCK</code> file (for now I&rsquo;ve removed it from the CSV, but for future reference it has the number 630 in its permalink)</p></li>
<li><p>I wrote two fairly long GREL expressions to clean up the institutional author names in the <code>dc.contributor.author</code> and <code>dc.identifier.citation</code> fields using OpenRefine</p>
<ul> <ul>
<li><p>The first targets acronyms in parentheses like &ldquo;International Livestock Research Institute (ILRI)&rdquo;:</p> <li>The first targets acronyms in parentheses like &ldquo;International Livestock Research Institute (ILRI)&quot;:</li>
</ul>
</li>
</ul>
<pre><code>value.replace(/,? ?\((ANDES|APAFRI|APFORGEN|Canada|CFC|CGRFA|China|CacaoNet|CATAS|CDU|CIAT|CIRF|CIP|CIRNMA|COSUDE|Colombia|COA|COGENT|CTDT|Denmark|DfLP|DSE|ECPGR|ECOWAS|ECP\/GR|England|EUFORGEN|FAO|France|Francia|FFTC|Germany|GEF|GFU|GGCO|GRPI|italy|Italy|Italia|India|ICCO|ICAR|ICGR|ICRISAT|IDRC|INFOODS|IPGRI|IBPGR|ICARDA|ILRI|INIBAP|INBAR|IPK|ISG|IT|Japan|JIRCAS|Kenya|LI\-BIRD|Malaysia|NARC|NBPGR|Nepal|OOAS|RDA|RISBAP|Rome|ROPPA|SEARICE|Senegal|SGRP|Sweden|Syrian Arab Republic|The Netherlands|UNDP|UK|UNEP|UoB|UoM|United Kingdom|WAHO)\)/,&quot;&quot;) <pre><code>value.replace(/,? ?\((ANDES|APAFRI|APFORGEN|Canada|CFC|CGRFA|China|CacaoNet|CATAS|CDU|CIAT|CIRF|CIP|CIRNMA|COSUDE|Colombia|COA|COGENT|CTDT|Denmark|DfLP|DSE|ECPGR|ECOWAS|ECP\/GR|England|EUFORGEN|FAO|France|Francia|FFTC|Germany|GEF|GFU|GGCO|GRPI|italy|Italy|Italia|India|ICCO|ICAR|ICGR|ICRISAT|IDRC|INFOODS|IPGRI|IBPGR|ICARDA|ILRI|INIBAP|INBAR|IPK|ISG|IT|Japan|JIRCAS|Kenya|LI\-BIRD|Malaysia|NARC|NBPGR|Nepal|OOAS|RDA|RISBAP|Rome|ROPPA|SEARICE|Senegal|SGRP|Sweden|Syrian Arab Republic|The Netherlands|UNDP|UK|UNEP|UoB|UoM|United Kingdom|WAHO)\)/,&quot;&quot;)
</code></pre></li> </code></pre><ul>
<li>The second targets cities and countries after names like &ldquo;International Livestock Research Institute, Kenya&rdquo;:</li>
<li><p>The second targets cities and countries after names like &ldquo;International Livestock Research Institute, Kenya&rdquo;:</p> </ul>
<pre><code>replace(/,? ?(ali|Aleppo|Amsterdam|Beijing|Bonn|Burkina Faso|CN|Dakar|Gatersleben|London|Montpellier|Nairobi|New Delhi|Kaski|Kepong|Malaysia|Khumaltar|Lima|Ltpur|Ottawa|Patancheru|Peru|Pokhara|Rome|Uppsala|University of Mauritius|Tsukuba)/,&quot;&quot;) <pre><code>replace(/,? ?(ali|Aleppo|Amsterdam|Beijing|Bonn|Burkina Faso|CN|Dakar|Gatersleben|London|Montpellier|Nairobi|New Delhi|Kaski|Kepong|Malaysia|Khumaltar|Lima|Ltpur|Ottawa|Patancheru|Peru|Pokhara|Rome|Uppsala|University of Mauritius|Tsukuba)/,&quot;&quot;)
</code></pre></li> </code></pre><ul>
</ul></li> <li>I imported the 1,427 Bioversity records with bitstreams to a new collection called <a href="https://dspacetest.cgiar.org/handle/10568/103688">2019-09-20 Bioversity Migration Test</a> on DSpace Test (after splitting them in two batches of about 700 each):</li>
</ul>
<li><p>I imported the 1,427 Bioversity records with bitstreams to a new collection called <a href="https://dspacetest.cgiar.org/handle/10568/103688">2019-09-20 Bioversity Migration Test</a> on DSpace Test (after splitting them in two batches of about 700 each):</p>
<pre><code>$ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx768m' <pre><code>$ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx768m'
$ dspace import -a me@cgiar.org -m 2019-09-20-bioversity1.map -s /home/aorth/Bioversity/bioversity1 $ dspace import -a me@cgiar.org -m 2019-09-20-bioversity1.map -s /home/aorth/Bioversity/bioversity1
$ dspace import -a me@cgiar.org -m 2019-09-20-bioversity2.map -s /home/aorth/Bioversity/bioversity2 $ dspace import -a me@cgiar.org -m 2019-09-20-bioversity2.map -s /home/aorth/Bioversity/bioversity2
</code></pre></li> </code></pre><ul>
<li>After that I exported the collection again and started doing some quality checks and cleanups:
<li><p>After that I exported the collection again and started doing some quality checks and cleanups:</p>
<ul> <ul>
<li>Change all DOIs to use <a href="https://doi.org">https://doi.org</a> format</li> <li>Change all DOIs to use <a href="https://doi.org">https://doi.org</a> format</li>
<li>Change all bioversityinternational.org links to use https://</li> <li>Change all bioversityinternational.org links to use https://</li>
<li>Fix ten authors with invalid names like &ldquo;Orth,.&rdquo; by checking the correct name in the citation</li> <li>Fix ten authors with invalid names like &ldquo;Orth,.&rdquo; by checking the correct name in the citation</li>
<li>Fix several invalid ISBNs, but there are several more that contain incorrect ISBNs in their PDFs!</li> <li>Fix several invalid ISBNs, but there are several more that contain incorrect ISBNs in their PDFs!</li>
<li>Fix some citations that were using &ldquo;ISSN&rdquo; instead of ISBN</li> <li>Fix some citations that were using &ldquo;ISSN&rdquo; instead of ISBN</li>
</ul></li> </ul>
</li>
<li><p>The next steps are:</p> <li>The next steps are:
<ul> <ul>
<li>Check for duplicates</li> <li>Check for duplicates</li>
<li>Continue with institutional author normalization</li> <li>Continue with institutional author normalization</li>
<li>Ask which collection to map items with type Brochure, Journal Item, and Thesis?</li> <li>Ask which collection to map items with type Brochure, Journal Item, and Thesis?</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-09-21">2019-09-21</h2> </ul>
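<p>The underscore cleanup from the Bioversity PDF renaming above can be sketched in Python so the same logic works regardless of which <code>rename</code> variant is installed. This is only an illustration under assumptions: the function name <code>collapse_underscores</code> and its <code>dry_run</code> parameter are hypothetical, and it defaults to a preview in the same spirit as <code>perl-rename -n</code>, which prints the planned renames without performing them.</p>

```python
import re
from pathlib import Path


def collapse_underscores(directory, dry_run=True):
    """Collapse runs of underscores in PDF file names to a single underscore.

    Returns a list of (old_name, new_name) pairs. Nothing is renamed on disk
    unless dry_run is False.
    """
    renames = []
    for path in sorted(Path(directory).glob("*.pdf")):
        # _{2,} matches a run of any length, so one pass is enough
        new_name = re.sub(r"_{2,}", "_", path.name)
        if new_name == path.name:
            continue
        target = path.with_name(new_name)
        if target.exists():
            # Never overwrite an existing file; leave collisions for manual review
            continue
        renames.append((path.name, new_name))
        if not dry_run:
            path.rename(target)
    return renames


if __name__ == "__main__":
    for old, new in collapse_underscores("."):
        print(f"{old} -> {new}")
```

<p>Using <code>_{2,}</code> rather than <code>_{2,3}</code> means a run of four or more underscores is also reduced to one in a single pass, so the triple and double underscore cases do not need separate commands.</p>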
<h2 id="20190921">2019-09-21</h2>
<ul> <ul>
<li>Re-upload the <a href="https://dspacetest.cgiar.org/handle/10568/105116">IITA Sept 6 (20196th.xls) records to DSpace Test</a> after I did the re-sync yesterday <li>Re-upload the <a href="https://dspacetest.cgiar.org/handle/10568/105116">IITA Sept 6 (20196th.xls) records to DSpace Test</a> after I did the re-sync yesterday
<ul> <ul>
<li>Then I looked at the records again and sent some feedback about three duplicates to Bosede</li> <li>Then I looked at the records again and sent some feedback about three duplicates to Bosede</li>
<li>Also I noticed that many journal articles have the journal and page information in the citation, but are missing <code>dc.source</code> and <code>dc.format.extent</code> fields</li> <li>Also I noticed that many journal articles have the journal and page information in the citation, but are missing <code>dc.source</code> and <code>dc.format.extent</code> fields</li>
</ul></li> </ul>
</li>
<li>Play with language identification using the langdetect, fasttext, polyglot, and langid libraries <li>Play with language identification using the langdetect, fasttext, polyglot, and langid libraries
<ul> <ul>
<li>polyglot requires too many system things to compile</li> <li>polyglot requires too many system things to compile</li>
<li>langdetect didn&rsquo;t seem as accurate as the others</li> <li>langdetect didn't seem as accurate as the others</li>
<li>fasttext is likely the best, but <a href="https://github.com/facebookresearch/fastText/issues/909">prints a blank line to the console when loading a model</a></li> <li>fasttext is likely the best, but <a href="https://github.com/facebookresearch/fastText/issues/909">prints a blank line to the console when loading a model</a></li>
<li>langid seems to be the best considering the above experiences</li> <li>langid seems to be the best considering the above experiences</li>
</ul></li>
<li>I added very experimental language detection to the <a href="https://github.com/ilri/csv-metadata-quality">csv-metadata-quality</a> module
<ul>
<li>It works by checking the predicted language of the <code>dc.title</code> field against the item&rsquo;s <code>dc.language.iso</code> field</li>
<li>I tested it on the Bioversity migration data set and it actually helped me correct eleven language fields in their records!</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-09-24">2019-09-24</h2> <li>I added very experimental language detection to the <a href="https://github.com/ilri/csv-metadata-quality">csv-metadata-quality</a> module
<ul>
<li>It works by checking the predicted language of the <code>dc.title</code> field against the item's <code>dc.language.iso</code> field</li>
<li>I tested it on the Bioversity migration data set and it actually helped me correct eleven language fields in their records!</li>
</ul>
</li>
</ul>
<h2 id="20190924">2019-09-24</h2>
<ul> <ul>
<li>Bosede fixed a few of the things I mentioned in her Sept 6 batch records, but there were still issues <li>Bosede fixed a few of the things I mentioned in her Sept 6 batch records, but there were still issues
<ul> <ul>
<li>I sent her a bit more feedback because when I asked her to delete a duplicate, she deleted the <em>existing</em> item on DSpace Test rather than the new one in the new batch file!</li> <li>I sent her a bit more feedback because when I asked her to delete a duplicate, she deleted the <em>existing</em> item on DSpace Test rather than the new one in the new batch file!</li>
<li>I fixed two incorrect languages after analyzing it with my beta language detection in the csv-metadata-quality tool</li> <li>I fixed two incorrect languages after analyzing it with my beta language detection in the csv-metadata-quality tool</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-09-26">2019-09-26</h2> </ul>
<h2 id="20190926">2019-09-26</h2>
<ul> <ul>
<li>Release <a href="https://github.com/ilri/csv-metadata-quality/releases/tag/v0.3.0">version 0.3.0 of the csv-metadata-quality</a> tool <li>Release <a href="https://github.com/ilri/csv-metadata-quality/releases/tag/v0.3.0">version 0.3.0 of the csv-metadata-quality</a> tool
<ul> <ul>
<li>This version includes the experimental validation of languages using the Python <code>langid</code> library</li> <li>This version includes the experimental validation of languages using the Python <code>langid</code> library</li>
<li>I also included updated pytest tests and test files that specifically test this functionality</li> <li>I also included updated pytest tests and test files that specifically test this functionality</li>
</ul></li> </ul>
</li>
<li>Give more feedback to Bosede about the <a href="https://dspacetest.cgiar.org/handle/10568/105116">IITA Sept 6 (20196th.xls) records on DSpace Test</a> <li>Give more feedback to Bosede about the <a href="https://dspacetest.cgiar.org/handle/10568/105116">IITA Sept 6 (20196th.xls) records on DSpace Test</a>
<ul> <ul>
<li>I told her to delete one item that appears to be a duplicate, or to fix its citation to be correct if she thinks it is not a duplicate</li> <li>I told her to delete one item that appears to be a duplicate, or to fix its citation to be correct if she thinks it is not a duplicate</li>
<li>I deleted another item that I had previously identified as a duplicate that she had fixed by incorrectly deleting the original (ugh)</li> <li>I deleted another item that I had previously identified as a duplicate that she had fixed by incorrectly deleting the original (ugh)</li>
</ul></li> </ul>
</li>
<li><p>Get a list of institutions from CCAFS&rsquo;s Clarisa API and try to parse it with <code>jq</code>, do some small cleanups and add a header in <code>sed</code>, and then pass it through <code>csvcut</code> to add line numbers:</p> <li>Get a list of institutions from CCAFS's Clarisa API and try to parse it with <code>jq</code>, do some small cleanups and add a header in <code>sed</code>, and then pass it through <code>csvcut</code> to add line numbers:</li>
</ul>
<pre><code>$ cat ~/Downloads/institutions.json| jq '.[] | {name: .name}' | grep name | awk -F: '{print $2}' | sed -e 's/&quot;//g' -e 's/^ //' -e '1iname' | csvcut -l | sed '1s/line_number/id/' &gt; /tmp/clarisa-institutions.csv <pre><code>$ cat ~/Downloads/institutions.json| jq '.[] | {name: .name}' | grep name | awk -F: '{print $2}' | sed -e 's/&quot;//g' -e 's/^ //' -e '1iname' | csvcut -l | sed '1s/line_number/id/' &gt; /tmp/clarisa-institutions.csv
$ csv-metadata-quality -i /tmp/clarisa-institutions.csv -o /tmp/clarisa-institutions-cleaned.csv -u $ csv-metadata-quality -i /tmp/clarisa-institutions.csv -o /tmp/clarisa-institutions-cleaned.csv -u
</code></pre></li> </code></pre><ul>
<li>The csv-metadata-quality tool caught a few records with excessive spacing and unnecessary Unicode</li>
<li><p>The csv-metadata-quality tool caught a few records with excessive spacing and unnecessary Unicode</p></li> <li>I could potentially use this with reconcile-csv and OpenRefine as a source to validate our institutional authors against&hellip;</li>
<li><p>I could potentially use this with reconcile-csv and OpenRefine as a source to validate our institutional authors against&hellip;</p></li>
</ul> </ul>
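<p>The <code>jq</code>, <code>grep</code>, <code>awk</code>, <code>sed</code> and <code>csvcut</code> pipeline above could be condensed into a few lines of Python. This is a hedged sketch rather than what was run: it assumes, as the <code>jq</code> filter does, that the JSON is a top-level array of objects with a <code>name</code> key, and the function name <code>institutions_to_csv</code> is hypothetical.</p>

```python
import csv
import io
import json


def institutions_to_csv(json_text):
    """Convert a JSON array of institutions to CSV text with id and name columns.

    The id column is a running line number starting at 1, mirroring what
    `csvcut -l` followed by renaming line_number to id would produce.
    """
    institutions = json.loads(json_text)
    output = io.StringIO()
    writer = csv.writer(output, lineterminator="\n")
    writer.writerow(["id", "name"])
    for line_number, institution in enumerate(institutions, start=1):
        # Only trim surrounding whitespace; deeper cleanup is left to
        # csv-metadata-quality
        writer.writerow([line_number, institution["name"].strip()])
    return output.getvalue()


if __name__ == "__main__":
    sample = '[{"name": "International Livestock Research Institute"}]'
    print(institutions_to_csv(sample), end="")
```

<p>Parsing the JSON directly also sidesteps two hazards of the text pipeline: <code>awk -F:</code> would truncate any institution name that contains a colon, and the <code>sed</code> quote stripping would remove quotes that are part of a name.</p>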
<h2 id="20190927">2019-09-27</h2>
<h2 id="2019-09-27">2019-09-27</h2>
<ul> <ul>
<li>Skype with Peter and Abenet about CGSpace actions <li>Skype with Peter and Abenet about CGSpace actions
<ul> <ul>
<li>Peter will respond to ICARDA&rsquo;s request to deposit items into CGSpace, with a caveat that we agree on some vocabulary standards for institutions, countries, regions, etc</li> <li>Peter will respond to ICARDA's request to deposit items into CGSpace, with a caveat that we agree on some vocabulary standards for institutions, countries, regions, etc</li>
<li>We discussed using ISO 3166 for countries, though Peter doesn&rsquo;t like the formal names like &ldquo;Moldova, Republic of&rdquo; and &ldquo;Tanzania, United Republic of&rdquo;</li> <li>We discussed using ISO 3166 for countries, though Peter doesn't like the formal names like &ldquo;Moldova, Republic of&rdquo; and &ldquo;Tanzania, United Republic of&rdquo;
<ul>
<li>The Debian <code>iso-codes</code> package has ISO 3166-1 with &ldquo;common name&rdquo;, &ldquo;name&rdquo;, and &ldquo;official name&rdquo; representations, for example: <li>The Debian <code>iso-codes</code> package has ISO 3166-1 with &ldquo;common name&rdquo;, &ldquo;name&rdquo;, and &ldquo;official name&rdquo; representations, for example:
<ul> <ul>
<li>common_name: Tanzania</li> <li>common_name: Tanzania</li>
<li>name: Tanzania, United Republic of</li> <li>name: Tanzania, United Republic of</li>
<li>official_name: United Republic of Tanzania</li> <li>official_name: United Republic of Tanzania</li>
</ul></li> </ul>
</li>
<li>There are still some unfortunate ones there, though: <li>There are still some unfortunate ones there, though:
<ul> <ul>
<li>name: Korea, Democratic People&rsquo;s Republic of</li> <li>name: Korea, Democratic People's Republic of</li>
<li>official_name: Democratic People&rsquo;s Republic of Korea</li> <li>official_name: Democratic People's Republic of Korea</li>
</ul></li> </ul>
<li>And this, which isn&rsquo;t even in English&hellip; </li>
<li>And this, which isn't even in English&hellip;
<ul> <ul>
<li>name: Côte d&rsquo;Ivoire</li> <li>name: Côte d'Ivoire</li>
<li>official_name: Republic of Côte d&rsquo;Ivoire</li> <li>official_name: Republic of Côte d'Ivoire</li>
</ul></li> </ul>
</li>
<li>The other alternative is to just keep using the names we have, which are mostly compliant with AGROVOC</li> <li>The other alternative is to just keep using the names we have, which are mostly compliant with AGROVOC</li>
</ul>
</li>
<li>Peter said that a new server for DSpace Test is fine, so I can proceed with the normal process of getting approval from Michael Victor and ICT when I have time (recommend moving from $40 to $80/month Linode, with 16GB RAM)</li> <li>Peter said that a new server for DSpace Test is fine, so I can proceed with the normal process of getting approval from Michael Victor and ICT when I have time (recommend moving from $40 to $80/month Linode, with 16GB RAM)</li>
<li>I need to ask Atmire for a quote to upgrade CGSpace to DSpace 6 with all current modules so we can see how many more credits we need</li> <li>I need to ask Atmire for a quote to upgrade CGSpace to DSpace 6 with all current modules so we can see how many more credits we need</li>
</ul></li> </ul>
</li>
<li>A little bit more work on the Sept 6 IITA batch records <li>A little bit more work on the Sept 6 IITA batch records
<ul> <ul>
<li>Bosede deleted the one item that I told her was a duplicate</li> <li>Bosede deleted the one item that I told her was a duplicate</li>
<li>I checked the AGROVOC subjects and fixed one incorrect one</li> <li>I checked the AGROVOC subjects and fixed one incorrect one</li>
<li>Then I told her that I think the items are ready to go to CGSpace and asked Abenet for a final comment</li> <li>Then I told her that I think the items are ready to go to CGSpace and asked Abenet for a final comment</li>
</ul></li>
</ul> </ul>
</li>
<!-- vim: set sw=2 ts=2: --> </ul>
<!-- raw HTML omitted -->

View File

@ -6,8 +6,7 @@
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no"> <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="October, 2019" /> <meta property="og:title" content="October, 2019" />
<meta property="og:description" content="2019-10-01 Udana from IWMI asked me for a CSV export of their community on CGSpace <meta property="og:description" content="2019-10-01 Udana from IWMI asked me for a CSV export of their community on CGSpace I exported it, but a quick run through the csv-metadata-quality tool shows that there are some low-hanging fruits we can fix before I send him the data I will limit the scope to the titles, regions, subregions, and river basins for now to manually fix some non-breaking spaces (U&#43;00A0) there that would otherwise be removed by the csv-metadata-quality script&#39;s &ldquo;unnecessary Unicode&rdquo; fix: $ csvcut -c &#39;id,dc." />
I exported it, but a quick run through the csv-metadata-quality tool shows that there are some low-hanging fruits we can fix before I send him the data I will limit the scope to the titles, regions, subregions, and river basins for now to manually fix some non-breaking spaces (U&#43;00A0) there that would otherwise be removed by the csv-metadata-quality script&rsquo;s &ldquo;unnecessary Unicode&rdquo; fix:" />
<meta property="og:type" content="article" /> <meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-10/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-10/" />
<meta property="article:published_time" content="2019-10-01T13:20:51+03:00" /> <meta property="article:published_time" content="2019-10-01T13:20:51+03:00" />
@ -15,9 +14,8 @@
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="October, 2019"/> <meta name="twitter:title" content="October, 2019"/>
<meta name="twitter:description" content="2019-10-01 Udana from IWMI asked me for a CSV export of their community on CGSpace <meta name="twitter:description" content="2019-10-01 Udana from IWMI asked me for a CSV export of their community on CGSpace I exported it, but a quick run through the csv-metadata-quality tool shows that there are some low-hanging fruits we can fix before I send him the data I will limit the scope to the titles, regions, subregions, and river basins for now to manually fix some non-breaking spaces (U&#43;00A0) there that would otherwise be removed by the csv-metadata-quality script&#39;s &ldquo;unnecessary Unicode&rdquo; fix: $ csvcut -c &#39;id,dc."/>
I exported it, but a quick run through the csv-metadata-quality tool shows that there are some low-hanging fruits we can fix before I send him the data I will limit the scope to the titles, regions, subregions, and river basins for now to manually fix some non-breaking spaces (U&#43;00A0) there that would otherwise be removed by the csv-metadata-quality script&rsquo;s &ldquo;unnecessary Unicode&rdquo; fix:"/> <meta name="generator" content="Hugo 0.60.0" />
<meta name="generator" content="Hugo 0.59.1" />
@ -98,159 +96,125 @@
</p> </p>
</header> </header>
<h2 id="20191001">2019-10-01</h2>
<h2 id="2019-10-01">2019-10-01</h2>
<ul> <ul>
<li><p>Udana from IWMI asked me for a CSV export of their community on CGSpace</p> <li>Udana from IWMI asked me for a CSV export of their community on CGSpace
<ul> <ul>
<li>I exported it, but a quick run through the <code>csv-metadata-quality</code> tool shows that there are some low-hanging fruits we can fix before I send him the data</li> <li>I exported it, but a quick run through the <code>csv-metadata-quality</code> tool shows that there are some low-hanging fruits we can fix before I send him the data</li>
<li>I will limit the scope to the titles, regions, subregions, and river basins for now to manually fix some non-breaking spaces (U+00A0) there that would otherwise be removed by the csv-metadata-quality script's &ldquo;unnecessary Unicode&rdquo; fix:</li>
<li><p>I will limit the scope to the titles, regions, subregions, and river basins for now to manually fix some non-breaking spaces (U+00A0) there that would otherwise be removed by the csv-metadata-quality script&rsquo;s &ldquo;unnecessary Unicode&rdquo; fix:</p>
<pre><code>$ csvcut -c 'id,dc.title[en_US],cg.coverage.region[en_US],cg.coverage.subregion[en_US],cg.river.basin[en_US]' ~/Downloads/10568-16814.csv &gt; /tmp/iwmi-title-region-subregion-river.csv
</code></pre></li>
<li><p>Then I replace them in vim with <code>:% s/\%u00a0/ /g</code> because I can&rsquo;t figure out the correct sed syntax to do it directly from the pipe above</p></li>
<li><p>I uploaded those to CGSpace and then re-exported the metadata</p></li>
<li><p>Now that I think about it, I shouldn&rsquo;t be removing non-breaking spaces (U+00A0), I should be replacing them with normal spaces!</p></li>
<li><p>I modified the script so it replaces the non-breaking spaces instead of removing them</p></li>
<li><p>Then I ran the csv-metadata-quality script to do some general cleanups (though I temporarily commented out the whitespace fixes because it was too many thousands of rows):</p>
<pre><code>$ csv-metadata-quality -i ~/Downloads/10568-16814.csv -o /tmp/iwmi.csv -x 'dc.date.issued,dc.date.issued[],dc.date.issued[en_US]' -u
</code></pre></li>
<li><p>That fixed 153 items (unnecessary Unicode, duplicates, commaspace fixes, etc)</p></li>
</ul></li>
<li><p>Release <a href="https://github.com/ilri/csv-metadata-quality/releases/tag/v0.3.1">version 0.3.1 of the csv-metadata-quality script</a> with the non-breaking spaces change</p></li>
</ul> </ul>
</li>
<h2 id="2019-10-03">2019-10-03</h2> </ul>
<pre><code>$ csvcut -c 'id,dc.title[en_US],cg.coverage.region[en_US],cg.coverage.subregion[en_US],cg.river.basin[en_US]' ~/Downloads/10568-16814.csv &gt; /tmp/iwmi-title-region-subregion-river.csv
</code></pre><ul>
<li>Then I replace them in vim with <code>:% s/\%u00a0/ /g</code> because I can't figure out the correct sed syntax to do it directly from the pipe above</li>
<li>I uploaded those to CGSpace and then re-exported the metadata</li>
<li>Now that I think about it, I shouldn't be removing non-breaking spaces (U+00A0), I should be replacing them with normal spaces!</li>
<li>I modified the script so it replaces the non-breaking spaces instead of removing them</li>
<li>Then I ran the csv-metadata-quality script to do some general cleanups (though I temporarily commented out the whitespace fixes because they affected too many thousands of rows):</li>
</ul>
<pre><code>$ csv-metadata-quality -i ~/Downloads/10568-16814.csv -o /tmp/iwmi.csv -x 'dc.date.issued,dc.date.issued[],dc.date.issued[en_US]' -u
</code></pre><ul>
<li>That fixed 153 items (unnecessary Unicode, duplicates, commaspace fixes, etc)</li>
<li>Release <a href="https://github.com/ilri/csv-metadata-quality/releases/tag/v0.3.1">version 0.3.1 of the csv-metadata-quality script</a> with the non-breaking spaces change</li>
</ul>
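The non-breaking space replacement described above can also be done outside of vim. This is a minimal sketch in Python, not the actual csv-metadata-quality code: the function names `replace_nbsp` and `clean_csv` are hypothetical, and it assumes the CSV has a header row and fits in memory.

```python
import csv
import io

NBSP = "\u00a0"  # non-breaking space (U+00A0)


def replace_nbsp(value):
    """Replace each non-breaking space with a normal space (do not remove it)."""
    return value.replace(NBSP, " ")


def clean_csv(text, columns):
    """Return the CSV text with non-breaking spaces replaced in the given columns.

    Hypothetical helper: only the named columns are touched, every other
    column is written back unchanged.
    """
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for column in columns:
            if row.get(column):
                row[column] = replace_nbsp(row[column])
        writer.writerow(row)
    return out.getvalue()
```

Replacing rather than deleting matters because deleting U+00A0 would join the two words on either side of it.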
<h2 id="20191003">2019-10-03</h2>
<ul> <ul>
<li>Upload the 117 IITA records that we had been working on last month (aka 20196th.xls aka Sept 6) to CGSpace</li> <li>Upload the 117 IITA records that we had been working on last month (aka 20196th.xls aka Sept 6) to CGSpace</li>
</ul> </ul>
<h2 id="20191004">2019-10-04</h2>
<h2 id="2019-10-04">2019-10-04</h2>
<ul> <ul>
<li><p>Create an account for Bioversity&rsquo;s ICT consultant Francesco on DSpace Test:</p> <li>Create an account for Bioversity's ICT consultant Francesco on DSpace Test:</li>
</ul>
<pre><code>$ dspace user -a -m blah@mail.it -g Francesco -s Vernocchi -p 'fffff' <pre><code>$ dspace user -a -m blah@mail.it -g Francesco -s Vernocchi -p 'fffff'
</code></pre></li> </code></pre><ul>
<li>Email Francesca and Carol to ask for follow up about the test upload I did on 2019-09-21
<li><p>Email Francesca and Carol to ask for follow up about the test upload I did on 2019-09-21</p>
<ul> <ul>
<li>I suggested that if they still want to do value addition of those records (like adding countries, regions, etc) that they could maybe do it after we migrate the records to CGSpace</li> <li>I suggested that if they still want to do value addition of those records (like adding countries, regions, etc) that they could maybe do it after we migrate the records to CGSpace</li>
<li>Carol responded to tell me where to map the items with type Brochure, Journal Item, and Thesis, so I applied them to the <a href="https://dspacetest.cgiar.org/handle/10568/103688">collection on DSpace Test</a></li> <li>Carol responded to tell me where to map the items with type Brochure, Journal Item, and Thesis, so I applied them to the <a href="https://dspacetest.cgiar.org/handle/10568/103688">collection on DSpace Test</a></li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-10-06">2019-10-06</h2> </ul>
<h2 id="20191006">2019-10-06</h2>
<ul> <ul>
<li>Hector from CCAFS responded about my feedback of their CLARISA API <li>Hector from CCAFS responded about my feedback of their CLARISA API
<ul> <ul>
<li>He made some fixes to the metadata values they are using based on my feedback and said they are happy if we would use it</li> <li>He made some fixes to the metadata values they are using based on my feedback and said they are happy if we would use it</li>
</ul></li> </ul>
</li>
<li>Gabriela from CIP asked me if it was possible to generate an RSS feed of items that have the CIP subject &ldquo;POTATO AGRI-FOOD SYSTEMS&rdquo; <li>Gabriela from CIP asked me if it was possible to generate an RSS feed of items that have the CIP subject &ldquo;POTATO AGRI-FOOD SYSTEMS&rdquo;
<ul> <ul>
<li>I notice that there is a similar term &ldquo;SWEETPOTATO AGRI-FOOD SYSTEMS&rdquo; so I had to come up with a way to exclude that using the boolean &ldquo;AND NOT&rdquo; in the <a href="https://cgspace.cgiar.org/open-search/discover?query=cipsubject:POTATO%20AGRI%E2%80%90FOOD%20SYSTEMS%20AND%20NOT%20cipsubject:SWEETPOTATO%20AGRI%E2%80%90FOOD%20SYSTEMS&amp;scope=10568/51671&amp;sort_by=3&amp;order=DESC">OpenSearch query</a></li> <li>I notice that there is a similar term &ldquo;SWEETPOTATO AGRI-FOOD SYSTEMS&rdquo; so I had to come up with a way to exclude that using the boolean &ldquo;AND NOT&rdquo; in the <a href="https://cgspace.cgiar.org/open-search/discover?query=cipsubject:POTATO%20AGRI%E2%80%90FOOD%20SYSTEMS%20AND%20NOT%20cipsubject:SWEETPOTATO%20AGRI%E2%80%90FOOD%20SYSTEMS&amp;scope=10568/51671&amp;sort_by=3&amp;order=DESC">OpenSearch query</a></li>
<li>Again, the <code>sort_by=3</code> parameter is the accession date, as configured in <code>dspace.cfg</code></li> <li>Again, the <code>sort_by=3</code> parameter is the accession date, as configured in <code>dspace.cfg</code></li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-10-08">2019-10-08</h2> </ul>
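The OpenSearch query with the boolean &ldquo;AND NOT&rdquo; from the CIP request can be assembled programmatically. This is a rough sketch using only the Python standard library; the function name `opensearch_url` and its parameters are assumptions for illustration, and the subject values use U+2010 hyphens because that is what the URL above encodes as `%E2%80%90`.

```python
from urllib.parse import quote, urlencode


def opensearch_url(base, field, include, exclude, scope, sort_by=3, order="DESC"):
    """Build an OpenSearch URL that matches `include` but not `exclude`.

    Hypothetical helper: `field` is the search field (here "cipsubject"),
    `scope` is the handle of the community or collection to search in.
    """
    query = f"{field}:{include} AND NOT {field}:{exclude}"
    params = {"query": query, "scope": scope, "sort_by": sort_by, "order": order}
    # Keep ":" and "/" literal and encode spaces as %20 instead of "+"
    return base + "?" + urlencode(params, safe=":/", quote_via=quote)


url = opensearch_url(
    "https://cgspace.cgiar.org/open-search/discover",
    "cipsubject",
    "POTATO AGRI\u2010FOOD SYSTEMS",
    "SWEETPOTATO AGRI\u2010FOOD SYSTEMS",
    "10568/51671",
)
```

With these inputs the result should be the same URL as the link above, with `sort_by=3` still meaning the accession date as configured in `dspace.cfg`.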
<h2 id="20191008">2019-10-08</h2>
<ul> <ul>
<li>Fix 108 more issues with authors in the ongoing Bioversity migration on DSpace Test, for example: <li>Fix 108 more issues with authors in the ongoing Bioversity migration on DSpace Test, for example:
<ul> <ul>
<li>Europeanooperative Programme for Plant Genetic Resources</li> <li>Europeanooperative Programme for Plant Genetic Resources</li>
<li>Bioversity International. Capacity Development Unit</li> <li>Bioversity International. Capacity Development Unit</li>
<li>W.M. van der Heide, W.M., Tripp, R.</li> <li>W.M. van der Heide, W.M., Tripp, R.</li>
<li>Internationallant Genetic Resources Institute</li> <li>Internationallant Genetic Resources Institute</li>
</ul></li>
<li>Start looking at duplicates in the Bioversity migration data on DSpace Test
<ul>
<li>I&rsquo;m keeping track of the originals and duplicates in a Google Docs spreadsheet that I will share with Bioversity</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-10-09">2019-10-09</h2> <li>Start looking at duplicates in the Bioversity migration data on DSpace Test
<ul>
<li>I'm keeping track of the originals and duplicates in a Google Docs spreadsheet that I will share with Bioversity</li>
</ul>
</li>
</ul>
<h2 id="20191009">2019-10-09</h2>
<ul> <ul>
<li>Continue working on identifying duplicates in the Bioversity migration <li>Continue working on identifying duplicates in the Bioversity migration
<ul> <ul>
<li>I have been recording the originals and duplicates in a spreadsheet so I can map them later</li> <li>I have been recording the originals and duplicates in a spreadsheet so I can map them later</li>
<li>For now I am just reconciling any incorrect or missing metadata in the original items on CGSpace, deleting the duplicate from DSpace Test, and mapping the original to the correct place on CGSpace</li> <li>For now I am just reconciling any incorrect or missing metadata in the original items on CGSpace, deleting the duplicate from DSpace Test, and mapping the original to the correct place on CGSpace</li>
<li>So far I have deleted thirty duplicates and mapped fourteen</li> <li>So far I have deleted thirty duplicates and mapped fourteen</li>
</ul></li> </ul>
</li>
<li>Run all system updates on DSpace Test (linode19) and reboot the server</li> <li>Run all system updates on DSpace Test (linode19) and reboot the server</li>
</ul> </ul>
<h2 id="20191010">2019-10-10</h2>
<h2 id="2019-10-10">2019-10-10</h2>
<ul> <ul>
<li><p>Felix Shaw from Earlham emailed me to ask about his admin account on DSpace Test</p> <li>Felix Shaw from Earlham emailed me to ask about his admin account on DSpace Test
<ul> <ul>
<li>His old one got lost when I re-sync&rsquo;d DSpace Test with CGSpace a few weeks ago</li> <li>His old one got lost when I re-sync'd DSpace Test with CGSpace a few weeks ago</li>
<li>I added a new account for him and added it to the Administrators group:</li>
<li><p>I added a new account for him and added it to the Administrators group:</p>
<pre><code>$ dspace user -a -m wow@me.com -g Felix -s Shaw -p 'fuananaaa'
</code></pre></li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-10-11">2019-10-11</h2> </ul>
<pre><code>$ dspace user -a -m wow@me.com -g Felix -s Shaw -p 'fuananaaa'
</code></pre><h2 id="20191011">2019-10-11</h2>
<ul> <ul>
<li><p>I ran the DSpace cleanup function on CGSpace and it found some errors:</p> <li>I ran the DSpace cleanup function on CGSpace and it found some errors:</li>
</ul>
<pre><code>$ dspace cleanup -v <pre><code>$ dspace cleanup -v
... ...
Error: ERROR: update or delete on table &quot;bitstream&quot; violates foreign key constraint &quot;bundle_primary_bitstream_id_fkey&quot; on table &quot;bundle&quot; Error: ERROR: update or delete on table &quot;bitstream&quot; violates foreign key constraint &quot;bundle_primary_bitstream_id_fkey&quot; on table &quot;bundle&quot;
Detail: Key (bitstream_id)=(171221) is still referenced from table &quot;bundle&quot;. Detail: Key (bitstream_id)=(171221) is still referenced from table &quot;bundle&quot;.
</code></pre></li> </code></pre><ul>
<li>The solution, as always, is (repeat as many times as needed):</li>
<li><p>The solution, as always, is (repeat as many times as needed):</p> </ul>
<pre><code># su - postgres <pre><code># su - postgres
$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (171221);' $ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (171221);'
UPDATE 1 UPDATE 1
</code></pre></li> </code></pre><h2 id="20191012">2019-10-12</h2>
</ul>
<h2 id="2019-10-12">2019-10-12</h2>
<ul> <ul>
<li>More work on identifying duplicates in the Bioversity migration data on DSpace Test <li>More work on identifying duplicates in the Bioversity migration data on DSpace Test
<ul> <ul>
<li>I mapped twenty-five more items on CGSpace and deleted them from the migration test collection on DSpace Test</li> <li>I mapped twenty-five more items on CGSpace and deleted them from the migration test collection on DSpace Test</li>
<li>After a few hours I think I finished all the duplicates that were identified by Atmire&rsquo;s Duplicate Checker module</li> <li>After a few hours I think I finished all the duplicates that were identified by Atmire's Duplicate Checker module</li>
<li>According to my spreadsheet there were fifty-two in total</li> <li>According to my spreadsheet there were fifty-two in total</li>
</ul></li> </ul>
</li>
<li><p>I was preparing to check the affiliations on the Bioversity records when I noticed that the last list of top affiliations I generated has some anomalies</p> <li>I was preparing to check the affiliations on the Bioversity records when I noticed that the last list of top affiliations I generated has some anomalies
<ul> <ul>
<li><p>I made some corrections in a CSV:</p> <li>I made some corrections in a CSV:</li>
</ul>
</li>
</ul>
<pre><code>from,to <pre><code>from,to
CIAT,International Center for Tropical Agriculture CIAT,International Center for Tropical Agriculture
International Centre for Tropical Agriculture,International Center for Tropical Agriculture International Centre for Tropical Agriculture,International Center for Tropical Agriculture
@ -259,170 +223,139 @@ International Centre for Agricultural Research in the Dry Areas,International Ce
International Maize and Wheat Improvement Centre,International Maize and Wheat Improvement Center International Maize and Wheat Improvement Centre,International Maize and Wheat Improvement Center
&quot;Agricultural Information Resource Centre, Kenya.&quot;,&quot;Agricultural Information Resource Centre, Kenya&quot; &quot;Agricultural Information Resource Centre, Kenya.&quot;,&quot;Agricultural Information Resource Centre, Kenya&quot;
&quot;Centre for Livestock and Agricultural Development, Cambodia&quot;,&quot;Centre for Livestock and Agriculture Development, Cambodia&quot; &quot;Centre for Livestock and Agricultural Development, Cambodia&quot;,&quot;Centre for Livestock and Agriculture Development, Cambodia&quot;
</code></pre></li> </code></pre><ul>
</ul></li> <li>Then I applied it with my <code>fix-metadata-values.py</code> script on CGSpace:</li>
</ul>
<li><p>Then I applied it with my <code>fix-metadata-values.py</code> script on CGSpace:</p>
<pre><code>$ ./fix-metadata-values.py -i /tmp/affiliations.csv -db dspace -u dspace -p 'fuuu' -f from -m 211 -t to <pre><code>$ ./fix-metadata-values.py -i /tmp/affiliations.csv -db dspace -u dspace -p 'fuuu' -f from -m 211 -t to
</code></pre></li> </code></pre><ul>
<li>I did some manual curation of about 300 authors in OpenRefine in preparation for telling Peter and Abenet that the migration is almost ready
<li><p>I did some manual curation of about 300 authors in OpenRefine in preparation for telling Peter and Abenet that the migration is almost ready</p>
<ul> <ul>
<li>I would still like to perhaps (re)move institutional authors from <code>dc.contributor.author</code> to <code>cg.contributor.affiliation</code>, but I will have to run that by Francesca, Carol, and Abenet</li> <li>I would still like to perhaps (re)move institutional authors from <code>dc.contributor.author</code> to <code>cg.contributor.affiliation</code>, but I will have to run that by Francesca, Carol, and Abenet</li>
<li>I could use a custom text facet like this in OpenRefine to find authors that likely match the &ldquo;Last, F.&rdquo; pattern: <code>isNotNull(value.match(/^.*, \p{Lu}\.?.*$/))</code></li> <li>I could use a custom text facet like this in OpenRefine to find authors that likely match the &ldquo;Last, F.&rdquo; pattern: <code>isNotNull(value.match(/^.*, \p{Lu}\.?.*$/))</code></li>
<li>The <code>\p{Lu}</code> is a cool <a href="https://www.regular-expressions.info/unicode.html">regex character class</a> to make sure this works for letters with accents</li> <li>The <code>\p{Lu}</code> is a cool <a href="https://www.regular-expressions.info/unicode.html">regex character class</a> to make sure this works for letters with accents</li>
<li>As cool as that is, it&rsquo;s actually more effective to just search for authors that have &ldquo;.&rdquo; in them!</li> <li>As cool as that is, it's actually more effective to just search for authors that have &ldquo;.&rdquo; in them!</li>
<li>I&rsquo;ve decided to add a <code>cg.contributor.affiliation</code> column to 1,025 items based on the logic above where the author name is not an actual person</li> <li>I've decided to add a <code>cg.contributor.affiliation</code> column to 1,025 items based on the logic above where the author name is not an actual person</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-10-13">2019-10-13</h2> </ul>
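The &ldquo;Last, F.&rdquo; check from the OpenRefine facet can be approximated outside OpenRefine as well. Python's built-in `re` module does not support `\p{Lu}`, so this sketch uses `unicodedata` instead; the function name `looks_like_person` is hypothetical, and like the original facet it only finds names that likely match the pattern.

```python
import unicodedata


def looks_like_person(name):
    """Return True if a ", " in the name is followed by an uppercase letter.

    This mirrors the facet expression /^.*, \\p{Lu}\\.?.*$/ for single-line
    values: Unicode category "Lu" covers accented capitals such as "É".
    """
    start = name.find(", ")
    while start != -1:
        following = name[start + 2:start + 3]
        if following and unicodedata.category(following) == "Lu":
            return True
        start = name.find(", ", start + 1)
    return False
```

Institutional names that contain a comma followed by a capitalized word would still match, which is consistent with the note above that searching for &ldquo;.&rdquo; turned out to be more effective.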
<h2 id="20191013">2019-10-13</h2>
<ul> <ul>
<li>More cleanup work on the authors in the Bioversity migration <li>More cleanup work on the authors in the Bioversity migration
<ul> <ul>
<li>Now I sent the final feedback to Francesca, Carol, and Abenet</li> <li>Now I sent the final feedback to Francesca, Carol, and Abenet</li>
</ul></li> </ul>
</li>
<li><p>Peter is still seeing some authors listed with &ldquo;|&rdquo; in the &ldquo;Top Authors&rdquo; statistics for some collections</p> <li>Peter is still seeing some authors listed with &ldquo;|&rdquo; in the &ldquo;Top Authors&rdquo; statistics for some collections
<ul> <ul>
<li>I looked in some of the items that are listed and the author field does not contain those invalid separators</li> <li>I looked in some of the items that are listed and the author field does not contain those invalid separators</li>
<li>I decided to try doing a full Discovery re-indexing on CGSpace (linode18):</li>
<li><p>I decided to try doing a full Discovery re-indexing on CGSpace (linode18):</p> </ul>
</li>
</ul>
<pre><code>$ time schedtool -B -e ionice -c2 -n7 nice -n19 dspace index-discovery -b <pre><code>$ time schedtool -B -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
real 82m35.993s real 82m35.993s
</code></pre></li> </code></pre><ul>
</ul></li> <li>After the re-indexing the top authors still list the following:</li>
<li><p>After the re-indexing the top authors still list the following:</p>
<pre><code>Jagwe, J.|Ouma, E.A.|Brandes-van Dorresteijn, D.|Kawuma, Brian|Smith, J.
</code></pre></li>
<li><p>I looked in the database to find authors that had &ldquo;|&rdquo; in them:</p>
<pre><code>dspace=# SELECT text_value, resource_id FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=3 AND text_value LIKE '%|%';
text_value | resource_id
----------------------------------+-------------
Anandajayasekeram, P.|Puskur, R. | 157
Morales, J.|Renner, I. | 22779
Zahid, A.|Haque, M.A. | 25492
(3 rows)
</code></pre></li>
<li><p>Then I found their handles and corrected them, for example:</p>
<pre><code>dspacetest=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id = '157' and handle.resource_type_id=2;
handle
-----------
10568/129
(1 row)
</code></pre></li>
<li><p>So I&rsquo;m still not sure where these weird authors in the &ldquo;Top Author&rdquo; stats are coming from</p></li>
</ul> </ul>
<pre><code>Jagwe, J.|Ouma, E.A.|Brandes-van Dorresteijn, D.|Kawuma, Brian|Smith, J.
<h2 id="2019-10-14">2019-10-14</h2> </code></pre><ul>
<li>I looked in the database to find authors that had &ldquo;|&rdquo; in them:</li>
</ul>
<pre><code>dspace=# SELECT text_value, resource_id FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=3 AND text_value LIKE '%|%';
text_value | resource_id
----------------------------------+-------------
Anandajayasekeram, P.|Puskur, R. | 157
Morales, J.|Renner, I. | 22779
Zahid, A.|Haque, M.A. | 25492
(3 rows)
</code></pre><ul>
<li>Then I found their handles and corrected them, for example:</li>
</ul>
<pre><code>dspacetest=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id = '157' and handle.resource_type_id=2;
handle
-----------
10568/129
(1 row)
</code></pre><ul>
<li>So I'm still not sure where these weird authors in the &ldquo;Top Author&rdquo; stats are coming from</li>
</ul>
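Once the rows with &ldquo;|&rdquo; in the author field are known, splitting them into separate author values is mechanical. A minimal sketch, assuming the rows have already been fetched as `(text_value, resource_id)` pairs like the query output above; the function names are hypothetical and nothing here writes back to the database.

```python
def split_multivalue(text_value, separator="|"):
    """Split a joined metadata value into its individual, trimmed parts."""
    return [part.strip() for part in text_value.split(separator) if part.strip()]


def find_joined_authors(rows, separator="|"):
    """Map resource_id to the list of authors for every row that contains the separator.

    `rows` is an iterable of (text_value, resource_id) tuples.
    """
    return {
        resource_id: split_multivalue(text_value, separator)
        for text_value, resource_id in rows
        if separator in text_value
    }


rows = [
    ("Anandajayasekeram, P.|Puskur, R.", 157),
    ("Morales, J.|Renner, I.", 22779),
    ("Zahid, A.|Haque, M.A.", 25492),
]
joined = find_joined_authors(rows)
```

Each resulting list can then be entered as separate author values on the corresponding item.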
<h2 id="20191014">2019-10-14</h2>
<ul> <ul>
<li>I talked to Peter about the Bioversity items and he said that we should add the institutional authors back to <code>dc.contributor.author</code>, because I had moved them to <code>cg.contributor.affiliation</code> <li>I talked to Peter about the Bioversity items and he said that we should add the institutional authors back to <code>dc.contributor.author</code>, because I had moved them to <code>cg.contributor.affiliation</code>
<ul> <ul>
<li>Otherwise he said the data looks good</li> <li>Otherwise he said the data looks good</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-10-15">2019-10-15</h2> </ul>
<h2 id="20191015">2019-10-15</h2>
<ul> <ul>
<li><p>I did a test export / import of the Bioversity migration items on DSpace Test</p> <li>I did a test export / import of the Bioversity migration items on DSpace Test
<ul> <ul>
<li><p>First export them:</p> <li>First export them:</li>
</ul>
</li>
</ul>
<pre><code>$ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx512m' <pre><code>$ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx512m'
$ mkdir 2019-10-15-Bioversity $ mkdir 2019-10-15-Bioversity
$ dspace export -i 10568/108684 -t COLLECTION -m -n 0 -d 2019-10-15-Bioversity $ dspace export -i 10568/108684 -t COLLECTION -m -n 0 -d 2019-10-15-Bioversity
$ sed -i '/&lt;dcvalue element=&quot;identifier&quot; qualifier=&quot;uri&quot;&gt;/d' 2019-10-15-Bioversity/*/dublin_core.xml $ sed -i '/&lt;dcvalue element=&quot;identifier&quot; qualifier=&quot;uri&quot;&gt;/d' 2019-10-15-Bioversity/*/dublin_core.xml
</code></pre></li> </code></pre><ul>
</ul></li> <li>It's really stupid, but for some reason the handles are included even though I specified the <code>-m</code> option, so after the export I removed the <code>dc.identifier.uri</code> metadata values from the items</li>
<li>Then I imported a test subset of them in my local test environment:</li>
<li><p>It&rsquo;s really stupid, but for some reason the handles are included even though I specified the <code>-m</code> option, so after the export I removed the <code>dc.identifier.uri</code> metadata values from the items</p></li> </ul>
<li><p>Then I imported a test subset of them in my local test environment:</p>
<pre><code>$ ~/dspace/bin/dspace import -a -c 10568/104049 -e fuu@cgiar.org -m 2019-10-15-Bioversity.map -s /tmp/2019-10-15-Bioversity <pre><code>$ ~/dspace/bin/dspace import -a -c 10568/104049 -e fuu@cgiar.org -m 2019-10-15-Bioversity.map -s /tmp/2019-10-15-Bioversity
</code></pre></li> </code></pre><ul>
<li>I had forgotten (again) that the <code>dspace export</code> command doesn't preserve collection ownership or mappings, so I will have to create a temporary collection on CGSpace to import these to, then do the mappings again after import&hellip;</li>
<li><p>I had forgotten (again) that the <code>dspace export</code> command doesn&rsquo;t preserve collection ownership or mappings, so I will have to create a temporary collection on CGSpace to import these to, then do the mappings again after import&hellip;</p></li> <li>On CGSpace I will increase the RAM of the command line Java process for good luck before import&hellip;</li>
</ul>
<li><p>On CGSpace I will increase the RAM of the command line Java process for good luck before import&hellip;</p>
<pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx1024m&quot; <pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx1024m&quot;
$ dspace import -a -c 10568/104057 -e fuu@cgiar.org -m 2019-10-15-Bioversity.map -s 2019-10-15-Bioversity $ dspace import -a -c 10568/104057 -e fuu@cgiar.org -m 2019-10-15-Bioversity.map -s 2019-10-15-Bioversity
</code></pre></li> </code></pre><ul>
<li>After importing the 1,367 items I re-exported the metadata, changed the owning collections to those based on their type, then re-imported them</li>
<li><p>After importing the 1,367 items I re-exported the metadata, changed the owning collections to those based on their type, then re-imported them</p></li>
</ul> </ul>
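The `sed` step that removes the `dc.identifier.uri` values after export could also be written in Python. This is a sketch under the same assumption as the `sed` command, namely that each `dcvalue` element sits on its own line in `dublin_core.xml`; the function names are hypothetical.

```python
from pathlib import Path

URI_MARKER = '<dcvalue element="identifier" qualifier="uri">'


def strip_uri_lines(xml_text):
    """Drop every line that contains a dc.identifier.uri dcvalue element."""
    kept = [
        line
        for line in xml_text.splitlines(keepends=True)
        if URI_MARKER not in line
    ]
    return "".join(kept)


def strip_uri_from_export(export_dir):
    """Rewrite each item's dublin_core.xml in an export directory in place."""
    for path in Path(export_dir).glob("*/dublin_core.xml"):
        cleaned = strip_uri_lines(path.read_text(encoding="utf-8"))
        path.write_text(cleaned, encoding="utf-8")
```

Calling `strip_uri_from_export("2019-10-15-Bioversity")` would cover the same files as the glob in the `sed` command above.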
<h2 id="20191021">2019-10-21</h2>
<h2 id="2019-10-21">2019-10-21</h2>
<ul> <ul>
<li>Re-sync the DSpace Test database and assetstore with CGSpace</li> <li>Re-sync the DSpace Test database and assetstore with CGSpace</li>
<li>Run system updates on DSpace Test (linode19) and reboot it</li> <li>Run system updates on DSpace Test (linode19) and reboot it</li>
</ul> </ul>
<h2 id="20191024">2019-10-24</h2>
<h2 id="2019-10-24">2019-10-24</h2>
<ul> <ul>
<li>Create a test user for Mohammad Salem to test depositing from MEL to DSpace Test, as the last one I had created in 2019-08 was cleared when we re-synchronized DSpace Test with CGSpace recently.</li> <li>Create a test user for Mohammad Salem to test depositing from MEL to DSpace Test, as the last one I had created in 2019-08 was cleared when we re-synchronized DSpace Test with CGSpace recently.</li>
</ul> </ul>
<h2 id="20191025">2019-10-25</h2>
<h2 id="2019-10-25">2019-10-25</h2>
<ul> <ul>
<li>Give a presentation (via WebEx) about open source software to the ILRI Open Access Week <li>Give a presentation (via WebEx) about open source software to the ILRI Open Access Week
<ul> <ul>
<li>The title was <em>Making ILRI code open: Software as an International Public Good</em></li> <li>The title was <em>Making ILRI code open: Software as an International Public Good</em></li>
<li>It is available on CGSpace: <a href="https://hdl.handle.net/10568/105514">https://hdl.handle.net/10568/105514</a></li> <li>It is available on CGSpace: <a href="https://hdl.handle.net/10568/105514">https://hdl.handle.net/10568/105514</a></li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-10-28">2019-10-28</h2> </ul>
<h2 id="20191028">2019-10-28</h2>
<ul> <ul>
<li>Move the CGSpace CG Core v2 notes from a <a href="https://gist.github.com/alanorth/2db39e91f48d116e00a4edffd6ba6409">GitHub Gist</a> to a <a href="/cgspace-notes/cgspace-cgcorev2-migration/">page</a> on this site for the sake of archiving and searchability</li> <li>Move the CGSpace CG Core v2 notes from a <a href="https://gist.github.com/alanorth/2db39e91f48d116e00a4edffd6ba6409">GitHub Gist</a> to a <a href="/cgspace-notes/cgspace-cgcorev2-migration/">page</a> on this site for the sake of archiving and searchability</li>
<li>Work on the CG Core v2 implementation testing <li>Work on the CG Core v2 implementation testing
<ul> <ul>
<li>I noticed that the page title is messed up on the item view, and after over an hour of troubleshooting it I couldn&rsquo;t figure out why</li> <li>I noticed that the page title is messed up on the item view, and after over an hour of troubleshooting it I couldn't figure out why</li>
<li>It seems to be because the <code>dc.title</code> &rarr; <code>dcterms.title</code> modifications cause the title metadata to disappear from DRI&rsquo;s <code>&lt;pageMeta&gt;</code> and therefore the title is not accessible to the XSL transformation</li> <li>It seems to be because the <code>dc.title</code> &rarr; <code>dcterms.title</code> modifications cause the title metadata to disappear from DRI's <code>&lt;pageMeta&gt;</code> and therefore the title is not accessible to the XSL transformation</li>
<li>Also, I noticed a few places in the Java code where <code>dc.title</code> is hard coded so I think this might be one of the fields that we just assume DSpace relies on internally</li> <li>Also, I noticed a few places in the Java code where <code>dc.title</code> is hard coded so I think this might be one of the fields that we just assume DSpace relies on internally</li>
<li>I will revert all changes to <code>dc.title</code> and <code>dc.title.alternative</code></li> <li>I will revert all changes to <code>dc.title</code> and <code>dc.title.alternative</code></li>
<li>TODO: there are similar issues with the <code>citation_author</code> metadata element missing from DRI, so I might have to revert those changes too</li> <li>TODO: there are similar issues with the <code>citation_author</code> metadata element missing from DRI, so I might have to revert those changes too</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-10-29">2019-10-29</h2> </ul>
<h2 id="20191029">2019-10-29</h2>
<ul> <ul>
<li>After more digging in the source I found out why the <code>dcterms.title</code> and <code>dcterms.creator</code> fields are not present in the DRI <code>pageMeta</code>&hellip; <li>After more digging in the source I found out why the <code>dcterms.title</code> and <code>dcterms.creator</code> fields are not present in the DRI <code>pageMeta</code>&hellip;
<ul> <ul>
<li>The <code>pageMeta</code> element is constructed in <code>dspace-xmlui/src/main/java/org/dspace/app/xmlui/wing/IncludePageMeta.java</code> and the code does not consider any other schemas besides DC</li> <li>The <code>pageMeta</code> element is constructed in <code>dspace-xmlui/src/main/java/org/dspace/app/xmlui/wing/IncludePageMeta.java</code> and the code does not consider any other schemas besides DC</li>
<li>I moved title and creator back to the original DC fields and then everything was working as expected in the pageMeta, so I guess we cannot use these in DCTERMS either!</li> <li>I moved title and creator back to the original DC fields and then everything was working as expected in the pageMeta, so I guess we cannot use these in DCTERMS either!</li>
</ul></li> </ul>
</li>
<li>Assist Maria from Bioversity with community and collection subscriptions</li> <li>Assist Maria from Bioversity with community and collection subscriptions</li>
</ul> </ul>
<!-- raw HTML omitted -->
<!-- vim: set sw=2 ts=2: -->

@ -8,62 +8,54 @@
<meta property="og:title" content="November, 2019" /> <meta property="og:title" content="November, 2019" />
<meta property="og:description" content="2019-11-04 <meta property="og:description" content="2019-11-04
Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics
I looked in the nginx logs and see 4.6 million in the access logs, and 1.2 million in the API logs: I looked in the nginx logs and see 4.6 million in the access logs, and 1.2 million in the API logs:
# zcat --force /var/log/nginx/*access.log.*.gz | grep -cE &quot;[0-9]{1,2}/Oct/2019&quot; # zcat --force /var/log/nginx/*access.log.*.gz | grep -cE &quot;[0-9]{1,2}/Oct/2019&quot;
4671942 4671942
# zcat --force /var/log/nginx/{rest,oai,statistics}.log.*.gz | grep -cE &quot;[0-9]{1,2}/Oct/2019&quot; # zcat --force /var/log/nginx/{rest,oai,statistics}.log.*.gz | grep -cE &quot;[0-9]{1,2}/Oct/2019&quot;
1277694 1277694
So 4.6 million from XMLUI and another 1.2 million from API requests So 4.6 million from XMLUI and another 1.2 million from API requests
Let&#39;s see how many of the REST API requests were for bitstreams (because they are counted in Solr stats):
Let&rsquo;s see how many of the REST API requests were for bitstreams (because they are counted in Solr stats):
# zcat --force /var/log/nginx/rest.log.*.gz | grep -c -E &quot;[0-9]{1,2}/Oct/2019&quot; # zcat --force /var/log/nginx/rest.log.*.gz | grep -c -E &quot;[0-9]{1,2}/Oct/2019&quot;
1183456 1183456
# zcat --force /var/log/nginx/rest.log.*.gz | grep -E &quot;[0-9]{1,2}/Oct/2019&quot; | grep -c -E &quot;/rest/bitstreams&quot; # zcat --force /var/log/nginx/rest.log.*.gz | grep -E &quot;[0-9]{1,2}/Oct/2019&quot; | grep -c -E &quot;/rest/bitstreams&quot;
106781 106781
" /> " />
<meta property="og:type" content="article" /> <meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-11/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-11/" />
<meta property="article:published_time" content="2019-11-04T12:20:30+02:00" /> <meta property="article:published_time" content="2019-11-04T12:20:30+02:00" />
<meta property="article:modified_time" content="2019-11-26T15:53:57+02:00" /> <meta property="article:modified_time" content="2019-11-27T14:56:00+02:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="November, 2019"/> <meta name="twitter:title" content="November, 2019"/>
<meta name="twitter:description" content="2019-11-04 <meta name="twitter:description" content="2019-11-04
Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics
I looked in the nginx logs and see 4.6 million in the access logs, and 1.2 million in the API logs: I looked in the nginx logs and see 4.6 million in the access logs, and 1.2 million in the API logs:
# zcat --force /var/log/nginx/*access.log.*.gz | grep -cE &quot;[0-9]{1,2}/Oct/2019&quot; # zcat --force /var/log/nginx/*access.log.*.gz | grep -cE &quot;[0-9]{1,2}/Oct/2019&quot;
4671942 4671942
# zcat --force /var/log/nginx/{rest,oai,statistics}.log.*.gz | grep -cE &quot;[0-9]{1,2}/Oct/2019&quot; # zcat --force /var/log/nginx/{rest,oai,statistics}.log.*.gz | grep -cE &quot;[0-9]{1,2}/Oct/2019&quot;
1277694 1277694
So 4.6 million from XMLUI and another 1.2 million from API requests So 4.6 million from XMLUI and another 1.2 million from API requests
Let&#39;s see how many of the REST API requests were for bitstreams (because they are counted in Solr stats):
Let&rsquo;s see how many of the REST API requests were for bitstreams (because they are counted in Solr stats):
# zcat --force /var/log/nginx/rest.log.*.gz | grep -c -E &quot;[0-9]{1,2}/Oct/2019&quot; # zcat --force /var/log/nginx/rest.log.*.gz | grep -c -E &quot;[0-9]{1,2}/Oct/2019&quot;
1183456 1183456
# zcat --force /var/log/nginx/rest.log.*.gz | grep -E &quot;[0-9]{1,2}/Oct/2019&quot; | grep -c -E &quot;/rest/bitstreams&quot; # zcat --force /var/log/nginx/rest.log.*.gz | grep -E &quot;[0-9]{1,2}/Oct/2019&quot; | grep -c -E &quot;/rest/bitstreams&quot;
106781 106781
"/> "/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -73,9 +65,9 @@ Let&rsquo;s see how many of the REST API requests were for bitstreams (because t
"@type": "BlogPosting", "@type": "BlogPosting",
"headline": "November, 2019", "headline": "November, 2019",
"url": "https:\/\/alanorth.github.io\/cgspace-notes\/2019-11\/", "url": "https:\/\/alanorth.github.io\/cgspace-notes\/2019-11\/",
"wordCount": "3381", "wordCount": "3457",
"datePublished": "2019-11-04T12:20:30+02:00", "datePublished": "2019-11-04T12:20:30+02:00",
"dateModified": "2019-11-26T15:53:57+02:00", "dateModified": "2019-11-27T14:56:00+02:00",
"author": { "author": {
"@type": "Person", "@type": "Person",
"name": "Alan Orth" "name": "Alan Orth"
@ -144,265 +136,226 @@ Let&rsquo;s see how many of the REST API requests were for bitstreams (because t
</p> </p>
</header> </header>
<h2 id="2019-11-04">2019-11-04</h2> <h2 id="20191104">2019-11-04</h2>
<ul> <ul>
<li><p>Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics</p> <li>Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics
<ul> <ul>
<li><p>I looked in the nginx logs and see 4.6 million in the access logs, and 1.2 million in the API logs:</p> <li>I looked in the nginx logs and see 4.6 million in the access logs, and 1.2 million in the API logs:</li>
</ul>
</li>
</ul>
<pre><code># zcat --force /var/log/nginx/*access.log.*.gz | grep -cE &quot;[0-9]{1,2}/Oct/2019&quot; <pre><code># zcat --force /var/log/nginx/*access.log.*.gz | grep -cE &quot;[0-9]{1,2}/Oct/2019&quot;
4671942 4671942
# zcat --force /var/log/nginx/{rest,oai,statistics}.log.*.gz | grep -cE &quot;[0-9]{1,2}/Oct/2019&quot; # zcat --force /var/log/nginx/{rest,oai,statistics}.log.*.gz | grep -cE &quot;[0-9]{1,2}/Oct/2019&quot;
1277694 1277694
</code></pre></li> </code></pre><ul>
</ul></li> <li>So 4.6 million from XMLUI and another 1.2 million from API requests</li>
<li>Let's see how many of the REST API requests were for bitstreams (because they are counted in Solr stats):</li>
<li><p>So 4.6 million from XMLUI and another 1.2 million from API requests</p></li> </ul>
<li><p>Let&rsquo;s see how many of the REST API requests were for bitstreams (because they are counted in Solr stats):</p>
<pre><code># zcat --force /var/log/nginx/rest.log.*.gz | grep -c -E &quot;[0-9]{1,2}/Oct/2019&quot; <pre><code># zcat --force /var/log/nginx/rest.log.*.gz | grep -c -E &quot;[0-9]{1,2}/Oct/2019&quot;
1183456 1183456
# zcat --force /var/log/nginx/rest.log.*.gz | grep -E &quot;[0-9]{1,2}/Oct/2019&quot; | grep -c -E &quot;/rest/bitstreams&quot; # zcat --force /var/log/nginx/rest.log.*.gz | grep -E &quot;[0-9]{1,2}/Oct/2019&quot; | grep -c -E &quot;/rest/bitstreams&quot;
106781 106781
</code></pre></li> </code></pre><ul>
<li>The types of requests in the access logs are (by lazily extracting the sixth field in the nginx log)</li>
</ul> </ul>
<ul>
<li><p>The types of requests in the access logs are (by lazily extracting the sixth field in the nginx log)</p>
<pre><code># zcat --force /var/log/nginx/*access.log.*.gz | grep -E &quot;[0-9]{1,2}/Oct/2019&quot; | awk '{print $6}' | sed 's/&quot;//' | sort | uniq -c | sort -n <pre><code># zcat --force /var/log/nginx/*access.log.*.gz | grep -E &quot;[0-9]{1,2}/Oct/2019&quot; | awk '{print $6}' | sed 's/&quot;//' | sort | uniq -c | sort -n
1 PUT 1 PUT
8 PROPFIND 8 PROPFIND
283 OPTIONS 283 OPTIONS
30102 POST 30102 POST
46581 HEAD 46581 HEAD
4594967 GET 4594967 GET
</code></pre></li> </code></pre><ul>
<li>Two very active IPs are 34.224.4.16 and 34.234.204.152, which made over 360,000 requests in October:</li>
<li><p>Two very active IPs are 34.224.4.16 and 34.234.204.152, which made over 360,000 requests in October:</p> </ul>
<pre><code># zcat --force /var/log/nginx/*access.log.*.gz | grep -E &quot;[0-9]{1,2}/Oct/2019&quot; | grep -c -E '(34\.224\.4\.16|34\.234\.204\.152)' <pre><code># zcat --force /var/log/nginx/*access.log.*.gz | grep -E &quot;[0-9]{1,2}/Oct/2019&quot; | grep -c -E '(34\.224\.4\.16|34\.234\.204\.152)'
365288 365288
</code></pre></li> </code></pre><ul>
<li>Their user agent is one I've never seen before:</li>
<li><p>Their user agent is one I&rsquo;ve never seen before:</p> </ul>
<pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot) <pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot)
</code></pre></li> </code></pre><ul>
<li>Most of them seem to be to community or collection discover and browse results pages like <code>/handle/10568/103/discover</code>:</li>
<li><p>Most of them seem to be to community or collection discover and browse results pages like <code>/handle/10568/103/discover</code>:</p> </ul>
<pre><code># zcat --force /var/log/nginx/*access.log.*.gz | grep -E &quot;[0-9]{1,2}/Oct/2019&quot; | grep Amazonbot | grep -o -E &quot;GET /(bitstream|discover|handle)&quot; | sort | uniq -c <pre><code># zcat --force /var/log/nginx/*access.log.*.gz | grep -E &quot;[0-9]{1,2}/Oct/2019&quot; | grep Amazonbot | grep -o -E &quot;GET /(bitstream|discover|handle)&quot; | sort | uniq -c
6566 GET /bitstream 6566 GET /bitstream
351928 GET /handle 351928 GET /handle
# zcat --force /var/log/nginx/*access.log.*.gz | grep -E &quot;[0-9]{1,2}/Oct/2019&quot; | grep Amazonbot | grep -E &quot;GET /(bitstream|discover|handle)&quot; | grep -c discover # zcat --force /var/log/nginx/*access.log.*.gz | grep -E &quot;[0-9]{1,2}/Oct/2019&quot; | grep Amazonbot | grep -E &quot;GET /(bitstream|discover|handle)&quot; | grep -c discover
214209 214209
# zcat --force /var/log/nginx/*access.log.*.gz | grep -E &quot;[0-9]{1,2}/Oct/2019&quot; | grep Amazonbot | grep -E &quot;GET /(bitstream|discover|handle)&quot; | grep -c browse # zcat --force /var/log/nginx/*access.log.*.gz | grep -E &quot;[0-9]{1,2}/Oct/2019&quot; | grep Amazonbot | grep -E &quot;GET /(bitstream|discover|handle)&quot; | grep -c browse
86874 86874
</code></pre></li> </code></pre><ul>
<li>As far as I can tell, none of their requests are counted in the Solr statistics:</li>
<li><p>As far as I can tell, none of their requests are counted in the Solr statistics:</p> </ul>
<pre><code>$ http --print b 'http://localhost:8081/solr/statistics/select?q=(ip%3A34.224.4.16+OR+ip%3A34.234.204.152)&amp;rows=0&amp;wt=json&amp;indent=true' <pre><code>$ http --print b 'http://localhost:8081/solr/statistics/select?q=(ip%3A34.224.4.16+OR+ip%3A34.234.204.152)&amp;rows=0&amp;wt=json&amp;indent=true'
</code></pre></li> </code></pre><ul>
<li>Still, those requests are CPU intensive so I will add their user agent to the &ldquo;badbots&rdquo; rate limiting in nginx to reduce the impact on server load</li>
<li><p>Still, those requests are CPU intensive so I will add their user agent to the &ldquo;badbots&rdquo; rate limiting in nginx to reduce the impact on server load</p></li> <li>After deploying it I checked by setting my user agent to Amazonbot and making a few requests (which were denied with HTTP 503):</li>
</ul>
<li><p>After deploying it I checked by setting my user agent to Amazonbot and making a few requests (which were denied with HTTP 503):</p>
<pre><code>$ http --print Hh 'https://dspacetest.cgiar.org/handle/10568/1/discover' User-Agent:&quot;Amazonbot/0.1&quot; <pre><code>$ http --print Hh 'https://dspacetest.cgiar.org/handle/10568/1/discover' User-Agent:&quot;Amazonbot/0.1&quot;
</code></pre></li> </code></pre><ul>
<li>On the topic of spiders, I have been wanting to update DSpace's default list of spiders in <code>config/spiders/agents</code>, perhaps by dropping a new list in from <a href="https://github.com/atmire/COUNTER-Robots">Atmire's COUNTER-Robots</a> project
<li><p>On the topic of spiders, I have been wanting to update DSpace&rsquo;s default list of spiders in <code>config/spiders/agents</code>, perhaps by dropping a new list in from <a href="https://github.com/atmire/COUNTER-Robots">Atmire&rsquo;s COUNTER-Robots</a> project</p>
<ul> <ul>
<li>First I checked for a user agent that is in COUNTER-Robots, but NOT in the current <code>dspace/config/spiders/example</code> list</li> <li>First I checked for a user agent that is in COUNTER-Robots, but NOT in the current <code>dspace/config/spiders/example</code> list</li>
<li>Then I made some item and bitstream requests on DSpace Test using that user agent:</li>
<li><p>Then I made some item and bitstream requests on DSpace Test using that user agent:</p> </ul>
</li>
</ul>
<pre><code>$ http --print Hh 'https://dspacetest.cgiar.org/handle/10568/105487' User-Agent:&quot;iskanie&quot; <pre><code>$ http --print Hh 'https://dspacetest.cgiar.org/handle/10568/105487' User-Agent:&quot;iskanie&quot;
$ http --print Hh 'https://dspacetest.cgiar.org/handle/10568/105487' User-Agent:&quot;iskanie&quot; $ http --print Hh 'https://dspacetest.cgiar.org/handle/10568/105487' User-Agent:&quot;iskanie&quot;
$ http --print Hh 'https://dspacetest.cgiar.org/bitstream/handle/10568/105487/csl_Crane_oct2019.pptx?sequence=1&amp;isAllowed=y' User-Agent:&quot;iskanie&quot; $ http --print Hh 'https://dspacetest.cgiar.org/bitstream/handle/10568/105487/csl_Crane_oct2019.pptx?sequence=1&amp;isAllowed=y' User-Agent:&quot;iskanie&quot;
</code></pre></li> </code></pre><ul>
</ul></li> <li>A bit later I checked Solr and found three requests from my IP with that user agent this month:</li>
</ul>
<li><p>A bit later I checked Solr and found three requests from my IP with that user agent this month:</p>
<pre><code>$ http --print b 'http://localhost:8081/solr/statistics/select?q=ip:73.178.9.24+AND+userAgent:iskanie&amp;fq=dateYearMonth%3A2019-11&amp;rows=0' <pre><code>$ http --print b 'http://localhost:8081/solr/statistics/select?q=ip:73.178.9.24+AND+userAgent:iskanie&amp;fq=dateYearMonth%3A2019-11&amp;rows=0'
&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt; &lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&lt;response&gt; &lt;response&gt;
&lt;lst name=&quot;responseHeader&quot;&gt;&lt;int name=&quot;status&quot;&gt;0&lt;/int&gt;&lt;int name=&quot;QTime&quot;&gt;1&lt;/int&gt;&lt;lst name=&quot;params&quot;&gt;&lt;str name=&quot;q&quot;&gt;ip:73.178.9.24 AND userAgent:iskanie&lt;/str&gt;&lt;str name=&quot;fq&quot;&gt;dateYearMonth:2019-11&lt;/str&gt;&lt;str name=&quot;rows&quot;&gt;0&lt;/str&gt;&lt;/lst&gt;&lt;/lst&gt;&lt;result name=&quot;response&quot; numFound=&quot;3&quot; start=&quot;0&quot;&gt;&lt;/result&gt; &lt;lst name=&quot;responseHeader&quot;&gt;&lt;int name=&quot;status&quot;&gt;0&lt;/int&gt;&lt;int name=&quot;QTime&quot;&gt;1&lt;/int&gt;&lt;lst name=&quot;params&quot;&gt;&lt;str name=&quot;q&quot;&gt;ip:73.178.9.24 AND userAgent:iskanie&lt;/str&gt;&lt;str name=&quot;fq&quot;&gt;dateYearMonth:2019-11&lt;/str&gt;&lt;str name=&quot;rows&quot;&gt;0&lt;/str&gt;&lt;/lst&gt;&lt;/lst&gt;&lt;result name=&quot;response&quot; numFound=&quot;3&quot; start=&quot;0&quot;&gt;&lt;/result&gt;
&lt;/response&gt; &lt;/response&gt;
</code></pre></li> </code></pre><ul>
<li>Now I want to make similar requests with a user agent that is included in DSpace's current user agent list:</li>
<li><p>Now I want to make similar requests with a user agent that is included in DSpace&rsquo;s current user agent list:</p> </ul>
<pre><code>$ http --print Hh 'https://dspacetest.cgiar.org/handle/10568/105487' User-Agent:&quot;celestial&quot; <pre><code>$ http --print Hh 'https://dspacetest.cgiar.org/handle/10568/105487' User-Agent:&quot;celestial&quot;
$ http --print Hh 'https://dspacetest.cgiar.org/handle/10568/105487' User-Agent:&quot;celestial&quot; $ http --print Hh 'https://dspacetest.cgiar.org/handle/10568/105487' User-Agent:&quot;celestial&quot;
$ http --print Hh 'https://dspacetest.cgiar.org/bitstream/handle/10568/105487/csl_Crane_oct2019.pptx?sequence=1&amp;isAllowed=y' User-Agent:&quot;celestial&quot; $ http --print Hh 'https://dspacetest.cgiar.org/bitstream/handle/10568/105487/csl_Crane_oct2019.pptx?sequence=1&amp;isAllowed=y' User-Agent:&quot;celestial&quot;
</code></pre></li> </code></pre><ul>
<li>After twenty minutes I didn't see any requests in Solr, so I assume they did not get logged because they matched a bot list&hellip;
<li><p>After twenty minutes I didn&rsquo;t see any requests in Solr, so I assume they did not get logged because they matched a bot list&hellip;</p>
<ul> <ul>
<li><p>What&rsquo;s strange is that the Solr spider agent configuration in <code>dspace/config/modules/solr-statistics.cfg</code> points to a file that doesn&rsquo;t exist&hellip;</p> <li>What's strange is that the Solr spider agent configuration in <code>dspace/config/modules/solr-statistics.cfg</code> points to a file that doesn't exist&hellip;</li>
</ul>
</li>
</ul>
<pre><code>spider.agentregex.regexfile = ${dspace.dir}/config/spiders/Bots-2013-03.txt <pre><code>spider.agentregex.regexfile = ${dspace.dir}/config/spiders/Bots-2013-03.txt
</code></pre></li> </code></pre><ul>
</ul></li> <li>Apparently that is part of Atmire's CUA, despite being in a standard DSpace configuration file&hellip;</li>
<li>I tried with some other garbage user agents like &ldquo;fuuuualan&rdquo; and they were visible in Solr
<li><p>Apparently that is part of Atmire&rsquo;s CUA, despite being in a standard DSpace configuration file&hellip;</p></li>
<li><p>I tried with some other garbage user agents like &ldquo;fuuuualan&rdquo; and they were visible in Solr</p>
<ul> <ul>
<li>Now I want to try adding &ldquo;iskanie&rdquo; and &ldquo;fuuuualan&rdquo; to the list of spider regexes in <code>dspace/config/spiders/example</code> and then try to use DSpace&rsquo;s &ldquo;mark spiders&rdquo; feature to change them to &ldquo;isBot:true&rdquo; in Solr</li> <li>Now I want to try adding &ldquo;iskanie&rdquo; and &ldquo;fuuuualan&rdquo; to the list of spider regexes in <code>dspace/config/spiders/example</code> and then try to use DSpace's &ldquo;mark spiders&rdquo; feature to change them to &ldquo;isBot:true&rdquo; in Solr</li>
<li>I restarted Tomcat and ran <code>dspace stats-util -m</code> and it did some stuff for a while, but I still don&rsquo;t see any items in Solr with <code>isBot:true</code></li> <li>I restarted Tomcat and ran <code>dspace stats-util -m</code> and it did some stuff for a while, but I still don't see any items in Solr with <code>isBot:true</code></li>
<li>According to <code>dspace-api/src/main/java/org/dspace/statistics/util/SpiderDetector.java</code> the patterns for user agents are loaded from any file in the <code>config/spiders/agents</code> directory</li> <li>According to <code>dspace-api/src/main/java/org/dspace/statistics/util/SpiderDetector.java</code> the patterns for user agents are loaded from any file in the <code>config/spiders/agents</code> directory</li>
<li>I downloaded the COUNTER-Robots list to DSpace Test and overwrote the example file, then ran <code>dspace stats-util -m</code> and still there were no new items marked as being bots in Solr, so I think there is still something wrong</li> <li>I downloaded the COUNTER-Robots list to DSpace Test and overwrote the example file, then ran <code>dspace stats-util -m</code> and still there were no new items marked as being bots in Solr, so I think there is still something wrong</li>
<li>Jesus, the code in <code>./dspace-api/src/main/java/org/dspace/statistics/util/StatisticsClient.java</code> says that <code>stats-util -m</code> marks spider requests by their IPs, not by their user agents&hellip; WTF:</li>
<li><p>Jesus, the code in <code>./dspace-api/src/main/java/org/dspace/statistics/util/StatisticsClient.java</code> says that <code>stats-util -m</code> marks spider requests by their IPs, not by their user agents&hellip; WTF:</p> </ul>
</li>
</ul>
<pre><code>else if (line.hasOption('m')) <pre><code>else if (line.hasOption('m'))
{ {
SolrLogger.markRobotsByIP(); SolrLogger.markRobotsByIP();
} }
</code></pre></li> </code></pre><ul>
</ul></li> <li>WTF again, there is actually a function called <code>markRobotByUserAgent()</code> that is never called anywhere!
<li><p>WTF again, there is actually a function called <code>markRobotByUserAgent()</code> that is never called anywhere!</p>
<ul> <ul>
<li>It appears to be unimplemented&hellip;</li> <li>It appears to be unimplemented&hellip;</li>
<li>I sent a message to the dspace-tech mailing list to ask if I should file an issue</li> <li>I sent a message to the dspace-tech mailing list to ask if I should file an issue</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-11-05">2019-11-05</h2> </ul>
<h2 id="20191105">2019-11-05</h2>
<ul> <ul>
<li><p>I added &ldquo;alanfuuu2&rdquo; to the example spiders file, restarted Tomcat, then made two requests to DSpace Test:</p> <li>I added &ldquo;alanfuuu2&rdquo; to the example spiders file, restarted Tomcat, then made two requests to DSpace Test:</li>
</ul>
<pre><code>$ http --print Hh 'https://dspacetest.cgiar.org/handle/10568/105487' User-Agent:&quot;alanfuuu1&quot; <pre><code>$ http --print Hh 'https://dspacetest.cgiar.org/handle/10568/105487' User-Agent:&quot;alanfuuu1&quot;
$ http --print Hh 'https://dspacetest.cgiar.org/handle/10568/105487' User-Agent:&quot;alanfuuu2&quot; $ http --print Hh 'https://dspacetest.cgiar.org/handle/10568/105487' User-Agent:&quot;alanfuuu2&quot;
</code></pre></li> </code></pre><ul>
<li>After committing the changes in Solr I saw one request for &ldquo;alanfuuu1&rdquo; and no requests for &ldquo;alanfuuu2&rdquo;:</li>
<li><p>After committing the changes in Solr I saw one request for &ldquo;alanfuuu1&rdquo; and no requests for &ldquo;alanfuuu2&rdquo;:</p> </ul>
<pre><code>$ http --print b 'http://localhost:8081/solr/statistics/update?commit=true' <pre><code>$ http --print b 'http://localhost:8081/solr/statistics/update?commit=true'
$ http --print b 'http://localhost:8081/solr/statistics/select?q=userAgent:alanfuuu1&amp;fq=dateYearMonth%3A2019-11' | xmllint --format - | grep numFound $ http --print b 'http://localhost:8081/solr/statistics/select?q=userAgent:alanfuuu1&amp;fq=dateYearMonth%3A2019-11' | xmllint --format - | grep numFound
&lt;result name=&quot;response&quot; numFound=&quot;1&quot; start=&quot;0&quot;&gt; &lt;result name=&quot;response&quot; numFound=&quot;1&quot; start=&quot;0&quot;&gt;
$ http --print b 'http://localhost:8081/solr/statistics/select?q=userAgent:alanfuuu2&amp;fq=dateYearMonth%3A2019-11' | xmllint --format - | grep numFound $ http --print b 'http://localhost:8081/solr/statistics/select?q=userAgent:alanfuuu2&amp;fq=dateYearMonth%3A2019-11' | xmllint --format - | grep numFound
&lt;result name=&quot;response&quot; numFound=&quot;0&quot; start=&quot;0&quot;/&gt; &lt;result name=&quot;response&quot; numFound=&quot;0&quot; start=&quot;0&quot;/&gt;
</code></pre></li> </code></pre><ul>
<li>So basically it seems like a win to update the example file with the latest one from Atmire's COUNTER-Robots list
<li><p>So basically it seems like a win to update the example file with the latest one from Atmire&rsquo;s COUNTER-Robots list</p>
<ul> <ul>
<li>Even though the &ldquo;mark by user agent&rdquo; function is not working (see email to dspace-tech mailing list) DSpace will still not log Solr events from these user agents</li> <li>Even though the &ldquo;mark by user agent&rdquo; function is not working (see email to dspace-tech mailing list) DSpace will still not log Solr events from these user agents</li>
</ul></li> </ul>
</li>
<li><p>I&rsquo;m curious how special character matching works in Solr, so I will test two requests: one with &ldquo;www.gnip.com&rdquo; which is in the spider list, and one with &ldquo;www.gnyp.com&rdquo; which isn&rsquo;t:</p> <li>I'm curious how special character matching works in Solr, so I will test two requests: one with &ldquo;<a href="http://www.gnip.com">www.gnip.com</a>&rdquo; which is in the spider list, and one with &ldquo;<a href="http://www.gnyp.com">www.gnyp.com</a>&rdquo; which isn't:</li>
</ul>
<pre><code>$ http --print Hh 'https://dspacetest.cgiar.org/handle/10568/105487' User-Agent:&quot;www.gnip.com&quot; <pre><code>$ http --print Hh 'https://dspacetest.cgiar.org/handle/10568/105487' User-Agent:&quot;www.gnip.com&quot;
$ http --print Hh 'https://dspacetest.cgiar.org/handle/10568/105487' User-Agent:&quot;www.gnyp.com&quot; $ http --print Hh 'https://dspacetest.cgiar.org/handle/10568/105487' User-Agent:&quot;www.gnyp.com&quot;
</code></pre></li> </code></pre><ul>
<li>Then commit changes to Solr so we don't have to wait:</li>
<li><p>Then commit changes to Solr so we don&rsquo;t have to wait:</p> </ul>
<pre><code>$ http --print b 'http://localhost:8081/solr/statistics/update?commit=true' <pre><code>$ http --print b 'http://localhost:8081/solr/statistics/update?commit=true'
$ http --print b 'http://localhost:8081/solr/statistics/select?q=userAgent:www.gnip.com&amp;fq=dateYearMonth%3A2019-11' | xmllint --format - | grep numFound $ http --print b 'http://localhost:8081/solr/statistics/select?q=userAgent:www.gnip.com&amp;fq=dateYearMonth%3A2019-11' | xmllint --format - | grep numFound
&lt;result name=&quot;response&quot; numFound=&quot;0&quot; start=&quot;0&quot;/&gt; &lt;result name=&quot;response&quot; numFound=&quot;0&quot; start=&quot;0&quot;/&gt;
$ http --print b 'http://localhost:8081/solr/statistics/select?q=userAgent:www.gnyp.com&amp;fq=dateYearMonth%3A2019-11' | xmllint --format - | grep numFound $ http --print b 'http://localhost:8081/solr/statistics/select?q=userAgent:www.gnyp.com&amp;fq=dateYearMonth%3A2019-11' | xmllint --format - | grep numFound
&lt;result name=&quot;response&quot; numFound=&quot;1&quot; start=&quot;0&quot;&gt; &lt;result name=&quot;response&quot; numFound=&quot;1&quot; start=&quot;0&quot;&gt;
</code></pre></li> </code></pre><ul>
<li>So the blocking seems to be working because &ldquo;www.gnip.com&rdquo; is one of the new patterns added to the spiders file&hellip;</li>
<li><p>So the blocking seems to be working because &ldquo;www.gnip.com&rdquo; is one of the new patterns added to the spiders file&hellip;</p></li>
</ul> </ul>
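The www.gnip.com versus www.gnyp.com test above can be approximated outside DSpace. This is only a rough sketch: it assumes each line of the spiders file is a regular expression matched case-insensitively anywhere in the user agent string, and it uses `grep` rather than DSpace's Java matcher, so edge cases may differ. The function name `matches_spider_pattern` and the throwaway patterns file are hypothetical.

```shell
# Hypothetical helper: succeed if the user agent matches any pattern in the file.
# Assumption: one regular expression per line, matched case-insensitively.
matches_spider_pattern() {
    agent="$1"
    patterns_file="$2"
    printf '%s\n' "$agent" | grep -q -i -E -f "$patterns_file"
}

# Throwaway patterns file standing in for the real spiders list
patterns=$(mktemp)
printf '%s\n' 'www\.gnip\.com' 'iskanie' > "$patterns"

for agent in "www.gnip.com" "www.gnyp.com"; do
    if matches_spider_pattern "$agent" "$patterns"; then
        echo "$agent: matches a spider pattern"
    else
        echo "$agent: no match"
    fi
done

rm -f "$patterns"
```

Escaping the dots matters here: an unescaped `www.gnip.com` pattern would also match strings like `wwwxgnip.com`, because `.` matches any character.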
<h2 id="20191107">2019-11-07</h2>
<h2 id="2019-11-07">2019-11-07</h2>
<ul> <ul>
<li>CCAFS finally confirmed that they do indeed need the confusing new project tag that looks like a duplicate <li>CCAFS finally confirmed that they do indeed need the confusing new project tag that looks like a duplicate
<ul> <ul>
<li>They had proposed a batch of new tags in 2019-09 and we never merged them due to this uncertainty</li> <li>They had proposed a batch of new tags in 2019-09 and we never merged them due to this uncertainty</li>
<li>I have now merged the changes in to the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/432">#432</a>)</li> <li>I have now merged the changes in to the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/432">#432</a>)</li>
</ul></li> </ul>
</li>
<li>I am reconsidering the move of <code>cg.identifier.dataurl</code> to <code>cg.hasMetadata</code> in CG Core v2 <li>I am reconsidering the move of <code>cg.identifier.dataurl</code> to <code>cg.hasMetadata</code> in CG Core v2
<ul> <ul>
<li>The values of this field are mostly links to data sets on Dataverse and partner sites</li> <li>The values of this field are mostly links to data sets on Dataverse and partner sites</li>
<li>I opened an <a href="https://github.com/AgriculturalSemantics/cg-core/issues/10">issue on GitHub</a> to ask Marie-Angelique for clarification</li> <li>I opened an <a href="https://github.com/AgriculturalSemantics/cg-core/issues/10">issue on GitHub</a> to ask Marie-Angelique for clarification</li>
</ul></li> </ul>
</li>
<li><p>Looking into CGSpace statistics again</p> <li>Looking into CGSpace statistics again
<ul> <ul>
<li><p>I searched for hits in Solr from the BUbiNG bot and found 63,000 in the <code>statistics-2018</code> core:</p> <li>I searched for hits in Solr from the BUbiNG bot and found 63,000 in the <code>statistics-2018</code> core:</li>
</ul>
</li>
</ul>
<pre><code>$ http --print b 'http://localhost:8081/solr/statistics-2018/select?facet=true&amp;facet.field=ip&amp;facet.mincount=1&amp;type:0&amp;q=userAgent:BUbiNG*' | xmllint --format - | grep numFound <pre><code>$ http --print b 'http://localhost:8081/solr/statistics-2018/select?facet=true&amp;facet.field=ip&amp;facet.mincount=1&amp;type:0&amp;q=userAgent:BUbiNG*' | xmllint --format - | grep numFound
&lt;result name=&quot;response&quot; numFound=&quot;62944&quot; start=&quot;0&quot;&gt; &lt;result name=&quot;response&quot; numFound=&quot;62944&quot; start=&quot;0&quot;&gt;
</code></pre></li> </code></pre><ul>
<li>Similar for com.plumanalytics, Grammarly, and ltx71!</li>
<li><p>Similar for com.plumanalytics, Grammarly, and ltx71!</p> </ul>
<pre><code>$ http --print b 'http://localhost:8081/solr/statistics-2018/select?facet=true&amp;facet.field=ip&amp;facet.mincount=1&amp;type:0&amp;q=userAgent:*com.plumanalytics*' | xmllint --format - | grep numFound <pre><code>$ http --print b 'http://localhost:8081/solr/statistics-2018/select?facet=true&amp;facet.field=ip&amp;facet.mincount=1&amp;type:0&amp;q=userAgent:*com.plumanalytics*' | xmllint --format - | grep numFound
&lt;result name=&quot;response&quot; numFound=&quot;28256&quot; start=&quot;0&quot;&gt; &lt;result name=&quot;response&quot; numFound=&quot;28256&quot; start=&quot;0&quot;&gt;
$ http --print b 'http://localhost:8081/solr/statistics-2018/select?facet=true&amp;facet.field=ip&amp;facet.mincount=1&amp;type:0&amp;q=userAgent:*Grammarly*' | xmllint --format - | grep numFound $ http --print b 'http://localhost:8081/solr/statistics-2018/select?facet=true&amp;facet.field=ip&amp;facet.mincount=1&amp;type:0&amp;q=userAgent:*Grammarly*' | xmllint --format - | grep numFound
&lt;result name=&quot;response&quot; numFound=&quot;6288&quot; start=&quot;0&quot;&gt; &lt;result name=&quot;response&quot; numFound=&quot;6288&quot; start=&quot;0&quot;&gt;
$ http --print b 'http://localhost:8081/solr/statistics-2018/select?facet=true&amp;facet.field=ip&amp;facet.mincount=1&amp;type:0&amp;q=userAgent:*ltx71*' | xmllint --format - | grep numFound $ http --print b 'http://localhost:8081/solr/statistics-2018/select?facet=true&amp;facet.field=ip&amp;facet.mincount=1&amp;type:0&amp;q=userAgent:*ltx71*' | xmllint --format - | grep numFound
&lt;result name=&quot;response&quot; numFound=&quot;105663&quot; start=&quot;0&quot;&gt; &lt;result name=&quot;response&quot; numFound=&quot;105663&quot; start=&quot;0&quot;&gt;
</code></pre></li> </code></pre><ul>
</ul></li> <li>Deleting these seems to work, for example the 105,000 ltx71 records from 2018:</li>
</ul>
<li><p>Deleting these seems to work, for example the 105,000 ltx71 records from 2018:</p>
<pre><code>$ http --print b 'http://localhost:8081/solr/statistics-2018/update?stream.body=&lt;delete&gt;&lt;query&gt;userAgent:*ltx71*&lt;/query&gt;&lt;query&gt;type:0&lt;/query&gt;&lt;/delete&gt;&amp;commit=true' <pre><code>$ http --print b 'http://localhost:8081/solr/statistics-2018/update?stream.body=&lt;delete&gt;&lt;query&gt;userAgent:*ltx71*&lt;/query&gt;&lt;query&gt;type:0&lt;/query&gt;&lt;/delete&gt;&amp;commit=true'
$ http --print b 'http://localhost:8081/solr/statistics-2018/select?facet=true&amp;facet.field=ip&amp;facet.mincount=1&amp;type:0&amp;q=userAgent:*ltx71*' | xmllint --format - | grep numFound $ http --print b 'http://localhost:8081/solr/statistics-2018/select?facet=true&amp;facet.field=ip&amp;facet.mincount=1&amp;type:0&amp;q=userAgent:*ltx71*' | xmllint --format - | grep numFound
&lt;result name=&quot;response&quot; numFound=&quot;0&quot; start=&quot;0&quot;/&gt; &lt;result name=&quot;response&quot; numFound=&quot;0&quot; start=&quot;0&quot;/&gt;
</code></pre></li> </code></pre><ul>
<li>I wrote a quick bash script to check all these user agents against the CGSpace Solr statistics cores
<li><p>I wrote a quick bash script to check all these user agents against the CGSpace Solr statistics cores</p>
<ul> <ul>
<li>For years 2010 until 2019 there are 1.6 million hits from these spider user agents</li> <li>For years 2010 until 2019 there are 1.6 million hits from these spider user agents</li>
<li>For 2019 alone there are 740,000, over half of which come from Unpaywall!</li> <li>For 2019 alone there are 740,000, over half of which come from Unpaywall!</li>
<li>Looking at the facets I see there were about 200,000 hits from Unpaywall in 2019-10:</li>
<li><p>Looking at the facets I see there were about 200,000 hits from Unpaywall in 2019-10:</p> </ul>
</li>
</ul>
<pre><code>$ curl -s 'http://localhost:8081/solr/statistics/select?facet=true&amp;facet.field=dateYearMonth&amp;facet.mincount=1&amp;facet.offset=0&amp;facet.limit=12&amp;q=userAgent:*Unpaywall*' | xmllint --format - | less <pre><code>$ curl -s 'http://localhost:8081/solr/statistics/select?facet=true&amp;facet.field=dateYearMonth&amp;facet.mincount=1&amp;facet.offset=0&amp;facet.limit=12&amp;q=userAgent:*Unpaywall*' | xmllint --format - | less
... ...
&lt;lst name=&quot;facet_counts&quot;&gt; &lt;lst name=&quot;facet_counts&quot;&gt;
&lt;lst name=&quot;facet_queries&quot;/&gt; &lt;lst name=&quot;facet_queries&quot;/&gt;
&lt;lst name=&quot;facet_fields&quot;&gt; &lt;lst name=&quot;facet_fields&quot;&gt;
&lt;lst name=&quot;dateYearMonth&quot;&gt; &lt;lst name=&quot;dateYearMonth&quot;&gt;
&lt;int name=&quot;2019-10&quot;&gt;198624&lt;/int&gt; &lt;int name=&quot;2019-10&quot;&gt;198624&lt;/int&gt;
&lt;int name=&quot;2019-05&quot;&gt;88422&lt;/int&gt; &lt;int name=&quot;2019-05&quot;&gt;88422&lt;/int&gt;
&lt;int name=&quot;2019-06&quot;&gt;79911&lt;/int&gt; &lt;int name=&quot;2019-06&quot;&gt;79911&lt;/int&gt;
&lt;int name=&quot;2019-09&quot;&gt;67065&lt;/int&gt; &lt;int name=&quot;2019-09&quot;&gt;67065&lt;/int&gt;
&lt;int name=&quot;2019-07&quot;&gt;39026&lt;/int&gt; &lt;int name=&quot;2019-07&quot;&gt;39026&lt;/int&gt;
&lt;int name=&quot;2019-08&quot;&gt;36889&lt;/int&gt; &lt;int name=&quot;2019-08&quot;&gt;36889&lt;/int&gt;
&lt;int name=&quot;2019-04&quot;&gt;36512&lt;/int&gt; &lt;int name=&quot;2019-04&quot;&gt;36512&lt;/int&gt;
&lt;int name=&quot;2019-11&quot;&gt;760&lt;/int&gt; &lt;int name=&quot;2019-11&quot;&gt;760&lt;/int&gt;
&lt;/lst&gt; &lt;/lst&gt;
&lt;/lst&gt; &lt;/lst&gt;
</code></pre></li> </code></pre><ul>
</ul></li> <li>That answers Peter's question about why the stats jumped in October&hellip;</li>
<li><p>That answers Peter&rsquo;s question about why the stats jumped in October&hellip;</p></li>
</ul> </ul>
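Checking several user agents against several statistics cores, as described above, comes down to a nested loop over query URLs. The sketch below is not the actual script from these notes. The helper name `build_count_url` is hypothetical, the core and agent lists are just examples taken from the queries above, and it only prints the URLs so it can run without a Solr instance.

```shell
# Hypothetical helper: print the Solr select URL that counts hits for one
# user agent in one statistics core (rows=0 returns only the numFound total).
build_count_url() {
    core="$1"
    agent="$2"
    printf 'http://localhost:8081/solr/%s/select?q=userAgent:*%s*&rows=0\n' "$core" "$agent"
}

# Dry run: print the URLs instead of querying Solr
for core in statistics statistics-2018; do
    for agent in BUbiNG ltx71 Unpaywall; do
        build_count_url "$core" "$agent"
    done
done
```

Each printed URL could then be fetched and piped through `xmllint --format - | grep numFound`, as in the commands above. User agents containing spaces or other special characters would need URL encoding first, which this sketch does not attempt.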
<h2 id="20191108">2019-11-08</h2>
<h2 id="2019-11-08">2019-11-08</h2>
<ul> <ul>
<li>I saw a bunch of user agents that have the literal string <code>User-Agent</code> in their user agent HTTP header, for example: <li>I saw a bunch of user agents that have the literal string <code>User-Agent</code> in their user agent HTTP header, for example:
<ul> <ul>
<li><code>User-Agent: Drupal (+http://drupal.org/)</code></li> <li><code>User-Agent: Drupal (+http://drupal.org/)</code></li>
<li><code>User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31</code></li> <li><code>User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31</code></li>
@ -410,127 +363,106 @@ $ http --print b 'http://localhost:8081/solr/statistics-2018/select?facet=true&a
<li><code>User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; 360SE)</code></li> <li><code>User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; 360SE)</code></li>
<li><code>User-Agent:User-Agent:Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB7.5; .NET4.0C)IKU/6.7.6.12189;IKUCID/IKU;IKU/6.7.6.12189;IKUCID/IKU;</code></li> <li><code>User-Agent:User-Agent:Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB7.5; .NET4.0C)IKU/6.7.6.12189;IKUCID/IKU;IKU/6.7.6.12189;IKUCID/IKU;</code></li>
<li><code>User-Agent:Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0) IKU/7.0.5.9226;IKUCID/IKU;</code></li> <li><code>User-Agent:Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0) IKU/7.0.5.9226;IKUCID/IKU;</code></li>
</ul></li> </ul>
</li>
<li>I filed <a href="https://github.com/atmire/COUNTER-Robots/issues/27">an issue</a> on the COUNTER-Robots project to see if they agree to add <code>User-Agent:</code> to the list of robot user agents</li> <li>I filed <a href="https://github.com/atmire/COUNTER-Robots/issues/27">an issue</a> on the COUNTER-Robots project to see if they agree to add <code>User-Agent:</code> to the list of robot user agents</li>
</ul> </ul>
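A minimal sketch of how such agents could be flagged, assuming a simple prefix check in Python. The function name `has_literal_header_name` is hypothetical and is not part of COUNTER-Robots or DSpace; the sample strings are taken from the list above.

```python
import re

# Matches user agent values that start with the literal header name.
LITERAL_HEADER = re.compile(r"^User-Agent:")


def has_literal_header_name(user_agent):
    """Return True if the user agent value itself begins with 'User-Agent:'."""
    return LITERAL_HEADER.match(user_agent) is not None


samples = [
    "User-Agent: Drupal (+http://drupal.org/)",
    "User-Agent:Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0) IKU/7.0.5.9226;IKUCID/IKU;",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31",
]

# Keep only the agents that carry the stray header name.
flagged = [ua for ua in samples if has_literal_header_name(ua)]
```

A normal browser string such as the last sample is left alone, which is why the prefix looks like a reasonable candidate for a robots pattern.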
<h2 id="20191109">2019-11-09</h2>
<h2 id="2019-11-09">2019-11-09</h2>
<ul> <ul>
<li>Deploy the latest <code>5_x-prod</code> branch on CGSpace (linode18) <li>Deploy the latest <code>5_x-prod</code> branch on CGSpace (linode18)
<ul> <ul>
<li>This includes the updated CCAFS phase II project tags and the updated spider user agents</li> <li>This includes the updated CCAFS phase II project tags and the updated spider user agents</li>
</ul></li> </ul>
</li>
<li>Run all system updates on CGSpace and reboot the server <li>Run all system updates on CGSpace and reboot the server
<ul> <ul>
<li>After rebooting it seems that all Solr statistics cores came back up fine&hellip;</li> <li>After rebooting it seems that all Solr statistics cores came back up fine&hellip;</li>
</ul></li> </ul>
</li>
<li><p>I did some work to clean up my bot processing script and removed about 2 million hits from the statistics cores on CGSpace</p> <li>I did some work to clean up my bot processing script and removed about 2 million hits from the statistics cores on CGSpace
<ul> <ul>
<li>The script is called <code>check-spider-hits.sh</code></li> <li>The script is called <code>check-spider-hits.sh</code></li>
<li>After a bunch of tests and checks I ran it for each statistics shard like so:</li>
<li><p>After a bunch of tests and checks I ran it for each statistics shard like so:</p> </ul>
</li>
</ul>
<pre><code>$ for shard in statistics statistics-2018 statistics-2017 statistics-2016 statistics-2015 statistics-2014 statistics-2013 statistics-2012 statistics-2011 statistics-2010; do ./check-spider-hits.sh -s $shard -p yes; done <pre><code>$ for shard in statistics statistics-2018 statistics-2017 statistics-2016 statistics-2015 statistics-2014 statistics-2013 statistics-2012 statistics-2011 statistics-2010; do ./check-spider-hits.sh -s $shard -p yes; done
</code></pre></li> </code></pre><ul>
</ul></li> <li>Open a <a href="https://github.com/atmire/COUNTER-Robots/pull/28">pull request</a> against COUNTER-Robots to remove unnecessary escaping of dashes</li>
<li><p>Open a <a href="https://github.com/atmire/COUNTER-Robots/pull/28">pull request</a> against COUNTER-Robots to remove unnecessary escaping of dashes</p></li>
</ul> </ul>
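The shard list in the loop above follows a regular naming scheme, so it could also be generated rather than typed out. This is only a sketch in Python: the helper names `shard_names` and `build_commands` are hypothetical, and the `-s` and `-p yes` arguments are copied from the command above without assuming anything more about the script.

```python
def shard_names(first_year, last_year):
    """Return the current core followed by the yearly cores, newest first."""
    yearly = [f"statistics-{year}" for year in range(last_year, first_year - 1, -1)]
    return ["statistics"] + yearly


def build_commands(shards):
    """Build one argument vector per shard, mirroring the shell loop."""
    return [["./check-spider-hits.sh", "-s", shard, "-p", "yes"] for shard in shards]


shards = shard_names(2010, 2018)
commands = build_commands(shards)
```

Each entry in `commands` could be passed to `subprocess.run` one at a time, which keeps the same one-core-at-a-time behaviour as the loop.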
<h2 id="20191112">2019-11-12</h2>
<h2 id="2019-11-12">2019-11-12</h2>
<ul> <ul>
<li>Udana and Chandima emailed me to ask why <a href="https://hdl.handle.net/10568/81236">one of their WLE items</a> that is mapped from IWMI only shows up in the IWMI &ldquo;department&rdquo; on the Altmetric dashboard <li>Udana and Chandima emailed me to ask why <a href="https://hdl.handle.net/10568/81236">one of their WLE items</a> that is mapped from IWMI only shows up in the IWMI &ldquo;department&rdquo; on the Altmetric dashboard
<ul> <ul>
<li>A <a href="https://www.altmetric.com/explorer/outputs?department_id%5B%5D=CGSpace%3Agroup%3Acom_10568_16814&amp;q=Towards%20sustainable%20sanitation%20management">search in the IWMI department shows the item</a></li> <li>A <a href="https://www.altmetric.com/explorer/outputs?department_id%5B%5D=CGSpace%3Agroup%3Acom_10568_16814&amp;q=Towards%20sustainable%20sanitation%20management">search in the IWMI department shows the item</a></li>
<li>A <a href="https://www.altmetric.com/explorer/outputs?department_id%5B%5D=CGSpace%3Agroup%3Acom_10568_34494&amp;q=Towards%20sustainable%20sanitation%20management">search in the WLE department shows no results</a></li> <li>A <a href="https://www.altmetric.com/explorer/outputs?department_id%5B%5D=CGSpace%3Agroup%3Acom_10568_34494&amp;q=Towards%20sustainable%20sanitation%20management">search in the WLE department shows no results</a></li>
<li>I emailed Altmetric support to ask for help</li> <li>I emailed Altmetric support to ask for help</li>
</ul></li> </ul>
</li>
<li>Also, while analysing this, I looked through some of the other top WLE items and fixed some metadata issues (adding <code>dc.rights</code>, fixing DOIs, adding ISSNs, etc) and noticed one issue with <a href="https://hdl.handle.net/10568/97087">an item</a> that has an Altmetric score for its Handle (lower) despite it having a correct DOI (with a higher score) <li>Also, while analysing this, I looked through some of the other top WLE items and fixed some metadata issues (adding <code>dc.rights</code>, fixing DOIs, adding ISSNs, etc) and noticed one issue with <a href="https://hdl.handle.net/10568/97087">an item</a> that has an Altmetric score for its Handle (lower) despite it having a correct DOI (with a higher score)
<ul> <ul>
<li>I tweeted the Handle to see if the score would get linked once Altmetric noticed it</li> <li>I tweeted the Handle to see if the score would get linked once Altmetric noticed it</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-11-13">2019-11-13</h2> </ul>
<h2 id="20191113">2019-11-13</h2>
<ul> <ul>
<li>The <a href="https://hdl.handle.net/10568/97087">item with a low Altmetric score for its Handle</a> that I tweeted yesterday still hasn&rsquo;t linked with the DOI&rsquo;s score <li>The <a href="https://hdl.handle.net/10568/97087">item with a low Altmetric score for its Handle</a> that I tweeted yesterday still hasn't linked with the DOI's score
<ul> <ul>
<li>I tweeted it again with the Handle and the DOI</li> <li>I tweeted it again with the Handle and the DOI</li>
</ul></li> </ul>
</li>
<li><p>Testing modifying some of the COUNTER-Robots patterns to use <code>[0-9]</code> instead of <code>\d</code> digit character type, as Solr&rsquo;s regex search can&rsquo;t use those</p> <li>Testing modifying some of the COUNTER-Robots patterns to use <code>[0-9]</code> instead of <code>\d</code> digit character type, as Solr's regex search can't use those</li>
</ul>
<pre><code>$ http --print Hh 'https://dspacetest.cgiar.org/handle/10568/105487' User-Agent:&quot;Scrapoo/1&quot; <pre><code>$ http --print Hh 'https://dspacetest.cgiar.org/handle/10568/105487' User-Agent:&quot;Scrapoo/1&quot;
$ http &quot;http://localhost:8081/solr/statistics/update?commit=true&quot; $ http &quot;http://localhost:8081/solr/statistics/update?commit=true&quot;
$ http &quot;http://localhost:8081/solr/statistics/select?q=userAgent:Scrapoo*&quot; | xmllint --format - | grep numFound $ http &quot;http://localhost:8081/solr/statistics/select?q=userAgent:Scrapoo*&quot; | xmllint --format - | grep numFound
&lt;result name=&quot;response&quot; numFound=&quot;1&quot; start=&quot;0&quot;&gt; &lt;result name=&quot;response&quot; numFound=&quot;1&quot; start=&quot;0&quot;&gt;
$ http &quot;http://localhost:8081/solr/statistics/select?q=userAgent:/Scrapoo\/[0-9]/&quot; | xmllint --format - | grep numFound $ http &quot;http://localhost:8081/solr/statistics/select?q=userAgent:/Scrapoo\/[0-9]/&quot; | xmllint --format - | grep numFound
&lt;result name=&quot;response&quot; numFound=&quot;1&quot; start=&quot;0&quot;&gt; &lt;result name=&quot;response&quot; numFound=&quot;1&quot; start=&quot;0&quot;&gt;
</code></pre></li> </code></pre><ul>
<li>Nice, so searching with regex in Solr with <code>//</code> syntax works for those digits!</li>
<li><p>Nice, so searching with regex in Solr with <code>//</code> syntax works for those digits!</p></li> <li>I realized that it's easier to search Solr from curl via POST using this syntax:</li>
<li><p>I realized that it&rsquo;s easier to search Solr from curl via POST using this syntax:</p>
<pre><code>$ curl -s &quot;http://localhost:8081/solr/statistics/select&quot; -d &quot;q=userAgent:*Scrapoo*&amp;rows=0&quot;
</code></pre></li>
<li><p>If the parameters include something like &ldquo;[0-9]&rdquo; then curl interprets it as a range and will make ten requests</p>
<ul>
<li><p>You can disable this using the <code>-g</code> option, but there are other benefits to searching with POST, for example it seems that I have fewer issues with escaping special parameters when using Solr&rsquo;s regex search:</p>
<pre><code>$ curl -s 'http://localhost:8081/solr/statistics/select' -d 'q=userAgent:/Postgenomic(\s|\+)v2/&amp;rows=2'
</code></pre></li>
</ul></li>
<li><p>I updated the <code>check-spider-hits.sh</code> script to use the POST syntax, and I&rsquo;m evaluating the feasibility of including the regex search patterns from the spider agent file, as I had been filtering them out due to differences in PCRE and Solr regex syntax and issues with shell handling</p></li>
</ul> </ul>
<pre><code>$ curl -s &quot;http://localhost:8081/solr/statistics/select&quot; -d &quot;q=userAgent:*Scrapoo*&amp;rows=0&quot;
<h2 id="2019-11-14">2019-11-14</h2> </code></pre><ul>
<li>If the parameters include something like &ldquo;[0-9]&rdquo; then curl interprets it as a range and will make ten requests
<ul>
<li>You can disable this using the <code>-g</code> option, but there are other benefits to searching with POST, for example it seems that I have fewer issues with escaping special parameters when using Solr's regex search:</li>
</ul>
</li>
</ul>
<pre><code>$ curl -s 'http://localhost:8081/solr/statistics/select' -d 'q=userAgent:/Postgenomic(\s|\+)v2/&amp;rows=2'
</code></pre><ul>
<li>I updated the <code>check-spider-hits.sh</code> script to use the POST syntax, and I'm evaluating the feasibility of including the regex search patterns from the spider agent file, as I had been filtering them out due to differences in PCRE and Solr regex syntax and issues with shell handling</li>
</ul>
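The two ideas from this day (swapping <code>\d</code> for <code>[0-9]</code> and sending the query in a POST body) can be sketched in Python as follows. This is an illustration under assumptions, not what `check-spider-hits.sh` does: the helper names `pcre_to_solr` and `build_select_body` are hypothetical, and the naive string replacement would mishandle a `\d` inside a character class or after an escaped backslash.

```python
from urllib.parse import parse_qs, urlencode


def pcre_to_solr(pattern):
    """Replace the PCRE digit shorthand with an explicit range.

    Naive on purpose: it does not parse the pattern, so it is only safe
    for simple patterns like the ones shown in these notes.
    """
    return pattern.replace(r"\d", "[0-9]")


def build_select_body(pattern, rows=0):
    """Build a form-encoded body for a POST to a Solr select handler."""
    query = f"userAgent:/{pcre_to_solr(pattern)}/"
    return urlencode({"q": query, "rows": rows})


body = build_select_body(r"Scrapoo\/\d")
# Decoding the body again shows what the server would receive as q.
received = parse_qs(body)["q"][0]
```

Because `urlencode` percent-encodes the brackets and slashes, there is nothing left in the body for a client to interpret as a range, which is one reason the POST form tends to be less fiddly than putting the regex in the URL.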
<h2 id="20191114">2019-11-14</h2>
<ul> <ul>
<li>IWMI sent a few new ORCID identifiers for us to add to our controlled vocabulary</li> <li>IWMI sent a few new ORCID identifiers for us to add to our controlled vocabulary</li>
<li>I will merge them with our existing list and then resolve their names using my <code>resolve-orcids.py</code> script:</li>
<li><p>I will merge them with our existing list and then resolve their names using my <code>resolve-orcids.py</code> script:</p> </ul>
<pre><code>$ cat ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/iwmi-orcids.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq &gt; /tmp/2019-11-14-combined-orcids.txt <pre><code>$ cat ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/iwmi-orcids.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq &gt; /tmp/2019-11-14-combined-orcids.txt
$ ./resolve-orcids.py -i /tmp/2019-11-14-combined-orcids.txt -o /tmp/2019-11-14-combined-names.txt -d $ ./resolve-orcids.py -i /tmp/2019-11-14-combined-orcids.txt -o /tmp/2019-11-14-combined-names.txt -d
# sort names, copy to cg-creator-id.xml, add XML formatting, and then format with tidy (preserving accents) # sort names, copy to cg-creator-id.xml, add XML formatting, and then format with tidy (preserving accents)
$ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-id.xml $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-id.xml
</code></pre></li> </code></pre><ul>
<li>I created a <a href="https://github.com/ilri/DSpace/pull/437">pull request</a> and merged them into the <code>5_x-prod</code> branch
<li><p>I created a <a href="https://github.com/ilri/DSpace/pull/437">pull request</a> and merged them into the <code>5_x-prod</code> branch</p>
<ul> <ul>
<li>I will deploy them to CGSpace in the next few days</li> <li>I will deploy them to CGSpace in the next few days</li>
</ul></li> </ul>
</li>
<li><p>Greatly improve my <code>check-spider-hits.sh</code> script to handle regular expressions in the spider agents patterns file</p> <li>Greatly improve my <code>check-spider-hits.sh</code> script to handle regular expressions in the spider agents patterns file
<ul> <ul>
<li>This allows me to detect and purge many more hits from the Solr statistics core</li> <li>This allows me to detect and purge many more hits from the Solr statistics core</li>
<li>I&rsquo;ve tested it quite a bit on DSpace Test, but I need to do a little more before I feel comfortable running the new code on CGSpace&rsquo;s Solr cores</li> <li>I've tested it quite a bit on DSpace Test, but I need to do a little more before I feel comfortable running the new code on CGSpace's Solr cores</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-11-15">2019-11-15</h2> </ul>
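The ORCID merge step from 2019-11-14 (the <code>grep -oE ... | sort | uniq</code> pipeline) can be expressed in Python as a rough equivalent. The function name `extract_orcids` is hypothetical, the regular expression is copied from the grep command, and the sample text uses ORCID's well-known fictitious identifier rather than any real entry from the vocabulary.

```python
import re

# Same pattern as the grep -oE expression in the notes.
ORCID_PATTERN = re.compile(r"[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}")


def extract_orcids(text):
    """Return the unique ORCID-like identifiers found in text, sorted."""
    return sorted(set(ORCID_PATTERN.findall(text)))


# Hypothetical input: one vocabulary-style line and one plain list,
# containing the same sample identifier twice.
sample = (
    '<node id="Josiah Carberry: 0000-0002-1825-0097" label="..."/>\n'
    "0000-0002-1825-0097\n"
    "0000-0001-5109-3700\n"
)
combined = extract_orcids(sample)
```

The pattern only checks the shape of the identifier, not its checksum digit, so resolving the names afterwards is still what catches invalid entries.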
<h2 id="20191115">2019-11-15</h2>
<ul> <ul>
<li>Run the new version of <code>check-spider-hits.sh</code> on CGSpace&rsquo;s Solr statistics cores one by one, starting from the oldest just in case something goes wrong</li> <li>Run the new version of <code>check-spider-hits.sh</code> on CGSpace's Solr statistics cores one by one, starting from the oldest just in case something goes wrong</li>
<li>But then I noticed that some (all?) of the hits weren&rsquo;t actually getting purged, all of which were using regular expressions like: <li>But then I noticed that some (all?) of the hits weren't actually getting purged, all of which were using regular expressions like:
<ul> <ul>
<li><code>MetaURI[\+\s]API\/[0-9]\.[0-9]</code></li> <li><code>MetaURI[\+\s]API\/[0-9]\.[0-9]</code></li>
<li><code>FDM(\s|\+)[0-9]</code></li> <li><code>FDM(\s|\+)[0-9]</code></li>
@ -538,17 +470,17 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
<li><code>^Mozilla\/4\.0\+\(compatible;\)$</code></li> <li><code>^Mozilla\/4\.0\+\(compatible;\)$</code></li>
<li><code>^Mozilla\/4\.0\+\(compatible;\+ICS\)$</code></li> <li><code>^Mozilla\/4\.0\+\(compatible;\+ICS\)$</code></li>
<li><code>^Mozilla\/4\.5\+\[en]\+\(Win98;\+I\)$</code></li> <li><code>^Mozilla\/4\.5\+\[en]\+\(Win98;\+I\)$</code></li>
</ul></li> </ul>
</li>
<li>Upon closer inspection, the plus signs seem to be getting misinterpreted somehow in the delete, but not in the select!</li> <li>Upon closer inspection, the plus signs seem to be getting misinterpreted somehow in the delete, but not in the select!</li>
<li>Plus signs are special in regular expressions, URLs, and Solr&rsquo;s Lucene query parser, so I&rsquo;m actually not sure where the issue is <li>Plus signs are special in regular expressions, URLs, and Solr's Lucene query parser, so I'm actually not sure where the issue is
<ul> <ul>
<li>I tried to do URL encoding of the +, double escaping, etc&hellip; but nothing worked</li> <li>I tried to do URL encoding of the +, double escaping, etc&hellip; but nothing worked</li>
<li>I&rsquo;m going to ignore regular expressions that have pluses for now</li> <li>I'm going to ignore regular expressions that have pluses for now</li>
</ul></li> </ul>
</li>
<li>I think I might also have to ignore patterns that have percent signs, like <code>^\%?default\%?$</code></li> <li>I think I might also have to ignore patterns that have percent signs, like <code>^\%?default\%?$</code></li>
<li>After I added the ignores and did some more testing I finally ran the <code>check-spider-hits.sh</code> on all CGSpace Solr statistics cores and these are the number of hits purged from each core: <li>After I added the ignores and did some more testing I finally ran the <code>check-spider-hits.sh</code> on all CGSpace Solr statistics cores and these are the number of hits purged from each core:
<ul> <ul>
<li>statistics-2010: 113</li> <li>statistics-2010: 113</li>
<li>statistics-2011: 7235</li> <li>statistics-2011: 7235</li>
@ -560,188 +492,178 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
<li>statistics-2017: 39207</li> <li>statistics-2017: 39207</li>
<li>statistics-2018: 295546</li> <li>statistics-2018: 295546</li>
<li>statistics: 1043373</li> <li>statistics: 1043373</li>
</ul></li>
<li>That&rsquo;s 1.4 million hits in addition to the 2 million I purged earlier this week&hellip;</li>
<li>For posterity, the major contributors to the hits on the statistics core were:
<ul>
<li>Purging 812429 hits from curl\/ in statistics</li>
<li>Purging 48206 hits from facebookexternalhit\/ in statistics</li>
<li>Purging 72004 hits from PHP\/ in statistics</li>
<li>Purging 76072 hits from Yeti\/[0-9] in statistics</li>
</ul></li>
<li><p>Most of the curl hits were from CIAT in mid-2019, where they were using <a href="https://guzzle3.readthedocs.io/http-client/client.html">GuzzleHttp</a> from PHP, which uses something like this for its user agent:</p>
<pre><code>Guzzle/&lt;Guzzle_Version&gt; curl/&lt;curl_version&gt; PHP/&lt;PHP_VERSION&gt;
</code></pre></li>
<li><p>Run system updates on DSpace Test and reboot the server</p></li>
</ul> </ul>
</li>
<h2 id="2019-11-17">2019-11-17</h2> <li>That's 1.4 million hits in addition to the 2 million I purged earlier this week&hellip;</li>
<li>For posterity, the major contributors to the hits on the statistics core were:
<ul> <ul>
<li>Altmetric support responded about our dashboard question, asking if the second &ldquo;department&rdquo; (aka WLE&rsquo;s collection) was added recently and might have not been in the last harvesting yet <li>Purging 812429 hits from curl/ in statistics</li>
<li>Purging 48206 hits from facebookexternalhit/ in statistics</li>
<li>Purging 72004 hits from PHP/ in statistics</li>
<li>Purging 76072 hits from Yeti/[0-9] in statistics</li>
</ul>
</li>
<li>Most of the curl hits were from CIAT in mid-2019, where they were using <a href="https://guzzle3.readthedocs.io/http-client/client.html">GuzzleHttp</a> from PHP, which uses something like this for its user agent:</li>
</ul>
<pre><code>Guzzle/&lt;Guzzle_Version&gt; curl/&lt;curl_version&gt; PHP/&lt;PHP_VERSION&gt;
</code></pre><ul>
<li>Run system updates on DSpace Test and reboot the server</li>
</ul>
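One layer where a plus sign changes meaning can be shown with Python's standard library: in a form-encoded body a bare `+` decodes to a space. This sketch does not claim to explain the delete problem from 2019-11-15 (URL encoding was already tried without success); it only illustrates that layer, plus a filter in the spirit of "ignore patterns with pluses or percent signs". The name `is_risky_pattern` is hypothetical.

```python
from urllib.parse import parse_qs, quote_plus

query = r"userAgent:/FDM(\s|\+)[0-9]/"

# Sent raw, the plus is decoded as a space by a form decoder.
decoded_raw = parse_qs("q=" + query)["q"][0]

# Percent-encoded first, the plus survives the round trip.
decoded_encoded = parse_qs("q=" + quote_plus(query))["q"][0]


def is_risky_pattern(pattern):
    """Return True for patterns containing characters skipped for now."""
    return "+" in pattern or "%" in pattern


patterns = [r"MetaURI[\+\s]API\/[0-9]\.[0-9]", r"^\%?default\%?$", r"Yeti\/[0-9]"]
kept = [p for p in patterns if not is_risky_pattern(p)]
```

Since the regex engine and the Lucene query parser each treat `+` specially as well, a pattern can survive this layer and still be misread further down, which fits the decision to skip such patterns for now.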
<h2 id="20191117">2019-11-17</h2>
<ul>
<li>Altmetric support responded about our dashboard question, asking if the second &ldquo;department&rdquo; (aka WLE's collection) was added recently and might have not been in the last harvesting yet
<ul> <ul>
<li>I told her no, that the department is several years old, and the item was added in 2017</li> <li>I told her no, that the department is several years old, and the item was added in 2017</li>
<li>Then I looked again at the dashboard for each department and I see the item in both departments now&hellip; shit.</li> <li>Then I looked again at the dashboard for each department and I see the item in both departments now&hellip; shit.</li>
<li>A <a href="https://www.altmetric.com/explorer/outputs?department_id%5B%5D=CGSpace%3Agroup%3Acom_10568_16814&amp;q=Towards%20sustainable%20sanitation%20management">search in the IWMI department shows the item</a></li> <li>A <a href="https://www.altmetric.com/explorer/outputs?department_id%5B%5D=CGSpace%3Agroup%3Acom_10568_16814&amp;q=Towards%20sustainable%20sanitation%20management">search in the IWMI department shows the item</a></li>
<li>A <a href="https://www.altmetric.com/explorer/outputs?department_id%5B%5D=CGSpace%3Agroup%3Acom_10568_34494&amp;q=Towards%20sustainable%20sanitation%20management">search in the WLE department shows the item</a></li> <li>A <a href="https://www.altmetric.com/explorer/outputs?department_id%5B%5D=CGSpace%3Agroup%3Acom_10568_34494&amp;q=Towards%20sustainable%20sanitation%20management">search in the WLE department shows the item</a></li>
</ul></li> </ul>
</li>
<li>I finally decided to revert <code>cg.hasMetadata</code> back to <code>cg.identifier.dataurl</code> in my CG Core v2 branch (see <a href="https://github.com/AgriculturalSemantics/cg-core/issues/10">#10</a>)</li> <li>I finally decided to revert <code>cg.hasMetadata</code> back to <code>cg.identifier.dataurl</code> in my CG Core v2 branch (see <a href="https://github.com/AgriculturalSemantics/cg-core/issues/10">#10</a>)</li>
<li>Regarding the <a href="https://hdl.handle.net/10568/97087">WLE item</a> that has a much lower score than its DOI&hellip; <li>Regarding the <a href="https://hdl.handle.net/10568/97087">WLE item</a> that has a much lower score than its DOI&hellip;
<ul> <ul>
<li>I tweeted the item twice last week and the score never got linked</li> <li>I tweeted the item twice last week and the score never got linked</li>
<li>Then I noticed that I had already made a note about the same issue in 2019-04, when I also tweeted it several times&hellip;</li> <li>Then I noticed that I had already made a note about the same issue in 2019-04, when I also tweeted it several times&hellip;</li>
<li>I will ask Altmetric support for help with that</li> <li>I will ask Altmetric support for help with that</li>
</ul></li> </ul>
</li>
<li>Finally deploy <code>5_x-cgcorev2</code> branch on DSpace Test</li> <li>Finally deploy <code>5_x-cgcorev2</code> branch on DSpace Test</li>
</ul> </ul>
<h2 id="20191118">2019-11-18</h2>
<h2 id="2019-11-18">2019-11-18</h2>
<ul> <ul>
<li>I sent a mail to the CGSpace partners in Addis about the CG Core v2 changes on DSpace Test</li> <li>I sent a mail to the CGSpace partners in Addis about the CG Core v2 changes on DSpace Test</li>
<li>Then I filed an <a href="https://github.com/AgriculturalSemantics/cg-core/issues/11">issue on the CG Core GitHub</a> to let the metadata people know about our progress</li> <li>Then I filed an <a href="https://github.com/AgriculturalSemantics/cg-core/issues/11">issue on the CG Core GitHub</a> to let the metadata people know about our progress</li>
<li>It seems like I will do a session about CG Core v2 implementation and limitations in DSpace for the data workshop in December in Nairobi (?)</li> <li>It seems like I will do a session about CG Core v2 implementation and limitations in DSpace for the data workshop in December in Nairobi (?)</li>
</ul> </ul>
<h2 id="20191119">2019-11-19</h2>
<h2 id="2019-11-19">2019-11-19</h2>
<ul> <ul>
<li>Export IITA&rsquo;s community from CGSpace because they want to experiment with importing it into their internal DSpace for some testing or something <li>Export IITA's community from CGSpace because they want to experiment with importing it into their internal DSpace for some testing or something
<ul> <ul>
<li>I had previously sent them an export in 2019-04</li> <li>I had previously sent them an export in 2019-04</li>
</ul></li> </ul>
</li>
<li>Atmire merged my <a href="https://github.com/atmire/COUNTER-Robots/pull/28">pull request regarding unnecessary escaping of dashes</a> in regular expressions, as well as <a href="https://github.com/atmire/COUNTER-Robots/issues/27">my suggestion of adding &ldquo;User-Agent&rdquo; to the list of patterns</a></li> <li>Atmire merged my <a href="https://github.com/atmire/COUNTER-Robots/pull/28">pull request regarding unnecessary escaping of dashes</a> in regular expressions, as well as <a href="https://github.com/atmire/COUNTER-Robots/issues/27">my suggestion of adding &ldquo;User-Agent&rdquo; to the list of patterns</a></li>
<li>I made another <a href="https://github.com/atmire/COUNTER-Robots/pull/29">pull request to fix invalid escaping of one of their new patterns</a></li> <li>I made another <a href="https://github.com/atmire/COUNTER-Robots/pull/29">pull request to fix invalid escaping of one of their new patterns</a></li>
<li>I ran my <code>check-spider-hits.sh</code> script again with these new patterns and found a bunch more statistics requests that match, for example: <li>I ran my <code>check-spider-hits.sh</code> script again with these new patterns and found a bunch more statistics requests that match, for example:
<ul> <ul>
<li>Found 39560 hits from ^Buck\/[0-9] in statistics</li> <li>Found 39560 hits from ^Buck/[0-9] in statistics</li>
<li>Found 5471 hits from ^User-Agent in statistics</li> <li>Found 5471 hits from ^User-Agent in statistics</li>
<li>Found 2994 hits from ^Buck\/[0-9] in statistics-2018</li> <li>Found 2994 hits from ^Buck/[0-9] in statistics-2018</li>
<li>Found 14076 hits from ^User-Agent in statistics-2018</li> <li>Found 14076 hits from ^User-Agent in statistics-2018</li>
<li>Found 16310 hits from ^User-Agent in statistics-2017</li> <li>Found 16310 hits from ^User-Agent in statistics-2017</li>
<li>Found 4429 hits from ^User-Agent in statistics-2016</li> <li>Found 4429 hits from ^User-Agent in statistics-2016</li>
</ul></li> </ul>
</li>
<li><p>Buck is one I&rsquo;ve never heard of before, its user agent is:</p> <li>Buck is one I've never heard of before, its user agent is:</li>
</ul>
<pre><code>Buck/2.2; (+https://app.hypefactors.com/media-monitoring/about.html) <pre><code>Buck/2.2; (+https://app.hypefactors.com/media-monitoring/about.html)
</code></pre></li> </code></pre><ul>
<li>All in all that's about 85,000 more hits purged, in addition to the 3.4 million I purged last week</li>
<li><p>All in all that&rsquo;s about 85,000 more hits purged, in addition to the 3.4 million I purged last week</p></li>
</ul> </ul>
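As a quick check on the figures from 2019-11-19, the example counts can be tallied per pattern. Only the lines listed as examples are included here, so the subtotal is expected to come out somewhat below the overall "about 85,000" figure; the variable names are illustrative.

```python
from collections import defaultdict

# (pattern, core, hits) copied from the example lines in the notes.
found = [
    ("^Buck/[0-9]", "statistics", 39560),
    ("^User-Agent", "statistics", 5471),
    ("^Buck/[0-9]", "statistics-2018", 2994),
    ("^User-Agent", "statistics-2018", 14076),
    ("^User-Agent", "statistics-2017", 16310),
    ("^User-Agent", "statistics-2016", 4429),
]

# Sum the hits for each pattern across all cores.
totals = defaultdict(int)
for pattern, _core, hits in found:
    totals[pattern] += hits

listed_total = sum(totals.values())
```

The two patterns contribute roughly equal shares of the listed hits, with Buck concentrated almost entirely in the current `statistics` core.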
<h2 id="20191120">2019-11-20</h2>
<h2 id="2019-11-20">2019-11-20</h2>
<ul> <ul>
<li>Email Usman Muchlish from CIFOR to see what he&rsquo;s doing with their DSpace lately</li> <li>Email Usman Muchlish from CIFOR to see what he's doing with their DSpace lately</li>
</ul> </ul>
<h2 id="20191121">2019-11-21</h2>
<h2 id="2019-11-21">2019-11-21</h2>
<ul> <ul>
<li>Discuss bugs and issues with AReS v2 that are limiting its adoption <li>Discuss bugs and issues with AReS v2 that are limiting its adoption
<ul> <ul>
<li>BUG: If you search for items between year 2012 and 2019, then remove some years from the &ldquo;info product analysis&rdquo;, they are still present in the search results and export</li> <li>BUG: If you search for items between year 2012 and 2019, then remove some years from the &ldquo;info product analysis&rdquo;, they are still present in the search results and export</li>
<li>FEATURE: Ability to add month to date filter?</li> <li>FEATURE: Ability to add month to date filter?</li>
<li>FEATURE: Add &ldquo;review status&rdquo;, &ldquo;series&rdquo;, and &ldquo;usage rights&rdquo; to search filters</li> <li>FEATURE: Add &ldquo;review status&rdquo;, &ldquo;series&rdquo;, and &ldquo;usage rights&rdquo; to search filters</li>
<li>FEATURE: Downloads and views are not included in exports</li> <li>FEATURE: Downloads and views are not included in exports</li>
<li>FEATURE: Add more fields to exports (Abenet will clarify)</li> <li>FEATURE: Add more fields to exports (Abenet will clarify)</li>
</ul></li> </ul>
</li>
<li>As for the larger features to focus on in the future ToRs: <li>As for the larger features to focus on in the future ToRs:
<ul> <ul>
<li>FEATURE: Unique, linkable URL for a set of search results (discussed with Moayad, he has a plan for this)</li> <li>FEATURE: Unique, linkable URL for a set of search results (discussed with Moayad, he has a plan for this)</li>
<li>FEATURE: Reporting that we talked about in Amman in January, 2019.</li> <li>FEATURE: Reporting that we talked about in Amman in January, 2019.</li>
</ul></li> </ul>
</li>
<li>We have a meeting about AReS future developments with Jane, Abenet, Peter, and Enrico tomorrow</li> <li>We have a meeting about AReS future developments with Jane, Abenet, Peter, and Enrico tomorrow</li>
</ul> </ul>
<h2 id="20191122">2019-11-22</h2>
<h2 id="2019-11-22">2019-11-22</h2>
<ul> <ul>
<li>Skype with Jane, Abenet, Peter, and Enrico about AReS v2 future development <li>Skype with Jane, Abenet, Peter, and Enrico about AReS v2 future development
<ul> <ul>
<li>We want to move AReS v2 from dspacetest.cgiar.org/explorer to cgspace.cgiar.org/explorer</li> <li>We want to move AReS v2 from dspacetest.cgiar.org/explorer to cgspace.cgiar.org/explorer</li>
<li>We want to maintain a public demo of the vanilla OpenRXV with a subset of data, for example a non-CG community</li> <li>We want to maintain a public demo of the vanilla OpenRXV with a subset of data, for example a non-CG community</li>
<li>We want to try to move all issues and milestones to GitHub</li> <li>We want to try to move all issues and milestones to GitHub</li>
<li>I need to try to work with ILRI Finance to pre-pay the AReS Linode server (linode11779072) for 2020</li> <li>I need to try to work with ILRI Finance to pre-pay the AReS Linode server (linode11779072) for 2020</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-11-24">2019-11-24</h2> </ul>
<h2 id="20191124">2019-11-24</h2>
<ul> <ul>
<li>I rebooted DSpace Test (linode19) and it kernel panicked at boot <li>I rebooted DSpace Test (linode19) and it kernel panicked at boot
<ul> <ul>
<li>I looked on the console and saw that it can&rsquo;t mount the root filesystem</li> <li>I looked on the console and saw that it can't mount the root filesystem</li>
<li>I switched the boot configuration to use the OS&rsquo;s kernel via GRUB2 instead of Linode&rsquo;s kernel and then it came up after reboot&hellip;</li> <li>I switched the boot configuration to use the OS's kernel via GRUB2 instead of Linode's kernel and then it came up after reboot&hellip;</li>
<li>I initiated a migration of the server from the Fremont, CA region to Frankfurt, DE</li> <li>I initiated a migration of the server from the Fremont, CA region to Frankfurt, DE
<ul>
<li>The migration is going very slowly, so I assume the network issues from earlier this year are still not fixed</li> <li>The migration is going very slowly, so I assume the network issues from earlier this year are still not fixed</li>
<li>I opened a new ticket (13056701) with Linode support, with reference to my previous ticket (11804943)</li> <li>I opened a new ticket (13056701) with Linode support, with reference to my previous ticket (11804943)</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-11-25">2019-11-25</h2> </ul>
</li>
</ul>
<h2 id="20191125">2019-11-25</h2>
<ul> <ul>
<li>The migration of DSpace Test from Fremont, CA (USA) to Frankfurt (DE) region completed <li>The migration of DSpace Test from Fremont, CA (USA) to Frankfurt (DE) region completed
<ul> <ul>
<li>The IP address of the server changed so I need to email CGNET to ask them to update the DNS</li> <li>The IP address of the server changed so I need to email CGNET to ask them to update the DNS</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-11-26">2019-11-26</h2> </ul>
<h2 id="20191126">2019-11-26</h2>
<ul> <ul>
<li>Visit CodeObie to discuss future of OpenRXV and AReS <li>Visit CodeObia to discuss future of OpenRXV and AReS
<ul> <ul>
<li>I started working on categorizing and validating the feedback that Jane collated into a spreadsheet last week</li> <li>I started working on categorizing and validating the feedback that Jane collated into a spreadsheet last week</li>
<li>I added GitHub issues for eight of the items so far, tagging them by &ldquo;bug&rdquo;, &ldquo;search&rdquo;, &ldquo;feature&rdquo;, &ldquo;graphics&rdquo;, &ldquo;low-priority&rdquo;, etc</li> <li>I added GitHub issues for eight of the items so far, tagging them by &ldquo;bug&rdquo;, &ldquo;search&rdquo;, &ldquo;feature&rdquo;, &ldquo;graphics&rdquo;, &ldquo;low-priority&rdquo;, etc</li>
<li>I moved AReS v2 to be available on CGSpace</li> <li>I moved AReS v2 to be available on CGSpace</li>
</ul></li>
</ul> </ul>
</li>
<h2 id="2019-11-27">2019-11-27</h2> </ul>
<h2 id="20191127">2019-11-27</h2>
<ul> <ul>
<li>Minor updates on the <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> <li>Minor updates on the <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a>
<ul> <ul>
<li>Introduce isort for import sorting</li> <li>Introduce isort for import sorting</li>
<li>Introduce black for code formatting according to PEP8</li> <li>Introduce black for code formatting according to PEP8</li>
<li>Fix some minor issues raised by flake8</li> <li>Fix some minor issues raised by flake8</li>
<li>Release <a href="https://github.com/ilri/dspace-statistics-api/releases/tag/v1.1.1">version 1.1.1</a> and deploy to DSpace Test (linode19)</li> <li>Release <a href="https://github.com/ilri/dspace-statistics-api/releases/tag/v1.1.1">version 1.1.1</a> and deploy to DSpace Test (linode19)</li>
<li>I realize that I never deployed version 1.1.0 (with falcon 2.0.0) on CGSpace (linode18) so I did that as well</li> <li>I realize that I never deployed version 1.1.0 (with falcon 2.0.0) on CGSpace (linode18) so I did that as well</li>
</ul></li> </ul>
</li>
<li>File a ticket (242418) with Altmetric about DCTERMS migration to see if there is anything we need to be careful about</li> <li>File a ticket (242418) with Altmetric about DCTERMS migration to see if there is anything we need to be careful about</li>
<li>Make a pull request against cg-core schema to fix inconsistent references to <code>cg.embargoDate</code> (<a href="https://github.com/AgriculturalSemantics/cg-core/pull/13">#13</a>)</li> <li>Make a pull request against cg-core schema to fix inconsistent references to <code>cg.embargoDate</code> (<a href="https://github.com/AgriculturalSemantics/cg-core/pull/13">#13</a>)</li>
<li>Review the AReS feedback again after Peter made some comments <li>Review the AReS feedback again after Peter made some comments
<ul> <ul>
<li>I standardized the GitHub issue labels in both OpenRXV and AReS issue trackers, using labels like &ldquo;P-low&rdquo; for priority</li> <li>I standardized the GitHub issue labels in both OpenRXV and AReS issue trackers, using labels like &ldquo;P-low&rdquo; for priority</li>
<li>I filed another handful of issues in both trackers and added them to the spreadsheet</li> <li>I filed another handful of issues in both trackers and added them to the spreadsheet</li>
</ul></li> </ul>
</li>
<li>I need to ask Marie-Angelique about the <code>cg.peer-reviewed</code> field <li>I need to ask Marie-Angelique about the <code>cg.peer-reviewed</code> field
<ul> <ul>
<li>We currently use <code>dc.description.version</code> with values like &ldquo;Internal Review&rdquo; and &ldquo;Peer Review&rdquo;, and CG Core v2 currently recommends using &ldquo;True&rdquo; if the field is peer reviewed</li> <li>We currently use <code>dc.description.version</code> with values like &ldquo;Internal Review&rdquo; and &ldquo;Peer Review&rdquo;, and CG Core v2 currently recommends using &ldquo;True&rdquo; if the field is peer reviewed</li>
</ul></li>
</ul> </ul>
</li>
<!-- vim: set sw=2 ts=2: --> </ul>
<h2 id="20191128">2019-11-28</h2>
<ul>
<li>File an issue with CG Core v2 project to ask Marie-Angelique about expanding the scope of <code>cg.peer-reviewed</code> to include other types of review, and possibly to change the field name to something more generic like <code>cg.review-status</code> (<a href="https://github.com/AgriculturalSemantics/cg-core/issues/14">#14</a>)</li>
<li>More review of AReS feedback
<ul>
<li>I clarified some of the feedback</li>
<li>I added status of &ldquo;Issue Filed&rdquo;, &ldquo;Duplicate&rdquo; and &ldquo;No Action Required&rdquo; to several items</li>
<li>I filed a handful more GitHub issues in AReS and OpenRXV GitHub trackers</li>
</ul>
</li>
</ul>
<!-- raw HTML omitted -->

View File

@ -14,7 +14,7 @@
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="404 Page not found"/> <meta name="twitter:title" content="404 Page not found"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/> <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />

View File

@ -9,13 +9,12 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
<meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" /> <meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Categories"/> <meta name="twitter:title" content="Categories"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/> <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -100,31 +99,27 @@
</p> </p>
</header> </header>
<h2 id="2019-11-04">2019-11-04</h2> <h2 id="20191104">2019-11-04</h2>
<ul> <ul>
<li><p>Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics</p> <li>Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics
<ul> <ul>
<li><p>I looked in the nginx logs and see 4.6 million in the access logs, and 1.2 million in the API logs:</p> <li>I looked in the nginx logs and see 4.6 million in the access logs, and 1.2 million in the API logs:</li>
</ul>
</li>
</ul>
<pre><code># zcat --force /var/log/nginx/*access.log.*.gz | grep -cE &quot;[0-9]{1,2}/Oct/2019&quot; <pre><code># zcat --force /var/log/nginx/*access.log.*.gz | grep -cE &quot;[0-9]{1,2}/Oct/2019&quot;
4671942 4671942
# zcat --force /var/log/nginx/{rest,oai,statistics}.log.*.gz | grep -cE &quot;[0-9]{1,2}/Oct/2019&quot; # zcat --force /var/log/nginx/{rest,oai,statistics}.log.*.gz | grep -cE &quot;[0-9]{1,2}/Oct/2019&quot;
1277694 1277694
</code></pre></li> </code></pre><ul>
</ul></li> <li>So 4.6 million from XMLUI and another 1.2 million from API requests</li>
<li>Let's see how many of the REST API requests were for bitstreams (because they are counted in Solr stats):</li>
<li><p>So 4.6 million from XMLUI and another 1.2 million from API requests</p></li> </ul>
<li><p>Let&rsquo;s see how many of the REST API requests were for bitstreams (because they are counted in Solr stats):</p>
<pre><code># zcat --force /var/log/nginx/rest.log.*.gz | grep -c -E &quot;[0-9]{1,2}/Oct/2019&quot; <pre><code># zcat --force /var/log/nginx/rest.log.*.gz | grep -c -E &quot;[0-9]{1,2}/Oct/2019&quot;
1183456 1183456
# zcat --force /var/log/nginx/rest.log.*.gz | grep -E &quot;[0-9]{1,2}/Oct/2019&quot; | grep -c -E &quot;/rest/bitstreams&quot; # zcat --force /var/log/nginx/rest.log.*.gz | grep -E &quot;[0-9]{1,2}/Oct/2019&quot; | grep -c -E &quot;/rest/bitstreams&quot;
106781 106781
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-11/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-11/'>Read more →</a>
</article> </article>
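The `zcat --force ... | grep -cE` counts in the 2019-11-04 entry above can be approximated in Python. This is a minimal sketch, not the method used in the notes: the names `open_log` and `count_hits` are hypothetical, and it assumes the usual nginx timestamp form such as `04/Oct/2019`.

```python
import gzip
import re
from pathlib import Path
from typing import Iterable


def open_log(path: Path):
    """Open a log as text, reading gzip files transparently (like `zcat --force`)."""
    with open(path, "rb") as raw:
        magic = raw.read(2)
    if magic == b"\x1f\x8b":  # gzip magic number
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "rt", encoding="utf-8", errors="replace")


def count_hits(paths: Iterable[Path], month: str, year: int, must_contain: str = "") -> int:
    """Count lines dated in the given month, optionally also containing a substring."""
    # Same idea as grep -E "[0-9]{1,2}/Oct/2019"
    pattern = re.compile(rf"[0-9]{{1,2}}/{re.escape(month)}/{year}")
    total = 0
    for path in paths:
        with open_log(path) as handle:
            for line in handle:
                if pattern.search(line) and must_contain in line:
                    total += 1
    return total
```

Passing `must_contain="/rest/bitstreams"` mirrors the second `grep` in the bitstream count; leaving it empty counts every request in the month.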
@ -145,7 +140,6 @@
</p> </p>
</header> </header>
<p>Possible changes to CGSpace metadata fields to align more with DC, QDC, and DCTERMS as well as CG Core v2.</p> <p>Possible changes to CGSpace metadata fields to align more with DC, QDC, and DCTERMS as well as CG Core v2.</p>
<p>With reference to <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2 draft standard</a> by Marie-Angélique as well as <a href="http://www.dublincore.org/specifications/dublin-core/dcmi-terms/">DCMI DCTERMS</a>.</p> <p>With reference to <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2 draft standard</a> by Marie-Angélique as well as <a href="http://www.dublincore.org/specifications/dublin-core/dcmi-terms/">DCMI DCTERMS</a>.</p>
<a href='https://alanorth.github.io/cgspace-notes/cgspace-cgcorev2-migration/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/cgspace-cgcorev2-migration/'>Read more →</a>
</article> </article>
@ -164,8 +158,7 @@
</p> </p>
</header> </header>
2019-10-01 Udana from IWMI asked me for a CSV export of their community on CGSpace 2019-10-01 Udana from IWMI asked me for a CSV export of their community on CGSpace I exported it, but a quick run through the csv-metadata-quality tool shows that there are some low-hanging fruits we can fix before I send him the data I will limit the scope to the titles, regions, subregions, and river basins for now to manually fix some non-breaking spaces (U+00A0) there that would otherwise be removed by the csv-metadata-quality script's &ldquo;unnecessary Unicode&rdquo; fix: $ csvcut -c 'id,dc.
I exported it, but a quick run through the csv-metadata-quality tool shows that there are some low-hanging fruits we can fix before I send him the data I will limit the scope to the titles, regions, subregions, and river basins for now to manually fix some non-breaking spaces (U+00A0) there that would otherwise be removed by the csv-metadata-quality script&rsquo;s &ldquo;unnecessary Unicode&rdquo; fix:
<a href='https://alanorth.github.io/cgspace-notes/2019-10/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-10/'>Read more →</a>
</article> </article>
@ -183,37 +176,34 @@
</p> </p>
</header> </header>
<h2 id="2019-09-01">2019-09-01</h2> <h2 id="20190901">2019-09-01</h2>
<ul> <ul>
<li>Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning</li> <li>Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning</li>
<li>Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:</li>
<li><p>Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:</p>
<pre><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
440 17.58.101.255
441 157.55.39.101
485 207.46.13.43
728 169.60.128.125
730 207.46.13.108
758 157.55.39.9
808 66.160.140.179
814 207.46.13.212
2472 163.172.71.23
6092 3.94.211.189
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
33 2a01:7e00::f03c:91ff:fe16:fcb
57 3.83.192.124
57 3.87.77.25
57 54.82.1.8
822 2a01:9cc0:47:1:1a:4:0:2
1223 45.5.184.72
1633 172.104.229.92
5112 205.186.128.185
7249 2a01:7e00::f03c:91ff:fe18:7396
9124 45.5.186.2
</code></pre></li>
</ul> </ul>
<pre><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
440 17.58.101.255
441 157.55.39.101
485 207.46.13.43
728 169.60.128.125
730 207.46.13.108
758 157.55.39.9
808 66.160.140.179
814 207.46.13.212
2472 163.172.71.23
6092 3.94.211.189
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
33 2a01:7e00::f03c:91ff:fe16:fcb
57 3.83.192.124
57 3.87.77.25
57 54.82.1.8
822 2a01:9cc0:47:1:1a:4:0:2
1223 45.5.184.72
1633 172.104.229.92
5112 205.186.128.185
7249 2a01:7e00::f03c:91ff:fe18:7396
9124 45.5.186.2
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2019-09/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-09/'>Read more →</a>
</article> </article>
@ -231,22 +221,19 @@
</p> </p>
</header> </header>
<h2 id="2019-08-03">2019-08-03</h2> <h2 id="20190803">2019-08-03</h2>
<ul> <ul>
<li>Look at Bioversity&rsquo;s latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name&hellip;</li> <li>Look at Bioversity's latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name&hellip;</li>
</ul> </ul>
<h2 id="20190804">2019-08-04</h2>
<h2 id="2019-08-04">2019-08-04</h2>
<ul> <ul>
<li>Deploy ORCID identifier updates requested by Bioversity to CGSpace</li> <li>Deploy ORCID identifier updates requested by Bioversity to CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it <li>Run system updates on CGSpace (linode18) and reboot it
<ul> <ul>
<li>Before updating it I checked Solr and verified that all statistics cores were loaded properly&hellip;</li> <li>Before updating it I checked Solr and verified that all statistics cores were loaded properly&hellip;</li>
<li>After rebooting, all statistics cores were loaded&hellip; wow, that&rsquo;s lucky.</li> <li>After rebooting, all statistics cores were loaded&hellip; wow, that's lucky.</li>
</ul></li> </ul>
</li>
<li>Run system updates on DSpace Test (linode19) and reboot it</li> <li>Run system updates on DSpace Test (linode19) and reboot it</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-08/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-08/'>Read more →</a>
@ -266,16 +253,15 @@
</p> </p>
</header> </header>
<h2 id="2019-07-01">2019-07-01</h2> <h2 id="20190701">2019-07-01</h2>
<ul> <ul>
<li>Create an &ldquo;AfricaRice books and book chapters&rdquo; collection on CGSpace for AfricaRice</li> <li>Create an &ldquo;AfricaRice books and book chapters&rdquo; collection on CGSpace for AfricaRice</li>
<li>Last month Sisay asked why the following &ldquo;most popular&rdquo; statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace: <li>Last month Sisay asked why the following &ldquo;most popular&rdquo; statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace:
<ul> <ul>
<li><a href="https://dspacetest.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&amp;time_filter_end_date=01%2F12%2F2018">DSpace Test</a></li> <li><a href="https://dspacetest.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&amp;time_filter_end_date=01%2F12%2F2018">DSpace Test</a></li>
<li><a href="https://cgspace.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&amp;time_filter_end_date=01%2F12%2F2018">CGSpace</a></li> <li><a href="https://cgspace.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&amp;time_filter_end_date=01%2F12%2F2018">CGSpace</a></li>
</ul></li> </ul>
</li>
<li>Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community</li> <li>Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-07/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-07/'>Read more →</a>
@ -295,15 +281,12 @@
</p> </p>
</header> </header>
<h2 id="2019-06-02">2019-06-02</h2> <h2 id="20190602">2019-06-02</h2>
<ul> <ul>
<li>Merge the <a href="https://github.com/ilri/DSpace/pull/425">Solr filterCache</a> and <a href="https://github.com/ilri/DSpace/pull/426">XMLUI ISI journal</a> changes to the <code>5_x-prod</code> branch and deploy on CGSpace</li> <li>Merge the <a href="https://github.com/ilri/DSpace/pull/425">Solr filterCache</a> and <a href="https://github.com/ilri/DSpace/pull/426">XMLUI ISI journal</a> changes to the <code>5_x-prod</code> branch and deploy on CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it</li> <li>Run system updates on CGSpace (linode18) and reboot it</li>
</ul> </ul>
<h2 id="20190603">2019-06-03</h2>
<h2 id="2019-06-03">2019-06-03</h2>
<ul> <ul>
<li>Skype with Marie-Angélique and Abenet about <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2</a></li> <li>Skype with Marie-Angélique and Abenet about <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2</a></li>
</ul> </ul>
@ -324,24 +307,21 @@
</p> </p>
</header> </header>
<h2 id="2019-05-01">2019-05-01</h2> <h2 id="20190501">2019-05-01</h2>
<ul> <ul>
<li>Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace</li> <li>Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace</li>
<li>A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items <li>A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items
<ul> <ul>
<li>Apparently if the item is in the <code>workflowitem</code> table it is submitted to a workflow</li> <li>Apparently if the item is in the <code>workflowitem</code> table it is submitted to a workflow</li>
<li>And if it is in the <code>workspaceitem</code> table it is in the pre-submitted state</li> <li>And if it is in the <code>workspaceitem</code> table it is in the pre-submitted state</li>
</ul></li> </ul>
</li>
<li><p>The item seems to be in a pre-submitted state, so I tried to delete it from there:</p> <li>The item seems to be in a pre-submitted state, so I tried to delete it from there:</li>
</ul>
<pre><code>dspace=# DELETE FROM workspaceitem WHERE item_id=74648; <pre><code>dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
DELETE 1 DELETE 1
</code></pre></li> </code></pre><ul>
<li>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present&hellip;</li>
<li><p>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present&hellip;</p></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-05/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-05/'>Read more →</a>
</article> </article>
@ -360,35 +340,30 @@ DELETE 1
</p> </p>
</header> </header>
<h2 id="2019-04-01">2019-04-01</h2> <h2 id="20190401">2019-04-01</h2>
<ul> <ul>
<li>Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc <li>Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc
<ul> <ul>
<li>They asked if we had plans to enable RDF support in CGSpace</li> <li>They asked if we had plans to enable RDF support in CGSpace</li>
</ul></li> </ul>
</li>
<li><p>There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today</p> <li>There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today
<ul> <ul>
<li><p>I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!</p> <li>I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!</li>
</ul>
</li>
</ul>
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18.196.196.108|18.195.78.144|18.195.218.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5 <pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18.196.196.108|18.195.78.144|18.195.218.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5
4432 200 4432 200
</code></pre></li> </code></pre><ul>
</ul></li> <li>In the last two weeks there have been 47,000 downloads of this <em>same exact PDF</em> by these three IP addresses</li>
<li>Apply country and region corrections and deletions on DSpace Test and CGSpace:</li>
<li><p>In the last two weeks there have been 47,000 downloads of this <em>same exact PDF</em> by these three IP addresses</p></li> </ul>
<li><p>Apply country and region corrections and deletions on DSpace Test and CGSpace:</p>
<pre><code>$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d <pre><code>$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d $ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-04/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-04/'>Read more →</a>
</article> </article>
@ -406,20 +381,19 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
</p> </p>
</header> </header>
<h2 id="2019-03-01">2019-03-01</h2> <h2 id="20190301">2019-03-01</h2>
<ul> <ul>
<li>I checked IITA&rsquo;s 259 Feb 14 records from last month for duplicates using Atmire&rsquo;s Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good</li> <li>I checked IITA's 259 Feb 14 records from last month for duplicates using Atmire's Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good</li>
<li>I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc&hellip;</li> <li>I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc&hellip;</li>
<li>Looking at the other half of Udana&rsquo;s WLE records from 2018-11 <li>Looking at the other half of Udana's WLE records from 2018-11
<ul> <ul>
<li>I finished the ones for Restoring Degraded Landscapes (RDL), but these are for Variability, Risks and Competing Uses (VRC)</li> <li>I finished the ones for Restoring Degraded Landscapes (RDL), but these are for Variability, Risks and Competing Uses (VRC)</li>
<li>I did the usual cleanups for whitespace, added regions where they made sense for certain countries, cleaned up the DOI link formats, added rights information based on the publications page for a few items</li> <li>I did the usual cleanups for whitespace, added regions where they made sense for certain countries, cleaned up the DOI link formats, added rights information based on the publications page for a few items</li>
<li>Most worryingly, there are encoding errors in the abstracts for eleven items, for example:</li> <li>Most worryingly, there are encoding errors in the abstracts for eleven items, for example:</li>
<li>68.15% <20> 9.45 instead of 68.15% ± 9.45</li> <li>68.15% <20> 9.45 instead of 68.15% ± 9.45</li>
<li>2003<EFBFBD>2013 instead of 2003–2013</li> <li>2003<EFBFBD>2013 instead of 2003–2013</li>
</ul></li> </ul>
</li>
<li>I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs</li> <li>I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-03/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-03/'>Read more →</a>
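The encoding errors described in the 2019-03-01 entry above usually show up as the Unicode replacement character (U+FFFD) once the text has been decoded. A minimal sketch for flagging such abstracts follows; the function name and the list-of-strings input are assumptions made for illustration, not part of the notes' tooling.

```python
from typing import Iterable, List

REPLACEMENT_CHAR = "\ufffd"  # what decoders emit for bytes they cannot interpret


def find_damaged(values: Iterable[str]) -> List[int]:
    """Return the positions of values that contain the replacement character."""
    return [index for index, value in enumerate(values) if REPLACEMENT_CHAR in value]
```

Any position returned would still need a manual fix against the source publication, since the original character cannot be recovered from the damaged text alone.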

View File

@ -9,13 +9,12 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" /> <meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Notes"/> <meta name="twitter:title" content="Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/> <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -85,31 +84,27 @@
</p> </p>
</header> </header>
<h2 id="2019-11-04">2019-11-04</h2> <h2 id="20191104">2019-11-04</h2>
<ul> <ul>
<li><p>Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics</p> <li>Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics
<ul> <ul>
<li><p>I looked in the nginx logs and see 4.6 million in the access logs, and 1.2 million in the API logs:</p> <li>I looked in the nginx logs and see 4.6 million in the access logs, and 1.2 million in the API logs:</li>
</ul>
</li>
</ul>
<pre><code># zcat --force /var/log/nginx/*access.log.*.gz | grep -cE &quot;[0-9]{1,2}/Oct/2019&quot; <pre><code># zcat --force /var/log/nginx/*access.log.*.gz | grep -cE &quot;[0-9]{1,2}/Oct/2019&quot;
4671942 4671942
# zcat --force /var/log/nginx/{rest,oai,statistics}.log.*.gz | grep -cE &quot;[0-9]{1,2}/Oct/2019&quot; # zcat --force /var/log/nginx/{rest,oai,statistics}.log.*.gz | grep -cE &quot;[0-9]{1,2}/Oct/2019&quot;
1277694 1277694
</code></pre></li> </code></pre><ul>
</ul></li> <li>So 4.6 million from XMLUI and another 1.2 million from API requests</li>
<li>Let's see how many of the REST API requests were for bitstreams (because they are counted in Solr stats):</li>
<li><p>So 4.6 million from XMLUI and another 1.2 million from API requests</p></li> </ul>
<li><p>Let&rsquo;s see how many of the REST API requests were for bitstreams (because they are counted in Solr stats):</p>
<pre><code># zcat --force /var/log/nginx/rest.log.*.gz | grep -c -E &quot;[0-9]{1,2}/Oct/2019&quot; <pre><code># zcat --force /var/log/nginx/rest.log.*.gz | grep -c -E &quot;[0-9]{1,2}/Oct/2019&quot;
1183456 1183456
# zcat --force /var/log/nginx/rest.log.*.gz | grep -E &quot;[0-9]{1,2}/Oct/2019&quot; | grep -c -E &quot;/rest/bitstreams&quot; # zcat --force /var/log/nginx/rest.log.*.gz | grep -E &quot;[0-9]{1,2}/Oct/2019&quot; | grep -c -E &quot;/rest/bitstreams&quot;
106781 106781
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-11/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-11/'>Read more →</a>
</article> </article>
@ -130,7 +125,6 @@
</p> </p>
</header> </header>
<p>Possible changes to CGSpace metadata fields to align more with DC, QDC, and DCTERMS as well as CG Core v2.</p> <p>Possible changes to CGSpace metadata fields to align more with DC, QDC, and DCTERMS as well as CG Core v2.</p>
<p>With reference to <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2 draft standard</a> by Marie-Angélique as well as <a href="http://www.dublincore.org/specifications/dublin-core/dcmi-terms/">DCMI DCTERMS</a>.</p> <p>With reference to <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2 draft standard</a> by Marie-Angélique as well as <a href="http://www.dublincore.org/specifications/dublin-core/dcmi-terms/">DCMI DCTERMS</a>.</p>
<a href='https://alanorth.github.io/cgspace-notes/cgspace-cgcorev2-migration/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/cgspace-cgcorev2-migration/'>Read more →</a>
</article> </article>
@ -149,8 +143,7 @@
</p> </p>
</header> </header>
2019-10-01 Udana from IWMI asked me for a CSV export of their community on CGSpace 2019-10-01 Udana from IWMI asked me for a CSV export of their community on CGSpace I exported it, but a quick run through the csv-metadata-quality tool shows that there are some low-hanging fruits we can fix before I send him the data I will limit the scope to the titles, regions, subregions, and river basins for now to manually fix some non-breaking spaces (U+00A0) there that would otherwise be removed by the csv-metadata-quality script's &ldquo;unnecessary Unicode&rdquo; fix: $ csvcut -c 'id,dc.
I exported it, but a quick run through the csv-metadata-quality tool shows that there are some low-hanging fruits we can fix before I send him the data I will limit the scope to the titles, regions, subregions, and river basins for now to manually fix some non-breaking spaces (U+00A0) there that would otherwise be removed by the csv-metadata-quality script&rsquo;s &ldquo;unnecessary Unicode&rdquo; fix:
<a href='https://alanorth.github.io/cgspace-notes/2019-10/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-10/'>Read more →</a>
</article> </article>
@ -168,37 +161,34 @@
</p> </p>
</header> </header>
<h2 id="2019-09-01">2019-09-01</h2> <h2 id="20190901">2019-09-01</h2>
<ul> <ul>
<li>Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning</li> <li>Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning</li>
<li>Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:</li>
<li><p>Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:</p>
<pre><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
440 17.58.101.255
441 157.55.39.101
485 207.46.13.43
728 169.60.128.125
730 207.46.13.108
758 157.55.39.9
808 66.160.140.179
814 207.46.13.212
2472 163.172.71.23
6092 3.94.211.189
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
33 2a01:7e00::f03c:91ff:fe16:fcb
57 3.83.192.124
57 3.87.77.25
57 54.82.1.8
822 2a01:9cc0:47:1:1a:4:0:2
1223 45.5.184.72
1633 172.104.229.92
5112 205.186.128.185
7249 2a01:7e00::f03c:91ff:fe18:7396
9124 45.5.186.2
</code></pre></li>
</ul> </ul>
<pre><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
440 17.58.101.255
441 157.55.39.101
485 207.46.13.43
728 169.60.128.125
730 207.46.13.108
758 157.55.39.9
808 66.160.140.179
814 207.46.13.212
2472 163.172.71.23
6092 3.94.211.189
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
33 2a01:7e00::f03c:91ff:fe16:fcb
57 3.83.192.124
57 3.87.77.25
57 54.82.1.8
822 2a01:9cc0:47:1:1a:4:0:2
1223 45.5.184.72
1633 172.104.229.92
5112 205.186.128.185
7249 2a01:7e00::f03c:91ff:fe18:7396
9124 45.5.186.2
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2019-09/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-09/'>Read more →</a>
</article> </article>
@ -216,22 +206,19 @@
</p> </p>
</header> </header>
<h2 id="2019-08-03">2019-08-03</h2> <h2 id="20190803">2019-08-03</h2>
<ul> <ul>
<li>Look at Bioversity&rsquo;s latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name&hellip;</li> <li>Look at Bioversity's latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name&hellip;</li>
</ul> </ul>
<h2 id="20190804">2019-08-04</h2>
<h2 id="2019-08-04">2019-08-04</h2>
<ul> <ul>
<li>Deploy ORCID identifier updates requested by Bioversity to CGSpace</li> <li>Deploy ORCID identifier updates requested by Bioversity to CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it <li>Run system updates on CGSpace (linode18) and reboot it
<ul> <ul>
<li>Before updating it I checked Solr and verified that all statistics cores were loaded properly&hellip;</li> <li>Before updating it I checked Solr and verified that all statistics cores were loaded properly&hellip;</li>
<li>After rebooting, all statistics cores were loaded&hellip; wow, that&rsquo;s lucky.</li> <li>After rebooting, all statistics cores were loaded&hellip; wow, that's lucky.</li>
</ul></li> </ul>
</li>
<li>Run system updates on DSpace Test (linode19) and reboot it</li> <li>Run system updates on DSpace Test (linode19) and reboot it</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-08/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-08/'>Read more →</a>
@ -251,16 +238,15 @@
</p> </p>
</header> </header>
<h2 id="2019-07-01">2019-07-01</h2> <h2 id="20190701">2019-07-01</h2>
<ul> <ul>
<li>Create an &ldquo;AfricaRice books and book chapters&rdquo; collection on CGSpace for AfricaRice</li> <li>Create an &ldquo;AfricaRice books and book chapters&rdquo; collection on CGSpace for AfricaRice</li>
<li>Last month Sisay asked why the following &ldquo;most popular&rdquo; statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace: <li>Last month Sisay asked why the following &ldquo;most popular&rdquo; statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace:
<ul> <ul>
<li><a href="https://dspacetest.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&amp;time_filter_end_date=01%2F12%2F2018">DSpace Test</a></li> <li><a href="https://dspacetest.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&amp;time_filter_end_date=01%2F12%2F2018">DSpace Test</a></li>
<li><a href="https://cgspace.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&amp;time_filter_end_date=01%2F12%2F2018">CGSpace</a></li> <li><a href="https://cgspace.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&amp;time_filter_end_date=01%2F12%2F2018">CGSpace</a></li>
</ul></li> </ul>
</li>
<li>Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community</li> <li>Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-07/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-07/'>Read more →</a>
@ -280,15 +266,12 @@
</p> </p>
</header> </header>
<h2 id="2019-06-02">2019-06-02</h2> <h2 id="20190602">2019-06-02</h2>
<ul> <ul>
<li>Merge the <a href="https://github.com/ilri/DSpace/pull/425">Solr filterCache</a> and <a href="https://github.com/ilri/DSpace/pull/426">XMLUI ISI journal</a> changes to the <code>5_x-prod</code> branch and deploy on CGSpace</li> <li>Merge the <a href="https://github.com/ilri/DSpace/pull/425">Solr filterCache</a> and <a href="https://github.com/ilri/DSpace/pull/426">XMLUI ISI journal</a> changes to the <code>5_x-prod</code> branch and deploy on CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it</li> <li>Run system updates on CGSpace (linode18) and reboot it</li>
</ul> </ul>
<h2 id="20190603">2019-06-03</h2>
<h2 id="2019-06-03">2019-06-03</h2>
<ul> <ul>
<li>Skype with Marie-Angélique and Abenet about <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2</a></li> <li>Skype with Marie-Angélique and Abenet about <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2</a></li>
</ul> </ul>
@ -309,24 +292,21 @@
</p> </p>
</header> </header>
<h2 id="2019-05-01">2019-05-01</h2> <h2 id="20190501">2019-05-01</h2>
<ul> <ul>
<li>Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace</li> <li>Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace</li>
<li>A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items <li>A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items
<ul> <ul>
<li>Apparently if the item is in the <code>workflowitem</code> table it is submitted to a workflow</li> <li>Apparently if the item is in the <code>workflowitem</code> table it is submitted to a workflow</li>
<li>And if it is in the <code>workspaceitem</code> table it is in the pre-submitted state</li> <li>And if it is in the <code>workspaceitem</code> table it is in the pre-submitted state</li>
</ul></li> </ul>
</li>
<li><p>The item seems to be in a pre-submitted state, so I tried to delete it from there:</p> <li>The item seems to be in a pre-submitted state, so I tried to delete it from there:</li>
</ul>
<pre><code>dspace=# DELETE FROM workspaceitem WHERE item_id=74648; <pre><code>dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
DELETE 1 DELETE 1
</code></pre></li> </code></pre><ul>
<li>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present&hellip;</li>
<li><p>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present&hellip;</p></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-05/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-05/'>Read more →</a>
</article> </article>
@ -345,35 +325,30 @@ DELETE 1
</p> </p>
</header> </header>
<h2 id="2019-04-01">2019-04-01</h2> <h2 id="20190401">2019-04-01</h2>
<ul> <ul>
<li>Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc <li>Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc
<ul> <ul>
<li>They asked if we had plans to enable RDF support in CGSpace</li> <li>They asked if we had plans to enable RDF support in CGSpace</li>
</ul></li> </ul>
</li>
<li><p>There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today</p> <li>There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today
<ul> <ul>
<li><p>I suspected that some might not be successful, because the stats show fewer, but today they were all HTTP 200!</p> <li>I suspected that some might not be successful, because the stats show fewer, but today they were all HTTP 200!</li>
</ul>
</li>
</ul>
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18.196.196.108|18.195.78.144|18.195.218.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5 <pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18.196.196.108|18.195.78.144|18.195.218.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5
4432 200 4432 200
</code></pre></li> </code></pre><ul>
</ul></li> <li>In the last two weeks there have been 47,000 downloads of this <em>same exact PDF</em> by these three IP addresses</li>
<li>Apply country and region corrections and deletions on DSpace Test and CGSpace:</li>
<li><p>In the last two weeks there have been 47,000 downloads of this <em>same exact PDF</em> by these three IP addresses</p></li> </ul>
<li><p>Apply country and region corrections and deletions on DSpace Test and CGSpace:</p>
<pre><code>$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d <pre><code>$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d $ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-04/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-04/'>Read more →</a>
</article> </article>
@ -391,20 +366,19 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
</p> </p>
</header> </header>
<h2 id="2019-03-01">2019-03-01</h2> <h2 id="20190301">2019-03-01</h2>
<ul> <ul>
<li>I checked IITA&rsquo;s 259 Feb 14 records from last month for duplicates using Atmire&rsquo;s Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good</li> <li>I checked IITA's 259 Feb 14 records from last month for duplicates using Atmire's Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good</li>
<li>I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc&hellip;</li> <li>I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc&hellip;</li>
<li>Looking at the other half of Udana&rsquo;s WLE records from 2018-11 <li>Looking at the other half of Udana's WLE records from 2018-11
<ul> <ul>
<li>I finished the ones for Restoring Degraded Landscapes (RDL), but these are for Variability, Risks and Competing Uses (VRC)</li> <li>I finished the ones for Restoring Degraded Landscapes (RDL), but these are for Variability, Risks and Competing Uses (VRC)</li>
<li>I did the usual cleanups for whitespace, added regions where they made sense for certain countries, cleaned up the DOI link formats, added rights information based on the publications page for a few items</li> <li>I did the usual cleanups for whitespace, added regions where they made sense for certain countries, cleaned up the DOI link formats, added rights information based on the publications page for a few items</li>
<li>Most worryingly, there are encoding errors in the abstracts for eleven items, for example:</li> <li>Most worryingly, there are encoding errors in the abstracts for eleven items, for example:</li>
<li>68.15% <20> 9.45 instead of 68.15% ± 9.45</li> <li>68.15% <20> 9.45 instead of 68.15% ± 9.45</li>
<li>2003<EFBFBD>2013 instead of 2003–2013</li> <li>2003<EFBFBD>2013 instead of 2003–2013</li>
</ul></li> </ul>
</li>
<li>I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs</li> <li>I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-03/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-03/'>Read more →</a>

View File

@ -17,31 +17,27 @@
<pubDate>Mon, 04 Nov 2019 12:20:30 +0200</pubDate> <pubDate>Mon, 04 Nov 2019 12:20:30 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-11/</guid> <guid>https://alanorth.github.io/cgspace-notes/2019-11/</guid>
<description>&lt;h2 id=&#34;2019-11-04&#34;&gt;2019-11-04&lt;/h2&gt; <description>&lt;h2 id=&#34;20191104&#34;&gt;2019-11-04&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;&lt;p&gt;Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics&lt;/p&gt; &lt;li&gt;Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;&lt;p&gt;I looked in the nginx logs and see 4.6 million in the access logs, and 1.2 million in the API logs:&lt;/p&gt; &lt;li&gt;I looked in the nginx logs and see 4.6 million in the access logs, and 1.2 million in the API logs:&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;# zcat --force /var/log/nginx/*access.log.*.gz | grep -cE &amp;quot;[0-9]{1,2}/Oct/2019&amp;quot; &lt;pre&gt;&lt;code&gt;# zcat --force /var/log/nginx/*access.log.*.gz | grep -cE &amp;quot;[0-9]{1,2}/Oct/2019&amp;quot;
4671942 4671942
# zcat --force /var/log/nginx/{rest,oai,statistics}.log.*.gz | grep -cE &amp;quot;[0-9]{1,2}/Oct/2019&amp;quot; # zcat --force /var/log/nginx/{rest,oai,statistics}.log.*.gz | grep -cE &amp;quot;[0-9]{1,2}/Oct/2019&amp;quot;
1277694 1277694
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt; &lt;/code&gt;&lt;/pre&gt;&lt;ul&gt;
&lt;/ul&gt;&lt;/li&gt; &lt;li&gt;So 4.6 million from XMLUI and another 1.2 million from API requests&lt;/li&gt;
&lt;li&gt;Let&#39;s see how many of the REST API requests were for bitstreams (because they are counted in Solr stats):&lt;/li&gt;
&lt;li&gt;&lt;p&gt;So 4.6 million from XMLUI and another 1.2 million from API requests&lt;/p&gt;&lt;/li&gt; &lt;/ul&gt;
&lt;li&gt;&lt;p&gt;Let&amp;rsquo;s see how many of the REST API requests were for bitstreams (because they are counted in Solr stats):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# zcat --force /var/log/nginx/rest.log.*.gz | grep -c -E &amp;quot;[0-9]{1,2}/Oct/2019&amp;quot; &lt;pre&gt;&lt;code&gt;# zcat --force /var/log/nginx/rest.log.*.gz | grep -c -E &amp;quot;[0-9]{1,2}/Oct/2019&amp;quot;
1183456 1183456
# zcat --force /var/log/nginx/rest.log.*.gz | grep -E &amp;quot;[0-9]{1,2}/Oct/2019&amp;quot; | grep -c -E &amp;quot;/rest/bitstreams&amp;quot; # zcat --force /var/log/nginx/rest.log.*.gz | grep -E &amp;quot;[0-9]{1,2}/Oct/2019&amp;quot; | grep -c -E &amp;quot;/rest/bitstreams&amp;quot;
106781 106781
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt; &lt;/code&gt;&lt;/pre&gt;</description>
&lt;/ul&gt;</description>
</item> </item>
<item> <item>
@ -51,7 +47,6 @@
<guid>https://alanorth.github.io/cgspace-notes/cgspace-cgcorev2-migration/</guid> <guid>https://alanorth.github.io/cgspace-notes/cgspace-cgcorev2-migration/</guid>
<description>&lt;p&gt;Possible changes to CGSpace metadata fields to align more with DC, QDC, and DCTERMS as well as CG Core v2.&lt;/p&gt; <description>&lt;p&gt;Possible changes to CGSpace metadata fields to align more with DC, QDC, and DCTERMS as well as CG Core v2.&lt;/p&gt;
&lt;p&gt;With reference to &lt;a href=&#34;https://agriculturalsemantics.github.io/cg-core/cgcore.html&#34;&gt;CG Core v2 draft standard&lt;/a&gt; by Marie-Angélique as well as &lt;a href=&#34;http://www.dublincore.org/specifications/dublin-core/dcmi-terms/&#34;&gt;DCMI DCTERMS&lt;/a&gt;.&lt;/p&gt;</description> &lt;p&gt;With reference to &lt;a href=&#34;https://agriculturalsemantics.github.io/cg-core/cgcore.html&#34;&gt;CG Core v2 draft standard&lt;/a&gt; by Marie-Angélique as well as &lt;a href=&#34;http://www.dublincore.org/specifications/dublin-core/dcmi-terms/&#34;&gt;DCMI DCTERMS&lt;/a&gt;.&lt;/p&gt;</description>
</item> </item>
@ -61,8 +56,7 @@
<pubDate>Tue, 01 Oct 2019 13:20:51 +0300</pubDate> <pubDate>Tue, 01 Oct 2019 13:20:51 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-10/</guid> <guid>https://alanorth.github.io/cgspace-notes/2019-10/</guid>
<description>2019-10-01 Udana from IWMI asked me for a CSV export of their community on CGSpace <description>2019-10-01 Udana from IWMI asked me for a CSV export of their community on CGSpace I exported it, but a quick run through the csv-metadata-quality tool shows that there are some low-hanging fruits we can fix before I send him the data I will limit the scope to the titles, regions, subregions, and river basins for now to manually fix some non-breaking spaces (U+00A0) there that would otherwise be removed by the csv-metadata-quality script&#39;s &amp;ldquo;unnecessary Unicode&amp;rdquo; fix: $ csvcut -c &#39;id,dc.</description>
I exported it, but a quick run through the csv-metadata-quality tool shows that there are some low-hanging fruits we can fix before I send him the data I will limit the scope to the titles, regions, subregions, and river basins for now to manually fix some non-breaking spaces (U+00A0) there that would otherwise be removed by the csv-metadata-quality script&amp;rsquo;s &amp;ldquo;unnecessary Unicode&amp;rdquo; fix:</description>
</item> </item>
<item> <item>
@ -71,37 +65,34 @@
<pubDate>Sun, 01 Sep 2019 10:17:51 +0300</pubDate> <pubDate>Sun, 01 Sep 2019 10:17:51 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-09/</guid> <guid>https://alanorth.github.io/cgspace-notes/2019-09/</guid>
<description>&lt;h2 id=&#34;2019-09-01&#34;&gt;2019-09-01&lt;/h2&gt; <description>&lt;h2 id=&#34;20190901&#34;&gt;2019-09-01&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning&lt;/li&gt; &lt;li&gt;Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning&lt;/li&gt;
&lt;li&gt;Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:&lt;/p&gt; &lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;# zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &amp;quot;01/Sep/2019:0&amp;quot; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10 &lt;pre&gt;&lt;code&gt;# zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &amp;quot;01/Sep/2019:0&amp;quot; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
440 17.58.101.255 440 17.58.101.255
441 157.55.39.101 441 157.55.39.101
485 207.46.13.43 485 207.46.13.43
728 169.60.128.125 728 169.60.128.125
730 207.46.13.108 730 207.46.13.108
758 157.55.39.9 758 157.55.39.9
808 66.160.140.179 808 66.160.140.179
814 207.46.13.212 814 207.46.13.212
2472 163.172.71.23 2472 163.172.71.23
6092 3.94.211.189 6092 3.94.211.189
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E &amp;quot;01/Sep/2019:0&amp;quot; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10 # zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E &amp;quot;01/Sep/2019:0&amp;quot; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
33 2a01:7e00::f03c:91ff:fe16:fcb 33 2a01:7e00::f03c:91ff:fe16:fcb
57 3.83.192.124 57 3.83.192.124
57 3.87.77.25 57 3.87.77.25
57 54.82.1.8 57 54.82.1.8
822 2a01:9cc0:47:1:1a:4:0:2 822 2a01:9cc0:47:1:1a:4:0:2
1223 45.5.184.72 1223 45.5.184.72
1633 172.104.229.92 1633 172.104.229.92
5112 205.186.128.185 5112 205.186.128.185
7249 2a01:7e00::f03c:91ff:fe18:7396 7249 2a01:7e00::f03c:91ff:fe18:7396
9124 45.5.186.2 9124 45.5.186.2
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt; &lt;/code&gt;&lt;/pre&gt;</description>
&lt;/ul&gt;</description>
</item> </item>
<item> <item>
@ -110,22 +101,19 @@
<pubDate>Sat, 03 Aug 2019 12:39:51 +0300</pubDate> <pubDate>Sat, 03 Aug 2019 12:39:51 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-08/</guid> <guid>https://alanorth.github.io/cgspace-notes/2019-08/</guid>
<description>&lt;h2 id=&#34;2019-08-03&#34;&gt;2019-08-03&lt;/h2&gt; <description>&lt;h2 id=&#34;20190803&#34;&gt;2019-08-03&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Look at Bioversity&amp;rsquo;s latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name&amp;hellip;&lt;/li&gt; &lt;li&gt;Look at Bioversity&#39;s latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name&amp;hellip;&lt;/li&gt;
&lt;/ul&gt; &lt;/ul&gt;
&lt;h2 id=&#34;20190804&#34;&gt;2019-08-04&lt;/h2&gt;
&lt;h2 id=&#34;2019-08-04&#34;&gt;2019-08-04&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Deploy ORCID identifier updates requested by Bioversity to CGSpace&lt;/li&gt; &lt;li&gt;Deploy ORCID identifier updates requested by Bioversity to CGSpace&lt;/li&gt;
&lt;li&gt;Run system updates on CGSpace (linode18) and reboot it &lt;li&gt;Run system updates on CGSpace (linode18) and reboot it
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Before updating it I checked Solr and verified that all statistics cores were loaded properly&amp;hellip;&lt;/li&gt; &lt;li&gt;Before updating it I checked Solr and verified that all statistics cores were loaded properly&amp;hellip;&lt;/li&gt;
&lt;li&gt;After rebooting, all statistics cores were loaded&amp;hellip; wow, that&amp;rsquo;s lucky.&lt;/li&gt; &lt;li&gt;After rebooting, all statistics cores were loaded&amp;hellip; wow, that&#39;s lucky.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt; &lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Run system updates on DSpace Test (linode19) and reboot it&lt;/li&gt; &lt;li&gt;Run system updates on DSpace Test (linode19) and reboot it&lt;/li&gt;
&lt;/ul&gt;</description> &lt;/ul&gt;</description>
</item> </item>
@ -136,16 +124,15 @@
<pubDate>Mon, 01 Jul 2019 12:13:51 +0300</pubDate> <pubDate>Mon, 01 Jul 2019 12:13:51 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-07/</guid> <guid>https://alanorth.github.io/cgspace-notes/2019-07/</guid>
<description>&lt;h2 id=&#34;2019-07-01&#34;&gt;2019-07-01&lt;/h2&gt; <description>&lt;h2 id=&#34;20190701&#34;&gt;2019-07-01&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Create an &amp;ldquo;AfricaRice books and book chapters&amp;rdquo; collection on CGSpace for AfricaRice&lt;/li&gt; &lt;li&gt;Create an &amp;ldquo;AfricaRice books and book chapters&amp;rdquo; collection on CGSpace for AfricaRice&lt;/li&gt;
&lt;li&gt;Last month Sisay asked why the following &amp;ldquo;most popular&amp;rdquo; statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace: &lt;li&gt;Last month Sisay asked why the following &amp;ldquo;most popular&amp;rdquo; statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace:
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://dspacetest.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&amp;amp;time_filter_end_date=01%2F12%2F2018&#34;&gt;DSpace Test&lt;/a&gt;&lt;/li&gt; &lt;li&gt;&lt;a href=&#34;https://dspacetest.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&amp;amp;time_filter_end_date=01%2F12%2F2018&#34;&gt;DSpace Test&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://cgspace.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&amp;amp;time_filter_end_date=01%2F12%2F2018&#34;&gt;CGSpace&lt;/a&gt;&lt;/li&gt; &lt;li&gt;&lt;a href=&#34;https://cgspace.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&amp;amp;time_filter_end_date=01%2F12%2F2018&#34;&gt;CGSpace&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt; &lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community&lt;/li&gt; &lt;li&gt;Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community&lt;/li&gt;
&lt;/ul&gt;</description> &lt;/ul&gt;</description>
</item> </item>
@ -156,15 +143,12 @@
<pubDate>Sun, 02 Jun 2019 10:57:51 +0300</pubDate> <pubDate>Sun, 02 Jun 2019 10:57:51 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-06/</guid> <guid>https://alanorth.github.io/cgspace-notes/2019-06/</guid>
<description>&lt;h2 id=&#34;2019-06-02&#34;&gt;2019-06-02&lt;/h2&gt; <description>&lt;h2 id=&#34;20190602&#34;&gt;2019-06-02&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Merge the &lt;a href=&#34;https://github.com/ilri/DSpace/pull/425&#34;&gt;Solr filterCache&lt;/a&gt; and &lt;a href=&#34;https://github.com/ilri/DSpace/pull/426&#34;&gt;XMLUI ISI journal&lt;/a&gt; changes to the &lt;code&gt;5_x-prod&lt;/code&gt; branch and deploy on CGSpace&lt;/li&gt; &lt;li&gt;Merge the &lt;a href=&#34;https://github.com/ilri/DSpace/pull/425&#34;&gt;Solr filterCache&lt;/a&gt; and &lt;a href=&#34;https://github.com/ilri/DSpace/pull/426&#34;&gt;XMLUI ISI journal&lt;/a&gt; changes to the &lt;code&gt;5_x-prod&lt;/code&gt; branch and deploy on CGSpace&lt;/li&gt;
&lt;li&gt;Run system updates on CGSpace (linode18) and reboot it&lt;/li&gt; &lt;li&gt;Run system updates on CGSpace (linode18) and reboot it&lt;/li&gt;
&lt;/ul&gt; &lt;/ul&gt;
&lt;h2 id=&#34;20190603&#34;&gt;2019-06-03&lt;/h2&gt;
&lt;h2 id=&#34;2019-06-03&#34;&gt;2019-06-03&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Skype with Marie-Angélique and Abenet about &lt;a href=&#34;https://agriculturalsemantics.github.io/cg-core/cgcore.html&#34;&gt;CG Core v2&lt;/a&gt;&lt;/li&gt; &lt;li&gt;Skype with Marie-Angélique and Abenet about &lt;a href=&#34;https://agriculturalsemantics.github.io/cg-core/cgcore.html&#34;&gt;CG Core v2&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description> &lt;/ul&gt;</description>
@ -176,24 +160,21 @@
<pubDate>Wed, 01 May 2019 07:37:43 +0300</pubDate> <pubDate>Wed, 01 May 2019 07:37:43 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-05/</guid> <guid>https://alanorth.github.io/cgspace-notes/2019-05/</guid>
<description>&lt;h2 id=&#34;2019-05-01&#34;&gt;2019-05-01&lt;/h2&gt; <description>&lt;h2 id=&#34;20190501&#34;&gt;2019-05-01&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace&lt;/li&gt; &lt;li&gt;Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace&lt;/li&gt;
&lt;li&gt;A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items &lt;li&gt;A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Apparently if the item is in the &lt;code&gt;workflowitem&lt;/code&gt; table it is submitted to a workflow&lt;/li&gt; &lt;li&gt;Apparently if the item is in the &lt;code&gt;workflowitem&lt;/code&gt; table it is submitted to a workflow&lt;/li&gt;
&lt;li&gt;And if it is in the &lt;code&gt;workspaceitem&lt;/code&gt; table it is in the pre-submitted state&lt;/li&gt; &lt;li&gt;And if it is in the &lt;code&gt;workspaceitem&lt;/code&gt; table it is in the pre-submitted state&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt; &lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The item seems to be in a pre-submitted state, so I tried to delete it from there:&lt;/p&gt; &lt;li&gt;The item seems to be in a pre-submitted state, so I tried to delete it from there:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;dspace=# DELETE FROM workspaceitem WHERE item_id=74648; &lt;pre&gt;&lt;code&gt;dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
DELETE 1 DELETE 1
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt; &lt;/code&gt;&lt;/pre&gt;&lt;ul&gt;
&lt;li&gt;But after this I tried to delete the item from the XMLUI and it is &lt;em&gt;still&lt;/em&gt; present&amp;hellip;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;But after this I tried to delete the item from the XMLUI and it is &lt;em&gt;still&lt;/em&gt; present&amp;hellip;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;</description> &lt;/ul&gt;</description>
</item> </item>
@ -203,35 +184,30 @@ DELETE 1
<pubDate>Mon, 01 Apr 2019 09:00:43 +0300</pubDate> <pubDate>Mon, 01 Apr 2019 09:00:43 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-04/</guid> <guid>https://alanorth.github.io/cgspace-notes/2019-04/</guid>
<description>&lt;h2 id=&#34;2019-04-01&#34;&gt;2019-04-01&lt;/h2&gt; <description>&lt;h2 id=&#34;20190401&#34;&gt;2019-04-01&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc &lt;li&gt;Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;They asked if we had plans to enable RDF support in CGSpace&lt;/li&gt; &lt;li&gt;They asked if we had plans to enable RDF support in CGSpace&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt; &lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today&lt;/p&gt; &lt;li&gt;There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;&lt;p&gt;I suspected that some might not be successful, because the stats show fewer, but today they were all HTTP 200!&lt;/p&gt; &lt;li&gt;I suspected that some might not be successful, because the stats show fewer, but today they were all HTTP 200!&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep &#39;Spore-192-EN-web.pdf&#39; | grep -E &#39;(18.196.196.108|18.195.78.144|18.195.218.6)&#39; | awk &#39;{print $9}&#39; | sort | uniq -c | sort -n | tail -n 5 &lt;pre&gt;&lt;code&gt;# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep &#39;Spore-192-EN-web.pdf&#39; | grep -E &#39;(18.196.196.108|18.195.78.144|18.195.218.6)&#39; | awk &#39;{print $9}&#39; | sort | uniq -c | sort -n | tail -n 5
4432 200 4432 200
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt; &lt;/code&gt;&lt;/pre&gt;&lt;ul&gt;
&lt;/ul&gt;&lt;/li&gt; &lt;li&gt;In the last two weeks there have been 47,000 downloads of this &lt;em&gt;same exact PDF&lt;/em&gt; by these three IP addresses&lt;/li&gt;
&lt;li&gt;Apply country and region corrections and deletions on DSpace Test and CGSpace:&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In the last two weeks there have been 47,000 downloads of this &lt;em&gt;same exact PDF&lt;/em&gt; by these three IP addresses&lt;/p&gt;&lt;/li&gt; &lt;/ul&gt;
&lt;li&gt;&lt;p&gt;Apply country and region corrections and deletions on DSpace Test and CGSpace:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p &#39;fuuu&#39; -f cg.coverage.country -m 228 -t ACTION -d &lt;pre&gt;&lt;code&gt;$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p &#39;fuuu&#39; -f cg.coverage.country -m 228 -t ACTION -d
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p &#39;fuuu&#39; -f cg.coverage.region -m 231 -t action -d $ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p &#39;fuuu&#39; -f cg.coverage.region -m 231 -t action -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p &#39;fuuu&#39; -m 228 -f cg.coverage.country -d $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p &#39;fuuu&#39; -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p &#39;fuuu&#39; -m 231 -f cg.coverage.region -d $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p &#39;fuuu&#39; -m 231 -f cg.coverage.region -d
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt; &lt;/code&gt;&lt;/pre&gt;</description>
&lt;/ul&gt;</description>
</item> </item>
<item> <item>
@ -240,20 +216,19 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
<pubDate>Fri, 01 Mar 2019 12:16:30 +0100</pubDate> <pubDate>Fri, 01 Mar 2019 12:16:30 +0100</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-03/</guid> <guid>https://alanorth.github.io/cgspace-notes/2019-03/</guid>
<description>&lt;h2 id=&#34;2019-03-01&#34;&gt;2019-03-01&lt;/h2&gt; <description>&lt;h2 id=&#34;20190301&#34;&gt;2019-03-01&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;I checked IITA&amp;rsquo;s 259 Feb 14 records from last month for duplicates using Atmire&amp;rsquo;s Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good&lt;/li&gt; &lt;li&gt;I checked IITA&#39;s 259 Feb 14 records from last month for duplicates using Atmire&#39;s Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good&lt;/li&gt;
&lt;li&gt;I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc&amp;hellip;&lt;/li&gt; &lt;li&gt;I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc&amp;hellip;&lt;/li&gt;
&lt;li&gt;Looking at the other half of Udana&amp;rsquo;s WLE records from 2018-11 &lt;li&gt;Looking at the other half of Udana&#39;s WLE records from 2018-11
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;I finished the ones for Restoring Degraded Landscapes (RDL), but these are for Variability, Risks and Competing Uses (VRC)&lt;/li&gt; &lt;li&gt;I finished the ones for Restoring Degraded Landscapes (RDL), but these are for Variability, Risks and Competing Uses (VRC)&lt;/li&gt;
&lt;li&gt;I did the usual cleanups for whitespace, added regions where they made sense for certain countries, cleaned up the DOI link formats, added rights information based on the publications page for a few items&lt;/li&gt; &lt;li&gt;I did the usual cleanups for whitespace, added regions where they made sense for certain countries, cleaned up the DOI link formats, added rights information based on the publications page for a few items&lt;/li&gt;
&lt;li&gt;Most worryingly, there are encoding errors in the abstracts for eleven items, for example:&lt;/li&gt; &lt;li&gt;Most worryingly, there are encoding errors in the abstracts for eleven items, for example:&lt;/li&gt;
&lt;li&gt;68.15% <20> 9.45 instead of 68.15% ± 9.45&lt;/li&gt; &lt;li&gt;68.15% <20> 9.45 instead of 68.15% ± 9.45&lt;/li&gt;
&lt;li&gt;2003<EFBFBD>2013 instead of 20032013&lt;/li&gt; &lt;li&gt;2003<EFBFBD>2013 instead of 20032013&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt; &lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs&lt;/li&gt; &lt;li&gt;I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs&lt;/li&gt;
&lt;/ul&gt;</description> &lt;/ul&gt;</description>
</item> </item>
@ -264,40 +239,34 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
<pubDate>Fri, 01 Feb 2019 21:37:30 +0200</pubDate> <pubDate>Fri, 01 Feb 2019 21:37:30 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-02/</guid> <guid>https://alanorth.github.io/cgspace-notes/2019-02/</guid>
<description>&lt;h2 id=&#34;2019-02-01&#34;&gt;2019-02-01&lt;/h2&gt; <description>&lt;h2 id=&#34;20190201&#34;&gt;2019-02-01&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!&lt;/li&gt; &lt;li&gt;Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!&lt;/li&gt;
&lt;li&gt;The top IPs before, during, and after this latest alert tonight were:&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The top IPs before, during, and after this latest alert tonight were:&lt;/p&gt; &lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &amp;quot;01/Feb/2019:(17|18|19|20|21)&amp;quot; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10 &lt;pre&gt;&lt;code&gt;# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &amp;quot;01/Feb/2019:(17|18|19|20|21)&amp;quot; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
245 207.46.13.5 245 207.46.13.5
332 54.70.40.11 332 54.70.40.11
385 5.143.231.38 385 5.143.231.38
405 207.46.13.173 405 207.46.13.173
405 207.46.13.75 405 207.46.13.75
1117 66.249.66.219 1117 66.249.66.219
1121 35.237.175.180 1121 35.237.175.180
1546 5.9.6.51 1546 5.9.6.51
2474 45.5.186.2 2474 45.5.186.2
5490 85.25.237.71 5490 85.25.237.71
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt; &lt;/code&gt;&lt;/pre&gt;&lt;ul&gt;
&lt;li&gt;&lt;code&gt;85.25.237.71&lt;/code&gt; is the &amp;ldquo;Linguee Bot&amp;rdquo; that I first saw last month&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;85.25.237.71&lt;/code&gt; is the &amp;ldquo;Linguee Bot&amp;rdquo; that I first saw last month&lt;/p&gt;&lt;/li&gt; &lt;li&gt;The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase&lt;/li&gt;
&lt;li&gt;There were just over 3 million accesses in the nginx logs last month:&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase&lt;/p&gt;&lt;/li&gt; &lt;/ul&gt;
&lt;li&gt;&lt;p&gt;There were just over 3 million accesses in the nginx logs last month:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# time zcat --force /var/log/nginx/* | grep -cE &amp;quot;[0-9]{1,2}/Jan/2019&amp;quot; &lt;pre&gt;&lt;code&gt;# time zcat --force /var/log/nginx/* | grep -cE &amp;quot;[0-9]{1,2}/Jan/2019&amp;quot;
3018243 3018243
real 0m19.873s real 0m19.873s
user 0m22.203s user 0m22.203s
sys 0m1.979s sys 0m1.979s
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt; &lt;/code&gt;&lt;/pre&gt;</description>
&lt;/ul&gt;</description>
</item> </item>
<item> <item>
@ -306,26 +275,23 @@ sys 0m1.979s
<pubDate>Wed, 02 Jan 2019 09:48:30 +0200</pubDate> <pubDate>Wed, 02 Jan 2019 09:48:30 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-01/</guid> <guid>https://alanorth.github.io/cgspace-notes/2019-01/</guid>
<description>&lt;h2 id=&#34;2019-01-02&#34;&gt;2019-01-02&lt;/h2&gt; <description>&lt;h2 id=&#34;20190102&#34;&gt;2019-01-02&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning&lt;/li&gt; &lt;li&gt;Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning&lt;/li&gt;
&lt;li&gt;I don&#39;t see anything interesting in the web server logs around that time though:&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I don&amp;rsquo;t see anything interesting in the web server logs around that time though:&lt;/p&gt; &lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &amp;quot;02/Jan/2019:0(1|2|3)&amp;quot; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10 &lt;pre&gt;&lt;code&gt;# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &amp;quot;02/Jan/2019:0(1|2|3)&amp;quot; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
92 40.77.167.4 92 40.77.167.4
99 210.7.29.100 99 210.7.29.100
120 38.126.157.45 120 38.126.157.45
177 35.237.175.180 177 35.237.175.180
177 40.77.167.32 177 40.77.167.32
216 66.249.75.219 216 66.249.75.219
225 18.203.76.93 225 18.203.76.93
261 46.101.86.248 261 46.101.86.248
357 207.46.13.1 357 207.46.13.1
903 54.70.40.11 903 54.70.40.11
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt; &lt;/code&gt;&lt;/pre&gt;</description>
&lt;/ul&gt;</description>
</item> </item>
<item> <item>
@ -334,16 +300,13 @@ sys 0m1.979s
<pubDate>Sun, 02 Dec 2018 02:09:30 +0200</pubDate> <pubDate>Sun, 02 Dec 2018 02:09:30 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-12/</guid> <guid>https://alanorth.github.io/cgspace-notes/2018-12/</guid>
<description>&lt;h2 id=&#34;2018-12-01&#34;&gt;2018-12-01&lt;/h2&gt; <description>&lt;h2 id=&#34;20181201&#34;&gt;2018-12-01&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK&lt;/li&gt; &lt;li&gt;Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK&lt;/li&gt;
&lt;li&gt;I manually installed OpenJDK, then removed Oracle JDK, then re-ran the &lt;a href=&#34;http://github.com/ilri/rmg-ansible-public&#34;&gt;Ansible playbook&lt;/a&gt; to update all configuration files, etc&lt;/li&gt; &lt;li&gt;I manually installed OpenJDK, then removed Oracle JDK, then re-ran the &lt;a href=&#34;http://github.com/ilri/rmg-ansible-public&#34;&gt;Ansible playbook&lt;/a&gt; to update all configuration files, etc&lt;/li&gt;
&lt;li&gt;Then I ran all system updates and restarted the server&lt;/li&gt; &lt;li&gt;Then I ran all system updates and restarted the server&lt;/li&gt;
&lt;/ul&gt; &lt;/ul&gt;
&lt;h2 id=&#34;20181202&#34;&gt;2018-12-02&lt;/h2&gt;
&lt;h2 id=&#34;2018-12-02&#34;&gt;2018-12-02&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another &lt;a href=&#34;https://usn.ubuntu.com/3831-1/&#34;&gt;Ghostscript vulnerability last week&lt;/a&gt;&lt;/li&gt; &lt;li&gt;I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another &lt;a href=&#34;https://usn.ubuntu.com/3831-1/&#34;&gt;Ghostscript vulnerability last week&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description> &lt;/ul&gt;</description>
@ -355,15 +318,12 @@ sys 0m1.979s
<pubDate>Thu, 01 Nov 2018 16:41:30 +0200</pubDate> <pubDate>Thu, 01 Nov 2018 16:41:30 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-11/</guid> <guid>https://alanorth.github.io/cgspace-notes/2018-11/</guid>
<description>&lt;h2 id=&#34;2018-11-01&#34;&gt;2018-11-01&lt;/h2&gt; <description>&lt;h2 id=&#34;20181101&#34;&gt;2018-11-01&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Finalize AReS Phase I and Phase II ToRs&lt;/li&gt; &lt;li&gt;Finalize AReS Phase I and Phase II ToRs&lt;/li&gt;
&lt;li&gt;Send a note about my &lt;a href=&#34;https://github.com/ilri/dspace-statistics-api&#34;&gt;dspace-statistics-api&lt;/a&gt; to the dspace-tech mailing list&lt;/li&gt; &lt;li&gt;Send a note about my &lt;a href=&#34;https://github.com/ilri/dspace-statistics-api&#34;&gt;dspace-statistics-api&lt;/a&gt; to the dspace-tech mailing list&lt;/li&gt;
&lt;/ul&gt; &lt;/ul&gt;
&lt;h2 id=&#34;20181103&#34;&gt;2018-11-03&lt;/h2&gt;
&lt;h2 id=&#34;2018-11-03&#34;&gt;2018-11-03&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage&lt;/li&gt; &lt;li&gt;Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage&lt;/li&gt;
&lt;li&gt;Today these are the top 10 IPs:&lt;/li&gt; &lt;li&gt;Today these are the top 10 IPs:&lt;/li&gt;
@ -376,11 +336,10 @@ sys 0m1.979s
<pubDate>Mon, 01 Oct 2018 22:31:54 +0300</pubDate> <pubDate>Mon, 01 Oct 2018 22:31:54 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-10/</guid> <guid>https://alanorth.github.io/cgspace-notes/2018-10/</guid>
<description>&lt;h2 id=&#34;2018-10-01&#34;&gt;2018-10-01&lt;/h2&gt; <description>&lt;h2 id=&#34;20181001&#34;&gt;2018-10-01&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items&lt;/li&gt; &lt;li&gt;Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items&lt;/li&gt;
&lt;li&gt;I created a GitHub issue to track this &lt;a href=&#34;https://github.com/ilri/DSpace/issues/389&#34;&gt;#389&lt;/a&gt;, because I&amp;rsquo;m super busy in Nairobi right now&lt;/li&gt; &lt;li&gt;I created a GitHub issue to track this &lt;a href=&#34;https://github.com/ilri/DSpace/issues/389&#34;&gt;#389&lt;/a&gt;, because I&#39;m super busy in Nairobi right now&lt;/li&gt;
&lt;/ul&gt;</description> &lt;/ul&gt;</description>
</item> </item>
@ -390,13 +349,12 @@ sys 0m1.979s
<pubDate>Sun, 02 Sep 2018 09:55:54 +0300</pubDate> <pubDate>Sun, 02 Sep 2018 09:55:54 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-09/</guid> <guid>https://alanorth.github.io/cgspace-notes/2018-09/</guid>
<description>&lt;h2 id=&#34;2018-09-02&#34;&gt;2018-09-02&lt;/h2&gt; <description>&lt;h2 id=&#34;20180902&#34;&gt;2018-09-02&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;New &lt;a href=&#34;https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.5&#34;&gt;PostgreSQL JDBC driver version 42.2.5&lt;/a&gt;&lt;/li&gt; &lt;li&gt;New &lt;a href=&#34;https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.5&#34;&gt;PostgreSQL JDBC driver version 42.2.5&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;I&amp;rsquo;ll update the DSpace role in our &lt;a href=&#34;https://github.com/ilri/rmg-ansible-public&#34;&gt;Ansible infrastructure playbooks&lt;/a&gt; and run the updated playbooks on CGSpace and DSpace Test&lt;/li&gt; &lt;li&gt;I&#39;ll update the DSpace role in our &lt;a href=&#34;https://github.com/ilri/rmg-ansible-public&#34;&gt;Ansible infrastructure playbooks&lt;/a&gt; and run the updated playbooks on CGSpace and DSpace Test&lt;/li&gt;
&lt;li&gt;Also, I&amp;rsquo;ll re-run the &lt;code&gt;postgresql&lt;/code&gt; tasks because the custom PostgreSQL variables are dynamic according to the system&amp;rsquo;s RAM, and we never re-ran them after migrating to larger Linodes last month&lt;/li&gt; &lt;li&gt;Also, I&#39;ll re-run the &lt;code&gt;postgresql&lt;/code&gt; tasks because the custom PostgreSQL variables are dynamic according to the system&#39;s RAM, and we never re-ran them after migrating to larger Linodes last month&lt;/li&gt;
&lt;li&gt;I&amp;rsquo;m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I&amp;rsquo;m getting those autowire errors in Tomcat 8.5.30 again:&lt;/li&gt; &lt;li&gt;I&#39;m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I&#39;m getting those autowire errors in Tomcat 8.5.30 again:&lt;/li&gt;
&lt;/ul&gt;</description> &lt;/ul&gt;</description>
</item> </item>
@ -406,27 +364,20 @@ sys 0m1.979s
<pubDate>Wed, 01 Aug 2018 11:52:54 +0300</pubDate> <pubDate>Wed, 01 Aug 2018 11:52:54 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-08/</guid> <guid>https://alanorth.github.io/cgspace-notes/2018-08/</guid>
<description>&lt;h2 id=&#34;2018-08-01&#34;&gt;2018-08-01&lt;/h2&gt; <description>&lt;h2 id=&#34;20180801&#34;&gt;2018-08-01&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;&lt;p&gt;DSpace Test had crashed at some point yesterday morning and I see the following in &lt;code&gt;dmesg&lt;/code&gt;:&lt;/p&gt; &lt;li&gt;DSpace Test had crashed at some point yesterday morning and I see the following in &lt;code&gt;dmesg&lt;/code&gt;:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child &lt;pre&gt;&lt;code&gt;[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child
[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB [Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB
[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt; &lt;/code&gt;&lt;/pre&gt;&lt;ul&gt;
&lt;li&gt;Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight&lt;/p&gt;&lt;/li&gt; &lt;li&gt;From the DSpace log I see that eventually Solr stopped responding, so I guess the &lt;code&gt;java&lt;/code&gt; process that was OOM killed above was Tomcat&#39;s&lt;/li&gt;
&lt;li&gt;I&#39;m not sure why Tomcat didn&#39;t crash with an OutOfMemoryError&amp;hellip;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;From the DSpace log I see that eventually Solr stopped responding, so I guess the &lt;code&gt;java&lt;/code&gt; process that was OOM killed above was Tomcat&amp;rsquo;s&lt;/p&gt;&lt;/li&gt; &lt;li&gt;Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core&lt;/li&gt;
&lt;li&gt;The server only has 8GB of RAM so we&#39;ll eventually need to upgrade to a larger one because we&#39;ll start starving the OS, PostgreSQL, and command line batch processes&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I&amp;rsquo;m not sure why Tomcat didn&amp;rsquo;t crash with an OutOfMemoryError&amp;hellip;&lt;/p&gt;&lt;/li&gt; &lt;li&gt;I ran all system updates on DSpace Test and rebooted it&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The server only has 8GB of RAM so we&amp;rsquo;ll eventually need to upgrade to a larger one because we&amp;rsquo;ll start starving the OS, PostgreSQL, and command line batch processes&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I ran all system updates on DSpace Test and rebooted it&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;</description> &lt;/ul&gt;</description>
</item> </item>
@ -436,19 +387,16 @@ sys 0m1.979s
<pubDate>Sun, 01 Jul 2018 12:56:54 +0300</pubDate> <pubDate>Sun, 01 Jul 2018 12:56:54 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-07/</guid> <guid>https://alanorth.github.io/cgspace-notes/2018-07/</guid>
<description>&lt;h2 id=&#34;2018-07-01&#34;&gt;2018-07-01&lt;/h2&gt; <description>&lt;h2 id=&#34;20180701&#34;&gt;2018-07-01&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;&lt;p&gt;I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:&lt;/p&gt; &lt;li&gt;I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace &lt;pre&gt;&lt;code&gt;$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt; &lt;/code&gt;&lt;/pre&gt;&lt;ul&gt;
&lt;li&gt;During the &lt;code&gt;mvn package&lt;/code&gt; stage on the 5.8 branch I kept getting issues with java running out of memory:&lt;/li&gt;
&lt;li&gt;&lt;p&gt;During the &lt;code&gt;mvn package&lt;/code&gt; stage on the 5.8 branch I kept getting issues with java running out of memory:&lt;/p&gt; &lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;There is insufficient memory for the Java Runtime Environment to continue. &lt;pre&gt;&lt;code&gt;There is insufficient memory for the Java Runtime Environment to continue.
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt; &lt;/code&gt;&lt;/pre&gt;</description>
&lt;/ul&gt;</description>
</item> </item>
<item> <item>
@ -457,32 +405,27 @@ sys 0m1.979s
<pubDate>Mon, 04 Jun 2018 19:49:54 -0700</pubDate> <pubDate>Mon, 04 Jun 2018 19:49:54 -0700</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-06/</guid> <guid>https://alanorth.github.io/cgspace-notes/2018-06/</guid>
<description>&lt;h2 id=&#34;2018-06-04&#34;&gt;2018-06-04&lt;/h2&gt; <description>&lt;h2 id=&#34;20180604&#34;&gt;2018-06-04&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Test the &lt;a href=&#34;https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560&#34;&gt;DSpace 5.8 module upgrades from Atmire&lt;/a&gt; (&lt;a href=&#34;https://github.com/ilri/DSpace/pull/378&#34;&gt;#378&lt;/a&gt;) &lt;li&gt;Test the &lt;a href=&#34;https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560&#34;&gt;DSpace 5.8 module upgrades from Atmire&lt;/a&gt; (&lt;a href=&#34;https://github.com/ilri/DSpace/pull/378&#34;&gt;#378&lt;/a&gt;)
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;There seems to be a problem with the CUA and L&amp;amp;R versions in &lt;code&gt;pom.xml&lt;/code&gt; because they are using SNAPSHOT and it doesn&amp;rsquo;t build&lt;/li&gt; &lt;li&gt;There seems to be a problem with the CUA and L&amp;amp;R versions in &lt;code&gt;pom.xml&lt;/code&gt; because they are using SNAPSHOT and it doesn&#39;t build&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt; &lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;I added the new CCAFS Phase II Project Tag &lt;code&gt;PII-FP1_PACCA2&lt;/code&gt; and merged it into the &lt;code&gt;5_x-prod&lt;/code&gt; branch (&lt;a href=&#34;https://github.com/ilri/DSpace/pull/379&#34;&gt;#379&lt;/a&gt;)&lt;/li&gt; &lt;li&gt;I added the new CCAFS Phase II Project Tag &lt;code&gt;PII-FP1_PACCA2&lt;/code&gt; and merged it into the &lt;code&gt;5_x-prod&lt;/code&gt; branch (&lt;a href=&#34;https://github.com/ilri/DSpace/pull/379&#34;&gt;#379&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;I proofed and tested the ILRI author corrections that Peter sent back to me this week:&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I proofed and tested the ILRI author corrections that Peter sent back to me this week:&lt;/p&gt; &lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p &#39;fuuu&#39; -f dc.contributor.author -t correct -m 3 -n &lt;pre&gt;&lt;code&gt;$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p &#39;fuuu&#39; -f dc.contributor.author -t correct -m 3 -n
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt; &lt;/code&gt;&lt;/pre&gt;&lt;ul&gt;
&lt;li&gt;I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in &lt;a href=&#34;https://alanorth.github.io/cgspace-notes/cgspace-notes/2018-03/&#34;&gt;March, 2018&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in &lt;a href=&#34;https://alanorth.github.io/cgspace-notes/cgspace-notes/2018-03/&#34;&gt;March, 2018&lt;/a&gt;&lt;/p&gt;&lt;/li&gt; &lt;li&gt;Time to index ~70,000 items on CGSpace:&lt;/li&gt;
&lt;/ul&gt;
&lt;li&gt;&lt;p&gt;Time to index ~70,000 items on CGSpace:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b &lt;pre&gt;&lt;code&gt;$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
real 74m42.646s real 74m42.646s
user 8m5.056s user 8m5.056s
sys 2m7.289s sys 2m7.289s
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt; &lt;/code&gt;&lt;/pre&gt;</description>
&lt;/ul&gt;</description>
</item> </item>
<item> <item>
@ -491,15 +434,14 @@ sys 2m7.289s
<pubDate>Tue, 01 May 2018 16:43:54 +0300</pubDate> <pubDate>Tue, 01 May 2018 16:43:54 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-05/</guid> <guid>https://alanorth.github.io/cgspace-notes/2018-05/</guid>
<description>&lt;h2 id=&#34;2018-05-01&#34;&gt;2018-05-01&lt;/h2&gt; <description>&lt;h2 id=&#34;20180501&#34;&gt;2018-05-01&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface: &lt;li&gt;I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface:
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E&#34;&gt;http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E&lt;/a&gt;&lt;/li&gt; &lt;li&gt;http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E&#34;&gt;http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E&lt;/a&gt;&lt;/li&gt; &lt;li&gt;http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt; &lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Then I reduced the JVM heap size from 6144 back to 5120m&lt;/li&gt; &lt;li&gt;Then I reduced the JVM heap size from 6144 back to 5120m&lt;/li&gt;
&lt;li&gt;Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the &lt;a href=&#34;https://github.com/ilri/rmg-ansible-public&#34;&gt;Ansible infrastructure scripts&lt;/a&gt; to support hosts choosing which distribution they want to use&lt;/li&gt; &lt;li&gt;Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the &lt;a href=&#34;https://github.com/ilri/rmg-ansible-public&#34;&gt;Ansible infrastructure scripts&lt;/a&gt; to support hosts choosing which distribution they want to use&lt;/li&gt;
&lt;/ul&gt;</description> &lt;/ul&gt;</description>
@ -511,10 +453,9 @@ sys 2m7.289s
<pubDate>Sun, 01 Apr 2018 16:13:54 +0200</pubDate> <pubDate>Sun, 01 Apr 2018 16:13:54 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-04/</guid> <guid>https://alanorth.github.io/cgspace-notes/2018-04/</guid>
<description>&lt;h2 id=&#34;2018-04-01&#34;&gt;2018-04-01&lt;/h2&gt; <description>&lt;h2 id=&#34;20180401&#34;&gt;2018-04-01&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;I tried to test something on DSpace Test but noticed that it&amp;rsquo;s down since god knows when&lt;/li&gt; &lt;li&gt;I tried to test something on DSpace Test but noticed that it&#39;s down since god knows when&lt;/li&gt;
&lt;li&gt;Catalina logs at least show some memory errors yesterday:&lt;/li&gt; &lt;li&gt;Catalina logs at least show some memory errors yesterday:&lt;/li&gt;
&lt;/ul&gt;</description> &lt;/ul&gt;</description>
</item> </item>
@ -525,8 +466,7 @@ sys 2m7.289s
<pubDate>Fri, 02 Mar 2018 16:07:54 +0200</pubDate> <pubDate>Fri, 02 Mar 2018 16:07:54 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-03/</guid> <guid>https://alanorth.github.io/cgspace-notes/2018-03/</guid>
<description>&lt;h2 id=&#34;2018-03-02&#34;&gt;2018-03-02&lt;/h2&gt; <description>&lt;h2 id=&#34;20180302&#34;&gt;2018-03-02&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Export a CSV of the IITA community metadata for Martin Mueller&lt;/li&gt; &lt;li&gt;Export a CSV of the IITA community metadata for Martin Mueller&lt;/li&gt;
&lt;/ul&gt;</description> &lt;/ul&gt;</description>
@ -538,13 +478,12 @@ sys 2m7.289s
<pubDate>Thu, 01 Feb 2018 16:28:54 +0200</pubDate> <pubDate>Thu, 01 Feb 2018 16:28:54 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-02/</guid> <guid>https://alanorth.github.io/cgspace-notes/2018-02/</guid>
<description>&lt;h2 id=&#34;2018-02-01&#34;&gt;2018-02-01&lt;/h2&gt; <description>&lt;h2 id=&#34;20180201&#34;&gt;2018-02-01&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Peter gave feedback on the &lt;code&gt;dc.rights&lt;/code&gt; proof of concept that I had sent him last week&lt;/li&gt; &lt;li&gt;Peter gave feedback on the &lt;code&gt;dc.rights&lt;/code&gt; proof of concept that I had sent him last week&lt;/li&gt;
&lt;li&gt;We don&amp;rsquo;t need to distinguish between internal and external works, so that makes it just a simple list&lt;/li&gt; &lt;li&gt;We don&#39;t need to distinguish between internal and external works, so that makes it just a simple list&lt;/li&gt;
&lt;li&gt;Yesterday I figured out how to monitor DSpace sessions using JMX&lt;/li&gt; &lt;li&gt;Yesterday I figured out how to monitor DSpace sessions using JMX&lt;/li&gt;
&lt;li&gt;I copied the logic in the &lt;code&gt;jmx_tomcat_dbpools&lt;/code&gt; provided by Ubuntu&amp;rsquo;s &lt;code&gt;munin-plugins-java&lt;/code&gt; package and used the stuff I discovered about JMX &lt;a href=&#34;https://alanorth.github.io/cgspace-notes/cgspace-notes/2018-01/&#34;&gt;in 2018-01&lt;/a&gt;&lt;/li&gt; &lt;li&gt;I copied the logic in the &lt;code&gt;jmx_tomcat_dbpools&lt;/code&gt; provided by Ubuntu&#39;s &lt;code&gt;munin-plugins-java&lt;/code&gt; package and used the stuff I discovered about JMX &lt;a href=&#34;https://alanorth.github.io/cgspace-notes/cgspace-notes/2018-01/&#34;&gt;in 2018-01&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description> &lt;/ul&gt;</description>
</item> </item>
@ -554,33 +493,26 @@ sys 2m7.289s
<pubDate>Tue, 02 Jan 2018 08:35:54 -0800</pubDate> <pubDate>Tue, 02 Jan 2018 08:35:54 -0800</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-01/</guid> <guid>https://alanorth.github.io/cgspace-notes/2018-01/</guid>
<description>&lt;h2 id=&#34;2018-01-02&#34;&gt;2018-01-02&lt;/h2&gt; <description>&lt;h2 id=&#34;20180102&#34;&gt;2018-01-02&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time&lt;/li&gt; &lt;li&gt;Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time&lt;/li&gt;
&lt;li&gt;I didn&amp;rsquo;t get any load alerts from Linode and the REST and XMLUI logs don&amp;rsquo;t show anything out of the ordinary&lt;/li&gt; &lt;li&gt;I didn&#39;t get any load alerts from Linode and the REST and XMLUI logs don&#39;t show anything out of the ordinary&lt;/li&gt;
&lt;li&gt;The nginx logs show HTTP 200s until &lt;code&gt;02/Jan/2018:11:27:17 +0000&lt;/code&gt; when Uptime Robot got an HTTP 500&lt;/li&gt; &lt;li&gt;The nginx logs show HTTP 200s until &lt;code&gt;02/Jan/2018:11:27:17 +0000&lt;/code&gt; when Uptime Robot got an HTTP 500&lt;/li&gt;
&lt;li&gt;In dspace.log around that time I see many errors like &amp;ldquo;Client closed the connection before file download was complete&amp;rdquo;&lt;/li&gt; &lt;li&gt;In dspace.log around that time I see many errors like &amp;ldquo;Client closed the connection before file download was complete&amp;rdquo;&lt;/li&gt;
&lt;li&gt;And just before that I see this:&lt;/li&gt;
&lt;li&gt;&lt;p&gt;And just before that I see this:&lt;/p&gt; &lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000]. &lt;pre&gt;&lt;code&gt;Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000].
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt; &lt;/code&gt;&lt;/pre&gt;&lt;ul&gt;
&lt;li&gt;Ah hah! So the pool was actually empty!&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ah hah! So the pool was actually empty!&lt;/p&gt;&lt;/li&gt; &lt;li&gt;I need to increase that, let&#39;s try to bump it up from 50 to 75&lt;/li&gt;
&lt;li&gt;After that one client got an HTTP 499 but then the rest were HTTP 200, so I don&#39;t know what the hell Uptime Robot saw&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I need to increase that, let&amp;rsquo;s try to bump it up from 50 to 75&lt;/p&gt;&lt;/li&gt; &lt;li&gt;I notice this error quite a few times in dspace.log:&lt;/li&gt;
&lt;/ul&gt;
&lt;li&gt;&lt;p&gt;After that one client got an HTTP 499 but then the rest were HTTP 200, so I don&amp;rsquo;t know what the hell Uptime Robot saw&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I notice this error quite a few times in dspace.log:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets &lt;pre&gt;&lt;code&gt;2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse &#39;dateIssued_keyword:[1976+TO+1979]&#39;: Encountered &amp;quot; &amp;quot;]&amp;quot; &amp;quot;] &amp;quot;&amp;quot; at line 1, column 32. org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse &#39;dateIssued_keyword:[1976+TO+1979]&#39;: Encountered &amp;quot; &amp;quot;]&amp;quot; &amp;quot;] &amp;quot;&amp;quot; at line 1, column 32.
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt; &lt;/code&gt;&lt;/pre&gt;&lt;ul&gt;
&lt;li&gt;And there are many of these errors every day for the past month:&lt;/li&gt;
&lt;li&gt;&lt;p&gt;And there are many of these errors every day for the past month:&lt;/p&gt; &lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ grep -c &amp;quot;Error while searching for sidebar facets&amp;quot; dspace.log.* &lt;pre&gt;&lt;code&gt;$ grep -c &amp;quot;Error while searching for sidebar facets&amp;quot; dspace.log.*
dspace.log.2017-11-21:4 dspace.log.2017-11-21:4
dspace.log.2017-11-22:1 dspace.log.2017-11-22:1
@ -625,9 +557,8 @@ dspace.log.2017-12-30:89
dspace.log.2017-12-31:53 dspace.log.2017-12-31:53
dspace.log.2018-01-01:45 dspace.log.2018-01-01:45
dspace.log.2018-01-02:34 dspace.log.2018-01-02:34
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt; &lt;/code&gt;&lt;/pre&gt;&lt;ul&gt;
&lt;li&gt;Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let&#39;s Encrypt if it&#39;s just a handful of domains&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let&amp;rsquo;s Encrypt if it&amp;rsquo;s just a handful of domains&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;</description> &lt;/ul&gt;</description>
</item> </item>
@ -637,8 +568,7 @@ dspace.log.2018-01-02:34
<pubDate>Fri, 01 Dec 2017 13:53:54 +0300</pubDate> <pubDate>Fri, 01 Dec 2017 13:53:54 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-12/</guid> <guid>https://alanorth.github.io/cgspace-notes/2017-12/</guid>
<description>&lt;h2 id=&#34;2017-12-01&#34;&gt;2017-12-01&lt;/h2&gt; <description>&lt;h2 id=&#34;20171201&#34;&gt;2017-12-01&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Uptime Robot noticed that CGSpace went down&lt;/li&gt; &lt;li&gt;Uptime Robot noticed that CGSpace went down&lt;/li&gt;
&lt;li&gt;The logs say &amp;ldquo;Timeout waiting for idle object&amp;rdquo;&lt;/li&gt; &lt;li&gt;The logs say &amp;ldquo;Timeout waiting for idle object&amp;rdquo;&lt;/li&gt;
@ -653,27 +583,22 @@ dspace.log.2018-01-02:34
<pubDate>Thu, 02 Nov 2017 09:37:54 +0200</pubDate> <pubDate>Thu, 02 Nov 2017 09:37:54 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-11/</guid> <guid>https://alanorth.github.io/cgspace-notes/2017-11/</guid>
<description>&lt;h2 id=&#34;2017-11-01&#34;&gt;2017-11-01&lt;/h2&gt; <description>&lt;h2 id=&#34;20171101&#34;&gt;2017-11-01&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;The CORE developers responded to say they are looking into their bot not respecting our robots.txt&lt;/li&gt; &lt;li&gt;The CORE developers responded to say they are looking into their bot not respecting our robots.txt&lt;/li&gt;
&lt;/ul&gt; &lt;/ul&gt;
&lt;h2 id=&#34;20171102&#34;&gt;2017-11-02&lt;/h2&gt;
&lt;h2 id=&#34;2017-11-02&#34;&gt;2017-11-02&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;&lt;p&gt;Today there have been no hits by CORE and no alerts from Linode (coincidence?)&lt;/p&gt; &lt;li&gt;Today there have been no hits by CORE and no alerts from Linode (coincidence?)&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;# grep -c &amp;quot;CORE&amp;quot; /var/log/nginx/access.log &lt;pre&gt;&lt;code&gt;# grep -c &amp;quot;CORE&amp;quot; /var/log/nginx/access.log
0 0
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt; &lt;/code&gt;&lt;/pre&gt;&lt;ul&gt;
&lt;li&gt;Generate list of authors on CGSpace for Peter to go through and correct:&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Generate list of authors on CGSpace for Peter to go through and correct:&lt;/p&gt; &lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = &#39;contributor&#39; and qualifier = &#39;author&#39;) AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv; &lt;pre&gt;&lt;code&gt;dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = &#39;contributor&#39; and qualifier = &#39;author&#39;) AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv;
COPY 54701 COPY 54701
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt; &lt;/code&gt;&lt;/pre&gt;</description>
&lt;/ul&gt;</description>
</item> </item>
<item> <item>
@ -682,17 +607,14 @@ COPY 54701
<pubDate>Sun, 01 Oct 2017 08:07:54 +0300</pubDate> <pubDate>Sun, 01 Oct 2017 08:07:54 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-10/</guid> <guid>https://alanorth.github.io/cgspace-notes/2017-10/</guid>
<description>&lt;h2 id=&#34;2017-10-01&#34;&gt;2017-10-01&lt;/h2&gt; <description>&lt;h2 id=&#34;20171001&#34;&gt;2017-10-01&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;&lt;p&gt;Peter emailed to point out that many items in the &lt;a href=&#34;https://cgspace.cgiar.org/handle/10568/2703&#34;&gt;ILRI archive collection&lt;/a&gt; have multiple handles:&lt;/p&gt; &lt;li&gt;Peter emailed to point out that many items in the &lt;a href=&#34;https://cgspace.cgiar.org/handle/10568/2703&#34;&gt;ILRI archive collection&lt;/a&gt; have multiple handles:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336 &lt;pre&gt;&lt;code&gt;http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt; &lt;/code&gt;&lt;/pre&gt;&lt;ul&gt;
&lt;li&gt;There appears to be a pattern but I&#39;ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine&lt;/li&gt;
&lt;li&gt;&lt;p&gt;There appears to be a pattern but I&amp;rsquo;ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine&lt;/p&gt;&lt;/li&gt; &lt;li&gt;Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;</description> &lt;/ul&gt;</description>
</item> </item>
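
The `PoolExhaustedException` line quoted in the 2018-01-02 entry carries the pool state in its trailing `[size:…; busy:…; idle:…; lastwait:…]` block. A minimal sketch of pulling those numbers out of such a log line follows. The function names `parse_pool_state` and `is_exhausted` are hypothetical helpers for illustration and are not part of DSpace or Tomcat.

```python
import re

# Matches the trailing state block of a Tomcat JDBC pool exhaustion message,
# e.g. "[size:50; busy:50; idle:0; lastwait:5000]".
POOL_STATE = re.compile(r"\[size:(\d+); busy:(\d+); idle:(\d+); lastwait:(\d+)\]")


def parse_pool_state(line):
    """Return the pool counters found in a log line, or None if absent."""
    match = POOL_STATE.search(line)
    if match is None:
        return None
    size, busy, idle, lastwait = (int(group) for group in match.groups())
    return {"size": size, "busy": busy, "idle": idle, "lastwait": lastwait}


def is_exhausted(state):
    """True when every connection is busy and none is idle."""
    return state["idle"] == 0 and state["busy"] >= state["size"]


sample = (
    "Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: "
    "[http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch "
    "a connection in 5 seconds, none available"
    "[size:50; busy:50; idle:0; lastwait:5000]."
)
state = parse_pool_state(sample)
print(state, is_exhausted(state))
```

Run against a whole `dspace.log`, something like this could show how often the pool sat at its limit before the size is raised from 50 to 75.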

@ -9,13 +9,12 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" /> <meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Notes"/> <meta name="twitter:title" content="Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/> <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -85,40 +84,34 @@
</p> </p>
</header> </header>
<h2 id="2019-02-01">2019-02-01</h2> <h2 id="20190201">2019-02-01</h2>
<ul> <ul>
<li>Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!</li> <li>Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!</li>
<li>The top IPs before, during, and after this latest alert tonight were:</li>
<li><p>The top IPs before, during, and after this latest alert tonight were:</p> </ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;01/Feb/2019:(17|18|19|20|21)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 <pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;01/Feb/2019:(17|18|19|20|21)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
245 207.46.13.5 245 207.46.13.5
332 54.70.40.11 332 54.70.40.11
385 5.143.231.38 385 5.143.231.38
405 207.46.13.173 405 207.46.13.173
405 207.46.13.75 405 207.46.13.75
1117 66.249.66.219 1117 66.249.66.219
1121 35.237.175.180 1121 35.237.175.180
1546 5.9.6.51 1546 5.9.6.51
2474 45.5.186.2 2474 45.5.186.2
5490 85.25.237.71 5490 85.25.237.71
</code></pre></li> </code></pre><ul>
<li><code>85.25.237.71</code> is the &ldquo;Linguee Bot&rdquo; that I first saw last month</li>
<li><p><code>85.25.237.71</code> is the &ldquo;Linguee Bot&rdquo; that I first saw last month</p></li> <li>The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase</li>
<li>There were just over 3 million accesses in the nginx logs last month:</li>
<li><p>The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase</p></li> </ul>
<li><p>There were just over 3 million accesses in the nginx logs last month:</p>
<pre><code># time zcat --force /var/log/nginx/* | grep -cE &quot;[0-9]{1,2}/Jan/2019&quot; <pre><code># time zcat --force /var/log/nginx/* | grep -cE &quot;[0-9]{1,2}/Jan/2019&quot;
3018243 3018243
real 0m19.873s real 0m19.873s
user 0m22.203s user 0m22.203s
sys 0m1.979s sys 0m1.979s
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-02/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-02/'>Read more →</a>
</article> </article>
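
The `zcat | grep -E | awk '{print $1}' | sort | uniq -c | sort -n | tail` pipeline in the 2019-02-01 entry counts requests per client IP inside a time window. A rough Python equivalent is sketched here, assuming the log lines are already decompressed and that the client address is the first whitespace-separated field, as in nginx's default combined format. `top_clients` is a hypothetical name, and the sample lines are made up for illustration.

```python
import re
from collections import Counter


def top_clients(lines, time_pattern, n=10):
    """Count the first field of each line that matches time_pattern.

    Returns the n busiest clients, busiest first (the shell pipeline
    prints them in ascending order instead).
    """
    wanted = re.compile(time_pattern)
    counts = Counter(
        line.split(None, 1)[0]
        for line in lines
        if line.strip() and wanted.search(line)
    )
    return counts.most_common(n)


# Illustrative lines only, not real log data.
sample_lines = [
    '85.25.237.71 - - [01/Feb/2019:17:00:01 +0000] "GET / HTTP/1.1" 200 512',
    '85.25.237.71 - - [01/Feb/2019:18:10:44 +0000] "GET / HTTP/1.1" 200 512',
    '45.5.186.2 - - [01/Feb/2019:19:22:13 +0000] "GET / HTTP/1.1" 200 512',
    '5.9.6.51 - - [01/Feb/2019:09:00:00 +0000] "GET / HTTP/1.1" 200 512',
]
print(top_clients(sample_lines, r"01/Feb/2019:(17|18|19|20|21)", n=2))
```

The last sample line falls outside the 17:00 to 21:59 window, so it is not counted.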
@ -136,26 +129,23 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2019-01-02">2019-01-02</h2> <h2 id="20190102">2019-01-02</h2>
<ul> <ul>
<li>Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning</li> <li>Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning</li>
<li>I don't see anything interesting in the web server logs around that time though:</li>
<li><p>I don&rsquo;t see anything interesting in the web server logs around that time though:</p>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;02/Jan/2019:0(1|2|3)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
92 40.77.167.4
99 210.7.29.100
120 38.126.157.45
177 35.237.175.180
177 40.77.167.32
216 66.249.75.219
225 18.203.76.93
261 46.101.86.248
357 207.46.13.1
903 54.70.40.11
</code></pre></li>
</ul> </ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;02/Jan/2019:0(1|2|3)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
92 40.77.167.4
99 210.7.29.100
120 38.126.157.45
177 35.237.175.180
177 40.77.167.32
216 66.249.75.219
225 18.203.76.93
261 46.101.86.248
357 207.46.13.1
903 54.70.40.11
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2019-01/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-01/'>Read more →</a>
</article> </article>
@ -173,16 +163,13 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2018-12-01">2018-12-01</h2> <h2 id="20181201">2018-12-01</h2>
<ul> <ul>
<li>Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK</li> <li>Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK</li>
<li>I manually installed OpenJDK, then removed Oracle JDK, then re-ran the <a href="http://github.com/ilri/rmg-ansible-public">Ansible playbook</a> to update all configuration files, etc</li> <li>I manually installed OpenJDK, then removed Oracle JDK, then re-ran the <a href="http://github.com/ilri/rmg-ansible-public">Ansible playbook</a> to update all configuration files, etc</li>
<li>Then I ran all system updates and restarted the server</li> <li>Then I ran all system updates and restarted the server</li>
</ul> </ul>
<h2 id="20181202">2018-12-02</h2>
<h2 id="2018-12-02">2018-12-02</h2>
<ul> <ul>
<li>I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another <a href="https://usn.ubuntu.com/3831-1/">Ghostscript vulnerability last week</a></li> <li>I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another <a href="https://usn.ubuntu.com/3831-1/">Ghostscript vulnerability last week</a></li>
</ul> </ul>
@ -203,15 +190,12 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2018-11-01">2018-11-01</h2> <h2 id="20181101">2018-11-01</h2>
<ul> <ul>
<li>Finalize AReS Phase I and Phase II ToRs</li> <li>Finalize AReS Phase I and Phase II ToRs</li>
<li>Send a note about my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to the dspace-tech mailing list</li> <li>Send a note about my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to the dspace-tech mailing list</li>
</ul> </ul>
<h2 id="20181103">2018-11-03</h2>
<h2 id="2018-11-03">2018-11-03</h2>
<ul> <ul>
<li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li> <li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li>
<li>Today these are the top 10 IPs:</li> <li>Today these are the top 10 IPs:</li>
@ -233,11 +217,10 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2018-10-01">2018-10-01</h2> <h2 id="20181001">2018-10-01</h2>
<ul> <ul>
<li>Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items</li> <li>Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items</li>
<li>I created a GitHub issue to track this <a href="https://github.com/ilri/DSpace/issues/389">#389</a>, because I&rsquo;m super busy in Nairobi right now</li> <li>I created a GitHub issue to track this <a href="https://github.com/ilri/DSpace/issues/389">#389</a>, because I'm super busy in Nairobi right now</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-10/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-10/'>Read more →</a>
</article> </article>
@ -256,13 +239,12 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2018-09-02">2018-09-02</h2> <h2 id="20180902">2018-09-02</h2>
<ul> <ul>
<li>New <a href="https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.5">PostgreSQL JDBC driver version 42.2.5</a></li> <li>New <a href="https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.5">PostgreSQL JDBC driver version 42.2.5</a></li>
<li>I&rsquo;ll update the DSpace role in our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a> and run the updated playbooks on CGSpace and DSpace Test</li> <li>I'll update the DSpace role in our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a> and run the updated playbooks on CGSpace and DSpace Test</li>
<li>Also, I&rsquo;ll re-run the <code>postgresql</code> tasks because the custom PostgreSQL variables are dynamic according to the system&rsquo;s RAM, and we never re-ran them after migrating to larger Linodes last month</li> <li>Also, I'll re-run the <code>postgresql</code> tasks because the custom PostgreSQL variables are dynamic according to the system's RAM, and we never re-ran them after migrating to larger Linodes last month</li>
<li>I&rsquo;m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I&rsquo;m getting those autowire errors in Tomcat 8.5.30 again:</li> <li>I'm testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I'm getting those autowire errors in Tomcat 8.5.30 again:</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-09/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-09/'>Read more →</a>
</article> </article>
@ -281,27 +263,20 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2018-08-01">2018-08-01</h2> <h2 id="20180801">2018-08-01</h2>
<ul> <ul>
<li><p>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</p> <li>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</li>
</ul>
<pre><code>[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child <pre><code>[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child
[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB [Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB
[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
</code></pre></li> </code></pre><ul>
<li>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</li>
<li><p>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</p></li> <li>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat's</li>
<li>I'm not sure why Tomcat didn't crash with an OutOfMemoryError&hellip;</li>
<li><p>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat&rsquo;s</p></li> <li>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</li>
<li>The server only has 8GB of RAM so we'll eventually need to upgrade to a larger one because we'll start starving the OS, PostgreSQL, and command line batch processes</li>
<li><p>I&rsquo;m not sure why Tomcat didn&rsquo;t crash with an OutOfMemoryError&hellip;</p></li> <li>I ran all system updates on DSpace Test and rebooted it</li>
<li><p>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</p></li>
<li><p>The server only has 8GB of RAM so we&rsquo;ll eventually need to upgrade to a larger one because we&rsquo;ll start starving the OS, PostgreSQL, and command line batch processes</p></li>
<li><p>I ran all system updates on DSpace Test and rebooted it</p></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-08/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-08/'>Read more →</a>
</article> </article>
@ -320,19 +295,16 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2018-07-01">2018-07-01</h2> <h2 id="20180701">2018-07-01</h2>
<ul> <ul>
<li><p>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</p> <li>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</li>
<pre><code>$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
</code></pre></li>
<li><p>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</p>
<pre><code>There is insufficient memory for the Java Runtime Environment to continue.
</code></pre></li>
</ul> </ul>
<pre><code>$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
</code></pre><ul>
<li>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</li>
</ul>
<pre><code>There is insufficient memory for the Java Runtime Environment to continue.
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2018-07/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-07/'>Read more →</a>
</article> </article>
@ -350,32 +322,27 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2018-06-04">2018-06-04</h2> <h2 id="20180604">2018-06-04</h2>
<ul> <ul>
<li>Test the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">DSpace 5.8 module upgrades from Atmire</a> (<a href="https://github.com/ilri/DSpace/pull/378">#378</a>) <li>Test the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">DSpace 5.8 module upgrades from Atmire</a> (<a href="https://github.com/ilri/DSpace/pull/378">#378</a>)
<ul> <ul>
<li>There seems to be a problem with the CUA and L&amp;R versions in <code>pom.xml</code> because they are using SNAPSHOT and it doesn&rsquo;t build</li> <li>There seems to be a problem with the CUA and L&amp;R versions in <code>pom.xml</code> because they are using SNAPSHOT and it doesn't build</li>
</ul></li> </ul>
</li>
<li>I added the new CCAFS Phase II Project Tag <code>PII-FP1_PACCA2</code> and merged it into the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/379">#379</a>)</li> <li>I added the new CCAFS Phase II Project Tag <code>PII-FP1_PACCA2</code> and merged it into the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/379">#379</a>)</li>
<li>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</li>
<li><p>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</p> </ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3 -n <pre><code>$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3 -n
</code></pre></li> </code></pre><ul>
<li>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="/cgspace-notes/2018-03/">March, 2018</a></li>
<li><p>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="/cgspace-notes/2018-03/">March, 2018</a></p></li> <li>Time to index ~70,000 items on CGSpace:</li>
</ul>
<li><p>Time to index ~70,000 items on CGSpace:</p>
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b <pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
real 74m42.646s real 74m42.646s
user 8m5.056s user 8m5.056s
sys 2m7.289s sys 2m7.289s
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-06/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-06/'>Read more →</a>
</article> </article>
@ -393,15 +360,14 @@ sys 2m7.289s
</p> </p>
</header> </header>
<h2 id="2018-05-01">2018-05-01</h2> <h2 id="20180501">2018-05-01</h2>
<ul> <ul>
<li>I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface: <li>I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface:
<ul> <ul>
<li><a href="http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E">http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E</a></li> <li>http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E</li>
<li><a href="http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E">http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E</a></li> <li>http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E</li>
</ul></li> </ul>
</li>
<li>Then I reduced the JVM heap size from 6144 back to 5120m</li> <li>Then I reduced the JVM heap size from 6144 back to 5120m</li>
<li>Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> to support hosts choosing which distribution they want to use</li> <li>Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> to support hosts choosing which distribution they want to use</li>
</ul> </ul>
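
The two Solr URLs in the 2018-05-01 entry pass a small XML command in the percent-encoded `stream.body` parameter. A short sketch of how those values decode and how they could be rebuilt with Python's standard library follows. The constant names are illustrative; the host, port and core path are copied from the entry as written.

```python
from urllib.parse import quote, unquote

# Base URL as it appears in the notes.
SOLR_UPDATE = "http://localhost:3000/solr/statistics/update"

# The two XML commands, written out in plain form.
DELETE_ALL = "<delete><query>*:*</query></delete>"
COMMIT = "<commit/>"


def update_url(command):
    """Build an update URL with the command percent-encoded.

    '*', ':' and '/' are left unescaped to match the URLs in the notes.
    """
    return SOLR_UPDATE + "?stream.body=" + quote(command, safe="*:/")


print(update_url(DELETE_ALL))
print(update_url(COMMIT))
print(unquote("%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E"))
```

Decoding makes it clearer that the first request deletes every document matching `*:*` and the second commits that change.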

@ -9,13 +9,12 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" /> <meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Notes"/> <meta name="twitter:title" content="Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/> <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -85,10 +84,9 @@
</p> </p>
</header> </header>
<h2 id="2018-04-01">2018-04-01</h2> <h2 id="20180401">2018-04-01</h2>
<ul> <ul>
<li>I tried to test something on DSpace Test but noticed that it&rsquo;s down since god knows when</li> <li>I tried to test something on DSpace Test but noticed that it's down since god knows when</li>
<li>Catalina logs at least show some memory errors yesterday:</li> <li>Catalina logs at least show some memory errors yesterday:</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-04/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-04/'>Read more →</a>
@ -108,8 +106,7 @@
</p> </p>
</header> </header>
<h2 id="2018-03-02">2018-03-02</h2> <h2 id="20180302">2018-03-02</h2>
<ul> <ul>
<li>Export a CSV of the IITA community metadata for Martin Mueller</li> <li>Export a CSV of the IITA community metadata for Martin Mueller</li>
</ul> </ul>
@ -130,13 +127,12 @@
</p> </p>
</header> </header>
<h2 id="2018-02-01">2018-02-01</h2> <h2 id="20180201">2018-02-01</h2>
<ul> <ul>
<li>Peter gave feedback on the <code>dc.rights</code> proof of concept that I had sent him last week</li> <li>Peter gave feedback on the <code>dc.rights</code> proof of concept that I had sent him last week</li>
<li>We don&rsquo;t need to distinguish between internal and external works, so that makes it just a simple list</li> <li>We don't need to distinguish between internal and external works, so that makes it just a simple list</li>
<li>Yesterday I figured out how to monitor DSpace sessions using JMX</li> <li>Yesterday I figured out how to monitor DSpace sessions using JMX</li>
<li>I copied the logic in the <code>jmx_tomcat_dbpools</code> provided by Ubuntu&rsquo;s <code>munin-plugins-java</code> package and used the stuff I discovered about JMX <a href="/cgspace-notes/2018-01/">in 2018-01</a></li> <li>I copied the logic in the <code>jmx_tomcat_dbpools</code> provided by Ubuntu's <code>munin-plugins-java</code> package and used the stuff I discovered about JMX <a href="/cgspace-notes/2018-01/">in 2018-01</a></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-02/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-02/'>Read more →</a>
</article> </article>
@ -155,33 +151,26 @@
</p> </p>
</header> </header>
<h2 id="2018-01-02">2018-01-02</h2> <h2 id="20180102">2018-01-02</h2>
<ul> <ul>
<li>Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time</li> <li>Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time</li>
<li>I didn&rsquo;t get any load alerts from Linode and the REST and XMLUI logs don&rsquo;t show anything out of the ordinary</li> <li>I didn't get any load alerts from Linode and the REST and XMLUI logs don't show anything out of the ordinary</li>
<li>The nginx logs show HTTP 200s until <code>02/Jan/2018:11:27:17 +0000</code> when Uptime Robot got an HTTP 500</li> <li>The nginx logs show HTTP 200s until <code>02/Jan/2018:11:27:17 +0000</code> when Uptime Robot got an HTTP 500</li>
<li>In dspace.log around that time I see many errors like &ldquo;Client closed the connection before file download was complete&rdquo;</li> <li>In dspace.log around that time I see many errors like &ldquo;Client closed the connection before file download was complete&rdquo;</li>
<li>And just before that I see this:</li>
<li><p>And just before that I see this:</p> </ul>
<pre><code>Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000]. <pre><code>Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000].
</code></pre></li> </code></pre><ul>
<li>Ah hah! So the pool was actually empty!</li>
<li><p>Ah hah! So the pool was actually empty!</p></li> <li>I need to increase that, let's try to bump it up from 50 to 75</li>
<li>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don't know what the hell Uptime Robot saw</li>
<li><p>I need to increase that, let&rsquo;s try to bump it up from 50 to 75</p></li> <li>I notice this error quite a few times in dspace.log:</li>
</ul>
<li><p>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don&rsquo;t know what the hell Uptime Robot saw</p></li>
<li><p>I notice this error quite a few times in dspace.log:</p>
<pre><code>2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets <pre><code>2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered &quot; &quot;]&quot; &quot;] &quot;&quot; at line 1, column 32. org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered &quot; &quot;]&quot; &quot;] &quot;&quot; at line 1, column 32.
</code></pre></li> </code></pre><ul>
<li>And there are many of these errors every day for the past month:</li>
<li><p>And there are many of these errors every day for the past month:</p> </ul>
<pre><code>$ grep -c &quot;Error while searching for sidebar facets&quot; dspace.log.* <pre><code>$ grep -c &quot;Error while searching for sidebar facets&quot; dspace.log.*
dspace.log.2017-11-21:4 dspace.log.2017-11-21:4
dspace.log.2017-11-22:1 dspace.log.2017-11-22:1
@ -226,9 +215,8 @@ dspace.log.2017-12-30:89
dspace.log.2017-12-31:53 dspace.log.2017-12-31:53
dspace.log.2018-01-01:45 dspace.log.2018-01-01:45
dspace.log.2018-01-02:34 dspace.log.2018-01-02:34
</code></pre></li> </code></pre><ul>
<li>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let's Encrypt if it's just a handful of domains</li>
<li><p>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let&rsquo;s Encrypt if it&rsquo;s just a handful of domains</p></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-01/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-01/'>Read more →</a>
</article> </article>
@ -247,8 +235,7 @@ dspace.log.2018-01-02:34
</p> </p>
</header> </header>
<h2 id="2017-12-01">2017-12-01</h2> <h2 id="20171201">2017-12-01</h2>
<ul> <ul>
<li>Uptime Robot noticed that CGSpace went down</li> <li>Uptime Robot noticed that CGSpace went down</li>
<li>The logs say &ldquo;Timeout waiting for idle object&rdquo;</li> <li>The logs say &ldquo;Timeout waiting for idle object&rdquo;</li>
@ -272,27 +259,22 @@ dspace.log.2018-01-02:34
</p> </p>
</header> </header>
<h2 id="2017-11-01">2017-11-01</h2> <h2 id="20171101">2017-11-01</h2>
<ul> <ul>
<li>The CORE developers responded to say they are looking into their bot not respecting our robots.txt</li> <li>The CORE developers responded to say they are looking into their bot not respecting our robots.txt</li>
</ul> </ul>
<h2 id="20171102">2017-11-02</h2>
<h2 id="2017-11-02">2017-11-02</h2>
<ul> <ul>
<li><p>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</p> <li>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</li>
</ul>
<pre><code># grep -c &quot;CORE&quot; /var/log/nginx/access.log <pre><code># grep -c &quot;CORE&quot; /var/log/nginx/access.log
0 0
</code></pre></li> </code></pre><ul>
<li>Generate list of authors on CGSpace for Peter to go through and correct:</li>
<li><p>Generate list of authors on CGSpace for Peter to go through and correct:</p> </ul>
<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv; <pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv;
COPY 54701 COPY 54701
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-11/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-11/'>Read more →</a>
</article> </article>
@ -310,17 +292,14 @@ COPY 54701
</p> </p>
</header> </header>
<h2 id="2017-10-01">2017-10-01</h2> <h2 id="20171001">2017-10-01</h2>
<ul> <ul>
<li><p>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</p> <li>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</li>
</ul>
<pre><code>http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336 <pre><code>http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
</code></pre></li> </code></pre><ul>
<li>There appears to be a pattern but I'll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</li>
<li><p>There appears to be a pattern but I&rsquo;ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</p></li> <li>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</li>
<li><p>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</p></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-10/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-10/'>Read more →</a>
</article> </article>


@ -9,13 +9,12 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
<meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" /> <meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Categories"/> <meta name="twitter:title" content="Categories"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/> <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -100,40 +99,34 @@
</p> </p>
</header> </header>
<h2 id="2019-02-01">2019-02-01</h2> <h2 id="20190201">2019-02-01</h2>
<ul> <ul>
<li>Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!</li> <li>Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!</li>
<li>The top IPs before, during, and after this latest alert tonight were:</li>
<li><p>The top IPs before, during, and after this latest alert tonight were:</p> </ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;01/Feb/2019:(17|18|19|20|21)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 <pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;01/Feb/2019:(17|18|19|20|21)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
245 207.46.13.5 245 207.46.13.5
332 54.70.40.11 332 54.70.40.11
385 5.143.231.38 385 5.143.231.38
405 207.46.13.173 405 207.46.13.173
405 207.46.13.75 405 207.46.13.75
1117 66.249.66.219 1117 66.249.66.219
1121 35.237.175.180 1121 35.237.175.180
1546 5.9.6.51 1546 5.9.6.51
2474 45.5.186.2 2474 45.5.186.2
5490 85.25.237.71 5490 85.25.237.71
</code></pre></li> </code></pre><ul>
<li><code>85.25.237.71</code> is the &ldquo;Linguee Bot&rdquo; that I first saw last month</li>
<li><p><code>85.25.237.71</code> is the &ldquo;Linguee Bot&rdquo; that I first saw last month</p></li> <li>The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase</li>
<li>There were just over 3 million accesses in the nginx logs last month:</li>
<li><p>The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase</p></li> </ul>
<li><p>There were just over 3 million accesses in the nginx logs last month:</p>
<pre><code># time zcat --force /var/log/nginx/* | grep -cE &quot;[0-9]{1,2}/Jan/2019&quot; <pre><code># time zcat --force /var/log/nginx/* | grep -cE &quot;[0-9]{1,2}/Jan/2019&quot;
3018243 3018243
real 0m19.873s real 0m19.873s
user 0m22.203s user 0m22.203s
sys 0m1.979s sys 0m1.979s
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-02/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-02/'>Read more →</a>
</article> </article>
@ -151,26 +144,23 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2019-01-02">2019-01-02</h2> <h2 id="20190102">2019-01-02</h2>
<ul> <ul>
<li>Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning</li> <li>Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning</li>
<li>I don't see anything interesting in the web server logs around that time though:</li>
<li><p>I don&rsquo;t see anything interesting in the web server logs around that time though:</p>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;02/Jan/2019:0(1|2|3)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
92 40.77.167.4
99 210.7.29.100
120 38.126.157.45
177 35.237.175.180
177 40.77.167.32
216 66.249.75.219
225 18.203.76.93
261 46.101.86.248
357 207.46.13.1
903 54.70.40.11
</code></pre></li>
</ul> </ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;02/Jan/2019:0(1|2|3)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
92 40.77.167.4
99 210.7.29.100
120 38.126.157.45
177 35.237.175.180
177 40.77.167.32
216 66.249.75.219
225 18.203.76.93
261 46.101.86.248
357 207.46.13.1
903 54.70.40.11
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2019-01/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-01/'>Read more →</a>
</article> </article>
@ -188,16 +178,13 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2018-12-01">2018-12-01</h2> <h2 id="20181201">2018-12-01</h2>
<ul> <ul>
<li>Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK</li> <li>Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK</li>
<li>I manually installed OpenJDK, then removed Oracle JDK, then re-ran the <a href="http://github.com/ilri/rmg-ansible-public">Ansible playbook</a> to update all configuration files, etc</li> <li>I manually installed OpenJDK, then removed Oracle JDK, then re-ran the <a href="http://github.com/ilri/rmg-ansible-public">Ansible playbook</a> to update all configuration files, etc</li>
<li>Then I ran all system updates and restarted the server</li> <li>Then I ran all system updates and restarted the server</li>
</ul> </ul>
<h2 id="20181202">2018-12-02</h2>
<h2 id="2018-12-02">2018-12-02</h2>
<ul> <ul>
<li>I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another <a href="https://usn.ubuntu.com/3831-1/">Ghostscript vulnerability last week</a></li> <li>I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another <a href="https://usn.ubuntu.com/3831-1/">Ghostscript vulnerability last week</a></li>
</ul> </ul>
@ -218,15 +205,12 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2018-11-01">2018-11-01</h2> <h2 id="20181101">2018-11-01</h2>
<ul> <ul>
<li>Finalize AReS Phase I and Phase II ToRs</li> <li>Finalize AReS Phase I and Phase II ToRs</li>
<li>Send a note about my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to the dspace-tech mailing list</li> <li>Send a note about my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to the dspace-tech mailing list</li>
</ul> </ul>
<h2 id="20181103">2018-11-03</h2>
<h2 id="2018-11-03">2018-11-03</h2>
<ul> <ul>
<li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li> <li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li>
<li>Today these are the top 10 IPs:</li> <li>Today these are the top 10 IPs:</li>
@ -248,11 +232,10 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2018-10-01">2018-10-01</h2> <h2 id="20181001">2018-10-01</h2>
<ul> <ul>
<li>Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items</li> <li>Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items</li>
<li>I created a GitHub issue to track this <a href="https://github.com/ilri/DSpace/issues/389">#389</a>, because I&rsquo;m super busy in Nairobi right now</li> <li>I created a GitHub issue to track this <a href="https://github.com/ilri/DSpace/issues/389">#389</a>, because I'm super busy in Nairobi right now</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-10/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-10/'>Read more →</a>
</article> </article>
@ -271,13 +254,12 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2018-09-02">2018-09-02</h2> <h2 id="20180902">2018-09-02</h2>
<ul> <ul>
<li>New <a href="https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.5">PostgreSQL JDBC driver version 42.2.5</a></li> <li>New <a href="https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.5">PostgreSQL JDBC driver version 42.2.5</a></li>
<li>I&rsquo;ll update the DSpace role in our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a> and run the updated playbooks on CGSpace and DSpace Test</li> <li>I'll update the DSpace role in our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a> and run the updated playbooks on CGSpace and DSpace Test</li>
<li>Also, I&rsquo;ll re-run the <code>postgresql</code> tasks because the custom PostgreSQL variables are dynamic according to the system&rsquo;s RAM, and we never re-ran them after migrating to larger Linodes last month</li> <li>Also, I'll re-run the <code>postgresql</code> tasks because the custom PostgreSQL variables are dynamic according to the system's RAM, and we never re-ran them after migrating to larger Linodes last month</li>
<li>I&rsquo;m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I&rsquo;m getting those autowire errors in Tomcat 8.5.30 again:</li> <li>I'm testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I'm getting those autowire errors in Tomcat 8.5.30 again:</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-09/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-09/'>Read more →</a>
</article> </article>
@ -296,27 +278,20 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2018-08-01">2018-08-01</h2> <h2 id="20180801">2018-08-01</h2>
<ul> <ul>
<li><p>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</p> <li>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</li>
</ul>
<pre><code>[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child <pre><code>[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child
[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB [Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB
[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
</code></pre></li> </code></pre><ul>
<li>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</li>
<li><p>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</p></li> <li>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat's</li>
<li>I'm not sure why Tomcat didn't crash with an OutOfMemoryError&hellip;</li>
<li><p>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat&rsquo;s</p></li> <li>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</li>
<li>The server only has 8GB of RAM so we'll eventually need to upgrade to a larger one because we'll start starving the OS, PostgreSQL, and command line batch processes</li>
<li><p>I&rsquo;m not sure why Tomcat didn&rsquo;t crash with an OutOfMemoryError&hellip;</p></li> <li>I ran all system updates on DSpace Test and rebooted it</li>
<li><p>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</p></li>
<li><p>The server only has 8GB of RAM so we&rsquo;ll eventually need to upgrade to a larger one because we&rsquo;ll start starving the OS, PostgreSQL, and command line batch processes</p></li>
<li><p>I ran all system updates on DSpace Test and rebooted it</p></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-08/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-08/'>Read more →</a>
</article> </article>
@ -335,19 +310,16 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2018-07-01">2018-07-01</h2> <h2 id="20180701">2018-07-01</h2>
<ul> <ul>
<li><p>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</p> <li>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</li>
<pre><code>$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
</code></pre></li>
<li><p>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</p>
<pre><code>There is insufficient memory for the Java Runtime Environment to continue.
</code></pre></li>
</ul> </ul>
<pre><code>$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
</code></pre><ul>
<li>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</li>
</ul>
<pre><code>There is insufficient memory for the Java Runtime Environment to continue.
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2018-07/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-07/'>Read more →</a>
</article> </article>
@ -365,32 +337,27 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2018-06-04">2018-06-04</h2> <h2 id="20180604">2018-06-04</h2>
<ul> <ul>
<li>Test the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">DSpace 5.8 module upgrades from Atmire</a> (<a href="https://github.com/ilri/DSpace/pull/378">#378</a>) <li>Test the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">DSpace 5.8 module upgrades from Atmire</a> (<a href="https://github.com/ilri/DSpace/pull/378">#378</a>)
<ul> <ul>
<li>There seems to be a problem with the CUA and L&amp;R versions in <code>pom.xml</code> because they are using SNAPSHOT and it doesn&rsquo;t build</li> <li>There seems to be a problem with the CUA and L&amp;R versions in <code>pom.xml</code> because they are using SNAPSHOT and it doesn't build</li>
</ul></li> </ul>
</li>
<li>I added the new CCAFS Phase II Project Tag <code>PII-FP1_PACCA2</code> and merged it into the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/379">#379</a>)</li> <li>I added the new CCAFS Phase II Project Tag <code>PII-FP1_PACCA2</code> and merged it into the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/379">#379</a>)</li>
<li>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</li>
<li><p>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</p> </ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3 -n <pre><code>$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3 -n
</code></pre></li> </code></pre><ul>
<li>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="/cgspace-notes/2018-03/">March, 2018</a></li>
<li><p>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="/cgspace-notes/2018-03/">March, 2018</a></p></li> <li>Time to index ~70,000 items on CGSpace:</li>
</ul>
<li><p>Time to index ~70,000 items on CGSpace:</p>
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b <pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
real 74m42.646s real 74m42.646s
user 8m5.056s user 8m5.056s
sys 2m7.289s sys 2m7.289s
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-06/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-06/'>Read more →</a>
</article> </article>
@ -408,15 +375,14 @@ sys 2m7.289s
</p> </p>
</header> </header>
<h2 id="2018-05-01">2018-05-01</h2> <h2 id="20180501">2018-05-01</h2>
<ul> <ul>
<li>I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface: <li>I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface:
<ul> <ul>
<li><a href="http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E">http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E</a></li> <li>http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E</li>
<li><a href="http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E">http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E</a></li> <li>http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E</li>
</ul></li> </ul>
</li>
<li>Then I reduced the JVM heap size from 6144 back to 5120m</li> <li>Then I reduced the JVM heap size from 6144 back to 5120m</li>
<li>Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> to support hosts choosing which distribution they want to use</li> <li>Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> to support hosts choosing which distribution they want to use</li>
</ul> </ul>


@ -9,13 +9,12 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
<meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" /> <meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Categories"/> <meta name="twitter:title" content="Categories"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/> <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -100,10 +99,9 @@
</p> </p>
</header> </header>
<h2 id="2018-04-01">2018-04-01</h2> <h2 id="20180401">2018-04-01</h2>
<ul> <ul>
<li>I tried to test something on DSpace Test but noticed that it&rsquo;s down since god knows when</li> <li>I tried to test something on DSpace Test but noticed that it's down since god knows when</li>
<li>Catalina logs at least show some memory errors yesterday:</li> <li>Catalina logs at least show some memory errors yesterday:</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-04/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-04/'>Read more →</a>
@ -123,8 +121,7 @@
</p> </p>
</header> </header>
<h2 id="2018-03-02">2018-03-02</h2> <h2 id="20180302">2018-03-02</h2>
<ul> <ul>
<li>Export a CSV of the IITA community metadata for Martin Mueller</li> <li>Export a CSV of the IITA community metadata for Martin Mueller</li>
</ul> </ul>
@ -145,13 +142,12 @@
</p> </p>
</header> </header>
<h2 id="2018-02-01">2018-02-01</h2> <h2 id="20180201">2018-02-01</h2>
<ul> <ul>
<li>Peter gave feedback on the <code>dc.rights</code> proof of concept that I had sent him last week</li> <li>Peter gave feedback on the <code>dc.rights</code> proof of concept that I had sent him last week</li>
<li>We don&rsquo;t need to distinguish between internal and external works, so that makes it just a simple list</li> <li>We don't need to distinguish between internal and external works, so that makes it just a simple list</li>
<li>Yesterday I figured out how to monitor DSpace sessions using JMX</li> <li>Yesterday I figured out how to monitor DSpace sessions using JMX</li>
<li>I copied the logic in the <code>jmx_tomcat_dbpools</code> provided by Ubuntu&rsquo;s <code>munin-plugins-java</code> package and used the stuff I discovered about JMX <a href="/cgspace-notes/2018-01/">in 2018-01</a></li> <li>I copied the logic in the <code>jmx_tomcat_dbpools</code> provided by Ubuntu's <code>munin-plugins-java</code> package and used the stuff I discovered about JMX <a href="/cgspace-notes/2018-01/">in 2018-01</a></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-02/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-02/'>Read more →</a>
</article> </article>
@ -170,33 +166,26 @@
</p> </p>
</header> </header>
<h2 id="2018-01-02">2018-01-02</h2> <h2 id="20180102">2018-01-02</h2>
<ul> <ul>
<li>Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time</li> <li>Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time</li>
<li>I didn&rsquo;t get any load alerts from Linode and the REST and XMLUI logs don&rsquo;t show anything out of the ordinary</li> <li>I didn't get any load alerts from Linode and the REST and XMLUI logs don't show anything out of the ordinary</li>
<li>The nginx logs show HTTP 200s until <code>02/Jan/2018:11:27:17 +0000</code> when Uptime Robot got an HTTP 500</li> <li>The nginx logs show HTTP 200s until <code>02/Jan/2018:11:27:17 +0000</code> when Uptime Robot got an HTTP 500</li>
<li>In dspace.log around that time I see many errors like &ldquo;Client closed the connection before file download was complete&rdquo;</li> <li>In dspace.log around that time I see many errors like &ldquo;Client closed the connection before file download was complete&rdquo;</li>
<li>And just before that I see this:</li>
<li><p>And just before that I see this:</p> </ul>
<pre><code>Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000]. <pre><code>Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000].
</code></pre></li> </code></pre><ul>
<li>Ah hah! So the pool was actually empty!</li>
<li><p>Ah hah! So the pool was actually empty!</p></li> <li>I need to increase that, let's try to bump it up from 50 to 75</li>
<li>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don't know what the hell Uptime Robot saw</li>
<li><p>I need to increase that, let&rsquo;s try to bump it up from 50 to 75</p></li> <li>I notice this error quite a few times in dspace.log:</li>
</ul>
<li><p>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don&rsquo;t know what the hell Uptime Robot saw</p></li>
<li><p>I notice this error quite a few times in dspace.log:</p>
<pre><code>2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets <pre><code>2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered &quot; &quot;]&quot; &quot;] &quot;&quot; at line 1, column 32. org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered &quot; &quot;]&quot; &quot;] &quot;&quot; at line 1, column 32.
</code></pre></li> </code></pre><ul>
<li>And there are many of these errors every day for the past month:</li>
<li><p>And there are many of these errors every day for the past month:</p> </ul>
<pre><code>$ grep -c &quot;Error while searching for sidebar facets&quot; dspace.log.* <pre><code>$ grep -c &quot;Error while searching for sidebar facets&quot; dspace.log.*
dspace.log.2017-11-21:4 dspace.log.2017-11-21:4
dspace.log.2017-11-22:1 dspace.log.2017-11-22:1
@ -241,9 +230,8 @@ dspace.log.2017-12-30:89
dspace.log.2017-12-31:53 dspace.log.2017-12-31:53
dspace.log.2018-01-01:45 dspace.log.2018-01-01:45
dspace.log.2018-01-02:34 dspace.log.2018-01-02:34
</code></pre></li> </code></pre><ul>
<li>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let's Encrypt if it's just a handful of domains</li>
<li><p>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let&rsquo;s Encrypt if it&rsquo;s just a handful of domains</p></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-01/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-01/'>Read more →</a>
</article> </article>
@ -262,8 +250,7 @@ dspace.log.2018-01-02:34
</p> </p>
</header> </header>
<h2 id="2017-12-01">2017-12-01</h2> <h2 id="20171201">2017-12-01</h2>
<ul> <ul>
<li>Uptime Robot noticed that CGSpace went down</li> <li>Uptime Robot noticed that CGSpace went down</li>
<li>The logs say &ldquo;Timeout waiting for idle object&rdquo;</li> <li>The logs say &ldquo;Timeout waiting for idle object&rdquo;</li>
@ -287,27 +274,22 @@ dspace.log.2018-01-02:34
</p> </p>
</header> </header>
<h2 id="2017-11-01">2017-11-01</h2> <h2 id="20171101">2017-11-01</h2>
<ul> <ul>
<li>The CORE developers responded to say they are looking into their bot not respecting our robots.txt</li> <li>The CORE developers responded to say they are looking into their bot not respecting our robots.txt</li>
</ul> </ul>
<h2 id="20171102">2017-11-02</h2>
<h2 id="2017-11-02">2017-11-02</h2>
<ul> <ul>
<li><p>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</p> <li>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</li>
</ul>
<pre><code># grep -c &quot;CORE&quot; /var/log/nginx/access.log <pre><code># grep -c &quot;CORE&quot; /var/log/nginx/access.log
0 0
</code></pre></li> </code></pre><ul>
<li>Generate list of authors on CGSpace for Peter to go through and correct:</li>
<li><p>Generate list of authors on CGSpace for Peter to go through and correct:</p> </ul>
<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv; <pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv;
COPY 54701 COPY 54701
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-11/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-11/'>Read more →</a>
</article> </article>
@ -325,17 +307,14 @@ COPY 54701
</p> </p>
</header> </header>
<h2 id="2017-10-01">2017-10-01</h2> <h2 id="20171001">2017-10-01</h2>
<ul> <ul>
<li><p>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</p> <li>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</li>
</ul>
<pre><code>http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336 <pre><code>http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
</code></pre></li> </code></pre><ul>
<li>There appears to be a pattern but I'll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</li>
<li><p>There appears to be a pattern but I&rsquo;ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</p></li> <li>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</li>
<li><p>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</p></li>
</ul> </ul>
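The multi-handle value shown above is a single string with `||` between the handles. As a rough sketch of how such values could be inspected before cleaning them up, the following Python splits one value into its individual handles. The helper name `split_handles` is hypothetical and not part of DSpace or OpenRefine; the notes do not say which handle should be kept, so the sketch only separates them.

```python
def split_handles(value, separator="||"):
    """Split a multi-value metadata string into individual handle URLs."""
    # Drop empty fragments in case of stray leading or trailing separators.
    return [part.strip() for part in value.split(separator) if part.strip()]


# Example value copied from the notes above.
value = "http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336"
handles = split_handles(value)

# An item is a cleanup candidate when it carries more than one handle.
needs_cleanup = len(handles) > 1
print(handles, needs_cleanup)
```

The same check could be expressed in SQL or as an OpenRefine transform; Python is used here only to make the pattern explicit.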
<a href='https://alanorth.github.io/cgspace-notes/2017-10/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-10/'>Read more →</a>
</article> </article>
@ -374,16 +353,13 @@ COPY 54701
</p> </p>
</header> </header>
<h2 id="2017-09-06">2017-09-06</h2> <h2 id="20170906">2017-09-06</h2>
<ul> <ul>
<li>Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours</li> <li>Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours</li>
</ul> </ul>
<h2 id="20170907">2017-09-07</h2>
<h2 id="2017-09-07">2017-09-07</h2>
<ul> <ul>
<li>Ask Sisay to clean up the WLE approvers a bit, as Marianne&rsquo;s user account is both in the approvers step as well as the group</li> <li>Ask Sisay to clean up the WLE approvers a bit, as Marianne's user account is both in the approvers step as well as the group</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-09/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-09/'>Read more →</a>
</article> </article>
@ -402,22 +378,21 @@ COPY 54701
</p> </p>
</header> </header>
<h2 id="2017-08-01">2017-08-01</h2> <h2 id="20170801">2017-08-01</h2>
<ul> <ul>
<li>Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours</li> <li>Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours</li>
<li>I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)</li> <li>I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)</li>
<li>The good thing is that, according to <code>dspace.log.2017-08-01</code>, they are all using the same Tomcat session</li> <li>The good thing is that, according to <code>dspace.log.2017-08-01</code>, they are all using the same Tomcat session</li>
<li>This means our Tomcat Crawler Session Valve is working</li> <li>This means our Tomcat Crawler Session Valve is working</li>
<li>But many of the bots are browsing dynamic URLs like: <li>But many of the bots are browsing dynamic URLs like:
<ul> <ul>
<li>/handle/10568/3353/discover</li> <li>/handle/10568/3353/discover</li>
<li>/handle/10568/16510/browse</li> <li>/handle/10568/16510/browse</li>
</ul></li> </ul>
</li>
<li>The <code>robots.txt</code> only blocks the top-level <code>/discover</code> and <code>/browse</code> URLs&hellip; we will need to find a way to forbid them from accessing these!</li> <li>The <code>robots.txt</code> only blocks the top-level <code>/discover</code> and <code>/browse</code> URLs&hellip; we will need to find a way to forbid them from accessing these!</li>
<li>Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li> <li>Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
<li>It turns out that we&rsquo;re already adding the <code>X-Robots-Tag &quot;none&quot;</code> HTTP header, but this only forbids the search engine from <em>indexing</em> the page, not crawling it!</li> <li>It turns out that we're already adding the <code>X-Robots-Tag &quot;none&quot;</code> HTTP header, but this only forbids the search engine from <em>indexing</em> the page, not crawling it!</li>
<li>Also, the bot has to successfully browse the page first so it can receive the HTTP header&hellip;</li> <li>Also, the bot has to successfully browse the page first so it can receive the HTTP header&hellip;</li>
<li>We might actually have to <em>block</em> these requests with HTTP 403 depending on the user agent</li> <li>We might actually have to <em>block</em> these requests with HTTP 403 depending on the user agent</li>
<li>Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415</li> <li>Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415</li>
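The idea of answering bots with HTTP 403 on the dynamic `/discover` and `/browse` URLs can be sketched as a small decision function. This is only an illustration in Python, not the web server configuration actually used on CGSpace: the user-agent tokens and the URL pattern are assumptions based on the crawlers and paths named in the notes, and `status_for` is a hypothetical helper.

```python
import re

# Assumed user-agent tokens for the crawlers mentioned in the notes
# (Google, Baidu, Bing, Yahoo); a real deployment would need its own list.
BOT_PATTERN = re.compile(r"googlebot|baiduspider|bingbot|slurp", re.IGNORECASE)

# Matches /handle/<prefix>/<suffix>/discover and /handle/<prefix>/<suffix>/browse,
# the dynamic pages that robots.txt does not cover.
DYNAMIC_URL_PATTERN = re.compile(r"^/handle/[\d.]+/\d+/(discover|browse)(/|\?|$)")


def status_for(path, user_agent):
    """Return 403 for known bots on dynamic handle URLs, otherwise 200."""
    if BOT_PATTERN.search(user_agent) and DYNAMIC_URL_PATTERN.match(path):
        return 403
    return 200


googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(status_for("/handle/10568/3353/discover", googlebot))
print(status_for("/handle/10568/3353", googlebot))
```

Unlike the `X-Robots-Tag` header, a check like this rejects the request before the page is rendered, so the bot never triggers the expensive Discovery query.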


@ -9,13 +9,12 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
<meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" /> <meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Categories"/> <meta name="twitter:title" content="Categories"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/> <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -100,18 +99,15 @@
</p> </p>
</header> </header>
<h2 id="2017-07-01">2017-07-01</h2> <h2 id="20170701">2017-07-01</h2>
<ul> <ul>
<li>Run system updates and reboot DSpace Test</li> <li>Run system updates and reboot DSpace Test</li>
</ul> </ul>
<h2 id="20170704">2017-07-04</h2>
<h2 id="2017-07-04">2017-07-04</h2>
<ul> <ul>
<li>Merge changes for WLE Phase II theme rename (<a href="https://github.com/ilri/DSpace/pull/329">#329</a>)</li> <li>Merge changes for WLE Phase II theme rename (<a href="https://github.com/ilri/DSpace/pull/329">#329</a>)</li>
<li>Looking at extracting the metadata registries from ICARDA&rsquo;s MEL DSpace database so we can compare fields with CGSpace</li> <li>Looking at extracting the metadata registries from ICARDA's MEL DSpace database so we can compare fields with CGSpace</li>
<li>We can use PostgreSQL&rsquo;s extended output format (<code>-x</code>) plus <code>sed</code> to format the output into quasi XML:</li> <li>We can use PostgreSQL's extended output format (<code>-x</code>) plus <code>sed</code> to format the output into quasi XML:</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-07/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-07/'>Read more →</a>
</article> </article>
@ -130,7 +126,7 @@
</p> </p>
</header> </header>
2017-06-01 After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes The cg.identifier.wletheme field will be used for both Phase I and Phase II Research Themes Then we&rsquo;ll create a new sub-community for Phase II and create collections for the research themes there The current &ldquo;Research Themes&rdquo; community will be renamed to &ldquo;WLE Phase I Research Themes&rdquo; Tagged all items in the current Phase I collections with their appropriate themes Create pull request to add Phase II research themes to the submission form: #328 Add cg. 2017-06-01 After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes The cg.identifier.wletheme field will be used for both Phase I and Phase II Research Themes Then we'll create a new sub-community for Phase II and create collections for the research themes there The current &ldquo;Research Themes&rdquo; community will be renamed to &ldquo;WLE Phase I Research Themes&rdquo; Tagged all items in the current Phase I collections with their appropriate themes Create pull request to add Phase II research themes to the submission form: #328 Add cg.
<a href='https://alanorth.github.io/cgspace-notes/2017-06/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-06/'>Read more →</a>
</article> </article>
@ -148,7 +144,7 @@
</p> </p>
</header> </header>
2017-05-01 ICARDA apparently started working on CG Core on their MEL repository They have done a few cg.* fields, but not very consistent and even copy some of CGSpace items: https://mel.cgiar.org/xmlui/handle/20.500.11766/6911?show=full https://cgspace.cgiar.org/handle/10568/73683 2017-05-02 Atmire got back about the Workflow Statistics issue, and apparently it&rsquo;s a bug in the CUA module so they will send us a pull request 2017-05-04 Sync DSpace Test with database and assetstore from CGSpace Re-deploy DSpace Test with Atmire&rsquo;s CUA patch for workflow statistics, run system updates, and restart the server Now I can see the workflow statistics and am able to select users, but everything returns 0 items Megan says there are still some mapped items that are not appearing since last week, so I forced a full index-discovery -b Need to remember to check if the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSpace Test) tomorrow: https://cgspace. 2017-05-01 ICARDA apparently started working on CG Core on their MEL repository They have done a few cg.* fields, but not very consistent and even copy some of CGSpace items: https://mel.cgiar.org/xmlui/handle/20.500.11766/6911?show=full https://cgspace.cgiar.org/handle/10568/73683 2017-05-02 Atmire got back about the Workflow Statistics issue, and apparently it's a bug in the CUA module so they will send us a pull request 2017-05-04 Sync DSpace Test with database and assetstore from CGSpace Re-deploy DSpace Test with Atmire's CUA patch for workflow statistics, run system updates, and restart the server Now I can see the workflow statistics and am able to select users, but everything returns 0 items Megan says there are still some mapped items that are not appearing since last week, so I forced a full index-discovery -b Need to remember to check if the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSpace Test) tomorrow: https://cgspace.
<a href='https://alanorth.github.io/cgspace-notes/2017-05/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-05/'>Read more →</a>
</article> </article>
@ -166,23 +162,18 @@
</p> </p>
</header> </header>
<h2 id="2017-04-02">2017-04-02</h2> <h2 id="20170402">2017-04-02</h2>
<ul> <ul>
<li>Merge one change to CCAFS flagships that I had forgotten to remove last month (&ldquo;MANAGING CLIMATE RISK&rdquo;): <a href="https://github.com/ilri/DSpace/pull/317">https://github.com/ilri/DSpace/pull/317</a></li> <li>Merge one change to CCAFS flagships that I had forgotten to remove last month (&ldquo;MANAGING CLIMATE RISK&rdquo;): <a href="https://github.com/ilri/DSpace/pull/317">https://github.com/ilri/DSpace/pull/317</a></li>
<li>Quick proof-of-concept hack to add <code>dc.rights</code> to the input form, including some inline instructions/hints:</li> <li>Quick proof-of-concept hack to add <code>dc.rights</code> to the input form, including some inline instructions/hints:</li>
</ul> </ul>
<p><img src="/cgspace-notes/2017/04/dc-rights.png" alt="dc.rights in the submission form"></p>
<p><img src="/cgspace-notes/2017/04/dc-rights.png" alt="dc.rights in the submission form" /></p>
<ul> <ul>
<li>Remove redundant/duplicate text in the DSpace submission license</li> <li>Remove redundant/duplicate text in the DSpace submission license</li>
<li>Testing the CMYK patch on a collection with 650 items:</li>
<li><p>Testing the CMYK patch on a collection with 650 items:</p>
<pre><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p &quot;ImageMagick PDF Thumbnail&quot; -v &gt;&amp; /tmp/filter-media-cmyk.txt
</code></pre></li>
</ul> </ul>
<pre><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p &quot;ImageMagick PDF Thumbnail&quot; -v &gt;&amp; /tmp/filter-media-cmyk.txt
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2017-04/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-04/'>Read more →</a>
</article> </article>
@ -200,14 +191,11 @@
</p> </p>
</header> </header>
<h2 id="2017-03-01">2017-03-01</h2> <h2 id="20170301">2017-03-01</h2>
<ul> <ul>
<li>Run the 279 CIAT author corrections on CGSpace</li> <li>Run the 279 CIAT author corrections on CGSpace</li>
</ul> </ul>
<h2 id="20170302">2017-03-02</h2>
<h2 id="2017-03-02">2017-03-02</h2>
<ul> <ul>
<li>Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace</li> <li>Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace</li>
<li>CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles</li> <li>CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles</li>
@ -217,13 +205,11 @@
<li>Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI</li> <li>Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI</li>
<li>Filed an issue on DSpace issue tracker for the <code>filter-media</code> bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: <a href="https://jira.duraspace.org/browse/DS-3516">DS-3516</a></li> <li>Filed an issue on DSpace issue tracker for the <code>filter-media</code> bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: <a href="https://jira.duraspace.org/browse/DS-3516">DS-3516</a></li>
<li>Discovered that the ImageMagick <code>filter-media</code> plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK</li> <li>Discovered that the ImageMagick <code>filter-media</code> plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK</li>
<li>Interestingly, it seems DSpace 4.x's thumbnails were sRGB, but forcing regeneration using DSpace 5.x's ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999">10568/51999</a>):</li>
<li><p>Interestingly, it seems DSpace 4.x&rsquo;s thumbnails were sRGB, but forcing regeneration using DSpace 5.x&rsquo;s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999"><sup>10568</sup>&frasl;<sub>51999</sub></a>):</p> </ul>
<pre><code>$ identify ~/Desktop/alc_contrastes_desafios.jpg <pre><code>$ identify ~/Desktop/alc_contrastes_desafios.jpg
/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000 /Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
</code></pre></li> </code></pre>
</ul>
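To check many thumbnails rather than one, the colorspace could be pulled out of each `identify` line programmatically. The sketch below is a guess at a minimal parser: it assumes the default one-line output format shown above, where the colorspace follows the bit-depth token, and it assumes file paths without spaces. The function name `colorspace_of` is hypothetical.

```python
def colorspace_of(identify_line):
    """Return the token that follows the bit depth (e.g. '8-bit'), or None."""
    tokens = identify_line.split()
    for index, token in enumerate(tokens[:-1]):
        if token.endswith("-bit"):
            return tokens[index + 1]
    return None


# Output line copied from the notes above.
line = (
    "/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 "
    "464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000"
)
print(colorspace_of(line))
```

Other ImageMagick versions may print the fields differently, so the token positions should be verified against real output before relying on this.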
<a href='https://alanorth.github.io/cgspace-notes/2017-03/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-03/'>Read more →</a>
</article> </article>
@ -241,25 +227,22 @@
</p> </p>
</header> </header>
<h2 id="2017-02-07">2017-02-07</h2> <h2 id="20170207">2017-02-07</h2>
<ul> <ul>
<li><p>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</p> <li>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</li>
</ul>
<pre><code>dspace=# select * from collection2item where item_id = '80278'; <pre><code>dspace=# select * from collection2item where item_id = '80278';
id | collection_id | item_id id | collection_id | item_id
-------+---------------+--------- -------+---------------+---------
92551 | 313 | 80278 92551 | 313 | 80278
92550 | 313 | 80278 92550 | 313 | 80278
90774 | 1051 | 80278 90774 | 1051 | 80278
(3 rows) (3 rows)
dspace=# delete from collection2item where id = 92551 and item_id = 80278; dspace=# delete from collection2item where id = 92551 and item_id = 80278;
DELETE 1 DELETE 1
</code></pre></li> </code></pre><ul>
<li>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</li>
<li><p>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</p></li> <li>Looks like we'll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</li>
<li><p>Looks like we&rsquo;ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</p></li>
</ul> </ul>
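Since the double mapping has happened more than once, it may be worth scanning for it instead of waiting for reports. A minimal Python sketch, assuming the rows of `collection2item` have already been fetched as `(id, collection_id, item_id)` tuples; `find_duplicate_mappings` is a hypothetical helper, not part of DSpace:

```python
from collections import defaultdict


def find_duplicate_mappings(rows):
    """Group mapping ids by (collection_id, item_id) and keep pairs seen twice or more."""
    seen = defaultdict(list)
    for mapping_id, collection_id, item_id in rows:
        seen[(collection_id, item_id)].append(mapping_id)
    return {pair: sorted(ids) for pair, ids in seen.items() if len(ids) > 1}


# Rows copied from the query output above.
rows = [(92551, 313, 80278), (92550, 313, 80278), (90774, 1051, 80278)]
print(find_duplicate_mappings(rows))
```

For the rows above this reports the pair (313, 80278) with both mapping ids, which matches the row that was deleted by hand.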
<a href='https://alanorth.github.io/cgspace-notes/2017-02/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-02/'>Read more →</a>
</article> </article>
@ -278,12 +261,11 @@ DELETE 1
</p> </p>
</header> </header>
<h2 id="2017-01-02">2017-01-02</h2> <h2 id="20170102">2017-01-02</h2>
<ul> <ul>
<li>I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error</li> <li>I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error</li>
<li>I tested on DSpace Test as well and it doesn&rsquo;t work there either</li> <li>I tested on DSpace Test as well and it doesn't work there either</li>
<li>I asked on the dspace-tech mailing list because it seems to be broken, and actually now I&rsquo;m not sure if we&rsquo;ve ever had the sharding task run successfully over all these years</li> <li>I asked on the dspace-tech mailing list because it seems to be broken, and actually now I'm not sure if we've ever had the sharding task run successfully over all these years</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-01/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-01/'>Read more →</a>
</article> </article>
@ -302,25 +284,20 @@ DELETE 1
</p> </p>
</header> </header>
<h2 id="2016-12-02">2016-12-02</h2> <h2 id="20161202">2016-12-02</h2>
<ul> <ul>
<li>CGSpace was down for five hours in the morning while I was sleeping</li> <li>CGSpace was down for five hours in the morning while I was sleeping</li>
<li>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</li>
<li><p>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</p> </ul>
<pre><code>2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;) <pre><code>2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail=&quot;dc.title&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;) 2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail=&quot;dc.title&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, Object Type=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail=&quot;THUMBNAIL&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;) 2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, Object Type=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail=&quot;THUMBNAIL&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail=&quot;-1&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;) 2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail=&quot;-1&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;) 2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
</code></pre></li> </code></pre><ul>
<li>I see thousands of them in the logs for the last few months, so it's not related to the DSpace 5.5 upgrade</li>
<li><p>I see thousands of them in the logs for the last few months, so it&rsquo;s not related to the DSpace 5.5 upgrade</p></li> <li>I've raised a ticket with Atmire to ask</li>
<li>Another worrying error from dspace.log is:</li>
<li><p>I&rsquo;ve raised a ticket with Atmire to ask</p></li>
<li><p>Another worrying error from dspace.log is:</p></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-12/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-12/'>Read more →</a>
</article> </article>
@ -339,13 +316,11 @@ DELETE 1
</p> </p>
</header> </header>
<h2 id="2016-11-01">2016-11-01</h2> <h2 id="20161101">2016-11-01</h2>
<ul> <ul>
<li>Add <code>dc.type</code> to the output options for Atmire&rsquo;s Listings and Reports module (<a href="https://github.com/ilri/DSpace/pull/286">#286</a>)</li> <li>Add <code>dc.type</code> to the output options for Atmire's Listings and Reports module (<a href="https://github.com/ilri/DSpace/pull/286">#286</a>)</li>
</ul> </ul>
<p><img src="/cgspace-notes/2016/11/listings-and-reports.png" alt="Listings and Reports with output type"></p>
<p><img src="/cgspace-notes/2016/11/listings-and-reports.png" alt="Listings and Reports with output type" /></p>
<a href='https://alanorth.github.io/cgspace-notes/2016-11/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-11/'>Read more →</a>
</article> </article>
@ -363,22 +338,19 @@ DELETE 1
</p> </p>
</header> </header>
<h2 id="2016-10-03">2016-10-03</h2> <h2 id="20161003">2016-10-03</h2>
<ul> <ul>
<li>Testing adding <a href="https://wiki.duraspace.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing">ORCIDs to a CSV</a> file for a single item to see if the author orders get messed up</li> <li>Testing adding <a href="https://wiki.duraspace.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing">ORCIDs to a CSV</a> file for a single item to see if the author orders get messed up</li>
<li>Need to test the following scenarios to see how author order is affected: <li>Need to test the following scenarios to see how author order is affected:
<ul> <ul>
<li>ORCIDs only</li> <li>ORCIDs only</li>
<li>ORCIDs plus normal authors</li> <li>ORCIDs plus normal authors</li>
</ul></li>
<li><p>I exported a random item&rsquo;s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new column called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</p>
<pre><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
</code></pre></li>
</ul> </ul>
</li>
<li>I exported a random item's metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new column called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</li>
</ul>
<pre><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
</code></pre>
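The test file described above has only three columns. As a sketch of what it might look like, the following Python builds such a CSV with the standard `csv` module. The item id and collection handle are placeholders invented for illustration; only the column name and the ORCID values come from the notes.

```python
import csv
import io

# ORCID identifiers copied from the notes above.
ORCIDS = ["0000-0002-6115-0956", "0000-0002-3812-8793", "0000-0001-7462-405X"]


def build_orcid_csv(item_id, collection, orcids):
    """Return CSV text with id, collection, and a ||-joined ORCID author column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "collection", "ORCID:dc.contributor.author"])
    writer.writerow([item_id, collection, "||".join(orcids)])
    return buffer.getvalue()


# "1" and "10568/1" are hypothetical placeholders, not a real item or collection.
csv_text = build_orcid_csv("1", "10568/1", ORCIDS)
print(csv_text)
```

Keeping the ORCIDs in a list makes the intended author order explicit, which is the property the test is meant to check after import.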
<a href='https://alanorth.github.io/cgspace-notes/2016-10/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-10/'>Read more →</a>
</article> </article>


@ -9,13 +9,12 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
<meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" /> <meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Categories"/> <meta name="twitter:title" content="Categories"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/> <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -100,18 +99,15 @@
</p> </p>
</header> </header>
<h2 id="2016-09-01">2016-09-01</h2> <h2 id="20160901">2016-09-01</h2>
<ul> <ul>
<li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li> <li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li>
<li>Discuss how the migration of CGIAR&rsquo;s Active Directory to a flat structure will break our LDAP groups in DSpace</li> <li>Discuss how the migration of CGIAR's Active Directory to a flat structure will break our LDAP groups in DSpace</li>
<li>We had been using <code>DC=ILRI</code> to determine whether a user was ILRI or not</li> <li>We had been using <code>DC=ILRI</code> to determine whether a user was ILRI or not</li>
<li>It looks like we might be able to use OUs now, instead of DCs:</li>
<li><p>It looks like we might be able to use OUs now, instead of DCs:</p>
<pre><code>$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b &quot;dc=cgiarad,dc=org&quot; -D &quot;admigration1@cgiarad.org&quot; -W &quot;(sAMAccountName=admigration1)&quot;
</code></pre></li>
</ul> </ul>
<pre><code>$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b &quot;dc=cgiarad,dc=org&quot; -D &quot;admigration1@cgiarad.org&quot; -W &quot;(sAMAccountName=admigration1)&quot;
</code></pre>
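The switch from DCs to OUs amounts to reading a different part of the user's distinguished name. The sketch below shows the idea in Python under stated assumptions: the DN is a made-up example (the notes do not show real `ldapsearch` output), the rule "any OU containing ILRI" is a guess at the matching logic, and the naive comma split ignores escaped commas inside values.

```python
def dn_components(dn, attribute):
    """Return the values of every RDN in the DN with the given attribute type."""
    values = []
    for rdn in dn.split(","):
        key, _, value = rdn.strip().partition("=")
        if key.upper() == attribute.upper():
            values.append(value)
    return values


def is_ilri(dn):
    """Assumed rule: the user is ILRI if any OU in the DN mentions ILRI."""
    return any("ILRI" in ou.upper() for ou in dn_components(dn, "OU"))


# Hypothetical DN for illustration only; the real directory layout may differ.
dn = "CN=Example User,OU=Users,OU=ILRIHUB,DC=CGIARAD,DC=ORG"
print(dn_components(dn, "DC"), is_ilri(dn))
```

With a flat directory the DC values are the same for every user, so only the OU components carry the centre affiliation in this example.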
<a href='https://alanorth.github.io/cgspace-notes/2016-09/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-09/'>Read more →</a>
</article> </article>
@ -129,22 +125,19 @@
</p> </p>
</header> </header>
<h2 id="2016-08-01">2016-08-01</h2> <h2 id="20160801">2016-08-01</h2>
<ul> <ul>
<li>Add updated distribution license from Sisay (<a href="https://github.com/ilri/DSpace/issues/259">#259</a>)</li> <li>Add updated distribution license from Sisay (<a href="https://github.com/ilri/DSpace/issues/259">#259</a>)</li>
<li>Play with upgrading Mirage 2 dependencies in <code>bower.json</code> because most are several versions out of date</li> <li>Play with upgrading Mirage 2 dependencies in <code>bower.json</code> because most are several versions out of date</li>
<li>Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more</li> <li>Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more</li>
<li>bower stuff is a dead end, waste of time, too many issues</li> <li>bower stuff is a dead end, waste of time, too many issues</li>
<li>Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of <code>fonts</code>)</li> <li>Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of <code>fonts</code>)</li>
<li>Start working on DSpace 5.1 → 5.5 port:</li>
<li><p>Start working on DSpace 5.1 → 5.5 port:</p> </ul>
<pre><code>$ git checkout -b 55new 5_x-prod <pre><code>$ git checkout -b 55new 5_x-prod
$ git reset --hard ilri/5_x-prod $ git reset --hard ilri/5_x-prod
$ git rebase -i dspace-5.5 $ git rebase -i dspace-5.5
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-08/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-08/'>Read more →</a>
</article> </article>
@ -162,22 +155,19 @@ $ git rebase -i dspace-5.5
</p> </p>
</header> </header>
<h2 id="2016-07-01">2016-07-01</h2> <h2 id="20160701">2016-07-01</h2>
<ul> <ul>
<li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li> <li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li>
<li>I think this query should find and replace all authors that have &ldquo;,&rdquo; at the end of their names:</li>
<li><p>I think this query should find and replace all authors that have &ldquo;,&rdquo; at the end of their names:</p> </ul>
<pre><code>dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$'; <pre><code>dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
UPDATE 95 UPDATE 95
dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$'; dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
text_value text_value
------------ ------------
(0 rows) (0 rows)
</code></pre></li> </code></pre><ul>
<li>In this case the select query was showing 95 results before the update</li>
<li><p>In this case the select query was showing 95 results before the update</p></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-07/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-07/'>Read more →</a>
</article> </article>
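Before running an `update` like the one above, the same anchored pattern can be tried on a few sample strings. The following is a minimal sketch using `sed -E` rather than PostgreSQL; the helper name `strip_trailing_comma` and the sample author names are hypothetical. POSIX extended regular expressions have no non-greedy `.+?`, but because the pattern is anchored at both ends the greedy form should give the same result.

```shell
# Hypothetical helper: remove a single trailing comma from each input line.
# Mirrors regexp_replace(text_value, '(^.+?),$', '\1') from the notes.
strip_trailing_comma() {
  sed -E 's/^(.+),$/\1/'
}

# A name with a stray trailing comma is cleaned up...
cleaned=$(printf '%s\n' 'Orth, A.,' | strip_trailing_comma)

# ...while a name without one passes through unchanged.
unchanged=$(printf '%s\n' 'Orth, A.' | strip_trailing_comma)

echo "cleaned: $cleaned"
echo "unchanged: $unchanged"
```

Only the final comma is removed, so commas inside a name such as "Last, F." are preserved.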
@ -196,11 +186,10 @@ text_value
</p> </p>
</header> </header>
<h2 id="2016-06-01">2016-06-01</h2> <h2 id="20160601">2016-06-01</h2>
<ul> <ul>
<li>Experimenting with IFPRI OAI (we want to harvest their publications)</li> <li>Experimenting with IFPRI OAI (we want to harvest their publications)</li>
<li>After reading the <a href="https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html">ContentDM documentation</a> I found IFPRI&rsquo;s OAI endpoint: <a href="http://ebrary.ifpri.org/oai/oai.php">http://ebrary.ifpri.org/oai/oai.php</a></li> <li>After reading the <a href="https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html">ContentDM documentation</a> I found IFPRI's OAI endpoint: <a href="http://ebrary.ifpri.org/oai/oai.php">http://ebrary.ifpri.org/oai/oai.php</a></li>
<li>After reading the <a href="https://www.openarchives.org/OAI/openarchivesprotocol.html">OAI documentation</a> and testing with an <a href="http://validator.oaipmh.com/">OAI validator</a> I found out how to get their publications</li> <li>After reading the <a href="https://www.openarchives.org/OAI/openarchivesprotocol.html">OAI documentation</a> and testing with an <a href="http://validator.oaipmh.com/">OAI validator</a> I found out how to get their publications</li>
<li>This is their publications set: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc">http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc</a></li> <li>This is their publications set: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc">http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc</a></li>
<li>You can see the others by using the OAI <code>ListSets</code> verb: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListSets">http://ebrary.ifpri.org/oai/oai.php?verb=ListSets</a></li> <li>You can see the others by using the OAI <code>ListSets</code> verb: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListSets">http://ebrary.ifpri.org/oai/oai.php?verb=ListSets</a></li>
@ -223,18 +212,15 @@ text_value
</p> </p>
</header> </header>
<h2 id="2016-05-01">2016-05-01</h2> <h2 id="20160501">2016-05-01</h2>
<ul> <ul>
<li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li> <li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li>
<li>I have blocked access to the API now</li> <li>I have blocked access to the API now</li>
<li>There are 3,000 IPs accessing the REST API in a 24-hour period!</li>
<li><p>There are 3,000 IPs accessing the REST API in a 24-hour period!</p> </ul>
<pre><code># awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l <pre><code># awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l
3168 3168
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-05/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-05/'>Read more →</a>
</article> </article>
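One caveat on the count above: `uniq` only collapses adjacent duplicate lines, so on an access log that is not sorted by client address the figure may overstate the number of distinct IPs. The following is a small sketch of the difference, assuming the first whitespace-separated field is the client IP as in the notes; the sample log lines and temporary file are hypothetical.

```shell
# Hypothetical sample log; the real file in the notes is /var/log/nginx/rest.log
log=$(mktemp)
printf '%s\n' \
  '10.0.0.1 - - "GET /rest/items"' \
  '10.0.0.2 - - "GET /rest/items"' \
  '10.0.0.1 - - "GET /rest/collections"' > "$log"

# uniq alone only merges adjacent duplicates, so the repeated IP is counted twice
adjacent_count=$(awk '{print $1}' "$log" | uniq | wc -l | tr -d ' ')

# sort -u de-duplicates across the whole file before counting
unique_count=$(awk '{print $1}' "$log" | sort -u | wc -l | tr -d ' ')

echo "uniq only: $adjacent_count, sort -u: $unique_count"
rm -f "$log"
```

On a real log, `awk '{print $1}' /var/log/nginx/rest.log | sort -u | wc -l` would be the equivalent of the second pipeline.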
@ -252,13 +238,12 @@ text_value
</p> </p>
</header> </header>
<h2 id="2016-04-04">2016-04-04</h2> <h2 id="20160404">2016-04-04</h2>
<ul> <ul>
<li>Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit</li> <li>Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit</li>
<li>We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc</li> <li>We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc</li>
<li>After running DSpace for over five years I&rsquo;ve never needed to look in any other log file than dspace.log, let alone one from last year!</li> <li>After running DSpace for over five years I've never needed to look in any other log file than dspace.log, let alone one from last year!</li>
<li>This will save us a few gigs of backup space we&rsquo;re paying for on S3</li> <li>This will save us a few gigs of backup space we're paying for on S3</li>
<li>Also, I noticed the <code>checker</code> log has some errors we should pay attention to:</li> <li>Also, I noticed the <code>checker</code> log has some errors we should pay attention to:</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-04/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-04/'>Read more →</a>
@ -278,11 +263,10 @@ text_value
</p> </p>
</header> </header>
<h2 id="2016-03-02">2016-03-02</h2> <h2 id="20160302">2016-03-02</h2>
<ul> <ul>
<li>Looking at issues with author authorities on CGSpace</li> <li>Looking at issues with author authorities on CGSpace</li>
<li>For some reason we still have the <code>index-lucene-update</code> cron job active on CGSpace, but I&rsquo;m pretty sure we don&rsquo;t need it as of the latest few versions of Atmire&rsquo;s Listings and Reports module</li> <li>For some reason we still have the <code>index-lucene-update</code> cron job active on CGSpace, but I'm pretty sure we don't need it as of the latest few versions of Atmire's Listings and Reports module</li>
<li>Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server</li> <li>Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-03/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-03/'>Read more →</a>
@ -302,16 +286,13 @@ text_value
</p> </p>
</header> </header>
<h2 id="2016-02-05">2016-02-05</h2> <h2 id="20160205">2016-02-05</h2>
<ul> <ul>
<li>Looking at some DAGRIS data for Abenet Yabowork</li> <li>Looking at some DAGRIS data for Abenet Yabowork</li>
<li>Lots of issues with spaces, newlines, etc causing the import to fail</li> <li>Lots of issues with spaces, newlines, etc causing the import to fail</li>
<li>I noticed we have a very <em>interesting</em> list of countries on CGSpace:</li> <li>I noticed we have a very <em>interesting</em> list of countries on CGSpace:</li>
</ul> </ul>
<p><img src="/cgspace-notes/2016/02/cgspace-countries.png" alt="CGSpace country list"></p>
<p><img src="/cgspace-notes/2016/02/cgspace-countries.png" alt="CGSpace country list" /></p>
<ul> <ul>
<li>Not only are there 49,000 countries, we have some blanks (25)&hellip;</li> <li>Not only are there 49,000 countries, we have some blanks (25)&hellip;</li>
<li>Also, lots of things like &ldquo;COTE D`LVOIRE&rdquo; and &ldquo;COTE D IVOIRE&rdquo;</li> <li>Also, lots of things like &ldquo;COTE D`LVOIRE&rdquo; and &ldquo;COTE D IVOIRE&rdquo;</li>
@ -333,8 +314,7 @@ text_value
</p> </p>
</header> </header>
<h2 id="2016-01-13">2016-01-13</h2> <h2 id="20160113">2016-01-13</h2>
<ul> <ul>
<li>Move ILRI collection <code>10568/12503</code> from <code>10568/27869</code> to <code>10568/27629</code> using the <a href="https://gist.github.com/alanorth/392c4660e8b022d99dfa">move_collections.sh</a> script I wrote last year.</li> <li>Move ILRI collection <code>10568/12503</code> from <code>10568/27869</code> to <code>10568/27629</code> using the <a href="https://gist.github.com/alanorth/392c4660e8b022d99dfa">move_collections.sh</a> script I wrote last year.</li>
<li>I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.</li> <li>I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.</li>
@ -357,18 +337,16 @@ text_value
</p> </p>
</header> </header>
<h2 id="2015-12-02">2015-12-02</h2> <h2 id="20151202">2015-12-02</h2>
<ul> <ul>
<li><p>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</p> <li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li>
</ul>
<pre><code># cd /home/dspacetest.cgiar.org/log <pre><code># cd /home/dspacetest.cgiar.org/log
# ls -lh dspace.log.2015-11-18* # ls -lh dspace.log.2015-11-18*
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18 -rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo -rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz -rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2015-12/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2015-12/'>Read more →</a>
</article> </article>

View File

@ -9,13 +9,12 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
<meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" /> <meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Categories"/> <meta name="twitter:title" content="Categories"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/> <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -100,18 +99,15 @@
</p> </p>
</header> </header>
<h2 id="2015-11-22">2015-11-22</h2> <h2 id="20151122">2015-11-22</h2>
<ul> <ul>
<li>CGSpace went down</li> <li>CGSpace went down</li>
<li>Looks like DSpace exhausted its PostgreSQL connection pool</li> <li>Looks like DSpace exhausted its PostgreSQL connection pool</li>
<li>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</li>
<li><p>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</p> </ul>
<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace <pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
78 78
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2015-11/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2015-11/'>Read more →</a>
</article> </article>

View File

@ -15,7 +15,7 @@
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGIAR Library Migration"/> <meta name="twitter:title" content="CGIAR Library Migration"/>
<meta name="twitter:description" content="Notes on the migration of the CGIAR Library to CGSpace"/> <meta name="twitter:description" content="Notes on the migration of the CGIAR Library to CGSpace"/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -25,7 +25,7 @@
"@type": "BlogPosting", "@type": "BlogPosting",
"headline": "CGIAR Library Migration", "headline": "CGIAR Library Migration",
"url": "https:\/\/alanorth.github.io\/cgspace-notes\/cgiar-library-migration\/", "url": "https:\/\/alanorth.github.io\/cgspace-notes\/cgiar-library-migration\/",
"wordCount": "1285", "wordCount": "1278",
"datePublished": "2017-09-18T16:38:35+03:00", "datePublished": "2017-09-18T16:38:35+03:00",
"dateModified": "2019-10-28T13:40:20+02:00", "dateModified": "2019-10-28T13:40:20+02:00",
"author": { "author": {
@ -100,47 +100,38 @@
</p> </p>
</header> </header>
<p>Rough notes for importing the CGIAR Library content. It was decided that this content would go to a new top-level community called <em>CGIAR System Organization</em>.</p> <p>Rough notes for importing the CGIAR Library content. It was decided that this content would go to a new top-level community called <em>CGIAR System Organization</em>.</p>
<h2 id="premigration-technical-todos">Pre-migration Technical TODOs</h2>
<h2 id="pre-migration-technical-todos">Pre-migration Technical TODOs</h2>
<p>Things that need to happen before the migration:</p> <p>Things that need to happen before the migration:</p>
<ul>
<ul class="task-list"> <li><input checked="" disabled="" type="checkbox">Create top-level community on CGSpace to hold the CGIAR Library content: <code>10568/83389</code>
<li><label><input type="checkbox" checked disabled class="task-list-item"> Create top-level community on CGSpace to hold the CGIAR Library content: <code>10568/83389</code> <ul>
<li><input checked="" disabled="" type="checkbox">Update nginx redirects in ansible templates</li>
<ul class="task-list"> <li><input checked="" disabled="" type="checkbox">Update handle in DSpace XMLUI config</li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> Update nginx redirects in ansible templates</label></li> </ul>
<li><label><input type="checkbox" checked disabled class="task-list-item"> Update handle in DSpace XMLUI config</label></li> </li>
</ul></label></li>
<li>Set up nginx redirects for URLs like: <li>Set up nginx redirects for URLs like:
<ul>
<ul class="task-list"> <li><input checked="" disabled="" type="checkbox"><a href="https://library.cgiar.org/bitstream/handle/10947/2699/CGIAR_Branding_Guidelines_and_Toolkit.pdf">https://library.cgiar.org/bitstream/handle/10947/2699/CGIAR_Branding_Guidelines_and_Toolkit.pdf</a></li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> <a href="https://library.cgiar.org/bitstream/handle/10947/2699/CGIAR_Branding_Guidelines_and_Toolkit.pdf">https://library.cgiar.org/bitstream/handle/10947/2699/CGIAR_Branding_Guidelines_and_Toolkit.pdf</a></label></li> <li><input checked="" disabled="" type="checkbox"><a href="https://library.cgiar.org/handle/10947/4258">https://library.cgiar.org/handle/10947/4258</a></li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> <a href="https://library.cgiar.org/handle/10947/4258">https://library.cgiar.org/handle/10947/4258</a></label></li> </ul>
</ul></li> </li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> Merge <a href="https://github.com/ilri/DSpace/pull/339">#339</a> to <code>5_x-prod</code> branch and rebuild DSpace</label></li> <li><input checked="" disabled="" type="checkbox">Merge <a href="https://github.com/ilri/DSpace/pull/339">#339</a> to <code>5_x-prod</code> branch and rebuild DSpace</li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> Increase <code>max_connections</code> in <code>/etc/postgresql/9.5/main/postgresql.conf</code> by ~10 <li><input checked="" disabled="" type="checkbox">Increase <code>max_connections</code> in <code>/etc/postgresql/9.5/main/postgresql.conf</code> by ~10
<ul> <ul>
<li><code>SELECT * FROM pg_stat_activity;</code> seems to show ~6 extra connections used by the command line tools during import</li> <li><code>SELECT * FROM pg_stat_activity;</code> seems to show ~6 extra connections used by the command line tools during import</li>
</ul></label></li> </ul>
<li><label><input type="checkbox" checked disabled class="task-list-item"> Temporarily disable nightly <code>index-discovery</code> cron job because the import process will be taking place during some of this time and I don&rsquo;t want them to be competing to update the Solr index</label></li> </li>
<li><input checked="" disabled="" type="checkbox">Temporarily disable nightly <code>index-discovery</code> cron job because the import process will be taking place during some of this time and I don't want them to be competing to update the Solr index</li>
<li><p>[x] Copy HTTPS certificate key pair from CGIAR Library server&rsquo;s Tomcat keystore:</p> <li><input checked="" disabled="" type="checkbox">Copy HTTPS certificate key pair from CGIAR Library server's Tomcat keystore:</li>
</ul>
<pre><code>$ keytool -list -keystore tomcat.keystore <pre><code>$ keytool -list -keystore tomcat.keystore
$ keytool -importkeystore -srckeystore tomcat.keystore -destkeystore library.cgiar.org.p12 -deststoretype PKCS12 -srcalias tomcat $ keytool -importkeystore -srckeystore tomcat.keystore -destkeystore library.cgiar.org.p12 -deststoretype PKCS12 -srcalias tomcat
$ openssl pkcs12 -in library.cgiar.org.p12 -nokeys -out library.cgiar.org.crt.pem $ openssl pkcs12 -in library.cgiar.org.p12 -nokeys -out library.cgiar.org.crt.pem
$ openssl pkcs12 -in library.cgiar.org.p12 -nodes -nocerts -out library.cgiar.org.key.pem $ openssl pkcs12 -in library.cgiar.org.p12 -nodes -nocerts -out library.cgiar.org.key.pem
$ wget https://certs.godaddy.com/repository/gdroot-g2.crt https://certs.godaddy.com/repository/gdig2.crt.pem $ wget https://certs.godaddy.com/repository/gdroot-g2.crt https://certs.godaddy.com/repository/gdig2.crt.pem
$ cat library.cgiar.org.crt.pem gdig2.crt.pem &gt; library.cgiar.org-chained.pem $ cat library.cgiar.org.crt.pem gdig2.crt.pem &gt; library.cgiar.org-chained.pem
</code></pre></li> </code></pre><h2 id="migration-process">Migration Process</h2>
</ul>
<h2 id="migration-process">Migration Process</h2>
<p><strong>Export all top-level communities and collections from DSpace Test:</strong></p> <p><strong>Export all top-level communities and collections from DSpace Test:</strong></p>
<pre><code>$ export PATH=$PATH:/home/dspacetest.cgiar.org/bin <pre><code>$ export PATH=$PATH:/home/dspacetest.cgiar.org/bin
$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/2515 10947-2515/10947-2515.zip $ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/2515 10947-2515/10947-2515.zip
$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/2516 10947-2516/10947-2516.zip $ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/2516 10947-2516/10947-2516.zip
@ -154,21 +145,16 @@ $ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/2527 10947-2527/10947
$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10568/93759 10568-93759/10568-93759.zip $ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10568/93759 10568-93759/10568-93759.zip
$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10568/93760 10568-93760/10568-93760.zip $ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10568/93760 10568-93760/10568-93760.zip
$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/1 10947-1/10947-1.zip $ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/1 10947-1/10947-1.zip
</code></pre> </code></pre><p><strong>Import to CGSpace (also see <a href="http://alanorth.github.io/cgspace-notes/2017-05/#2017-05-10">notes from 2017-05-10</a>):</strong></p>
<ul>
<p><strong>Import to CGSpace (also see <a href="http://alanorth.github.io/cgspace-notes/2017-05/#2017-05-10">notes from 2017-05-10</a>):</strong></p> <li><input checked="" disabled="" type="checkbox">Copy all exports from DSpace Test</li>
<li><input checked="" disabled="" type="checkbox">Add ingestion overrides to <code>dspace.cfg</code> before import:</li>
<ul class="task-list"> </ul>
<li><label><input type="checkbox" checked disabled class="task-list-item"> Copy all exports from DSpace Test</label></li>
<li><p>[x] Add ingestion overrides to <code>dspace.cfg</code> before import:</p>
<pre><code>mets.dspaceAIP.ingest.crosswalk.METSRIGHTS = NIL <pre><code>mets.dspaceAIP.ingest.crosswalk.METSRIGHTS = NIL
mets.dspaceAIP.ingest.crosswalk.DSPACE-ROLES = NIL mets.dspaceAIP.ingest.crosswalk.DSPACE-ROLES = NIL
</code></pre></li> </code></pre><ul>
<li><input checked="" disabled="" type="checkbox">Import communities and collections, paying attention to options to skip missing parents and ignore handles:</li>
<li><p>[x] Import communities and collections, paying attention to options to skip missing parents and ignore handles:</p> </ul>
<pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx3072m -XX:-UseGCOverheadLimit -XX:+TieredCompilation -XX:TieredStopAtLevel=1&quot; <pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx3072m -XX:-UseGCOverheadLimit -XX:+TieredCompilation -XX:TieredStopAtLevel=1&quot;
$ export PATH=$PATH:/home/cgspace.cgiar.org/bin $ export PATH=$PATH:/home/cgspace.cgiar.org/bin
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2515/10947-2515.zip $ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2515/10947-2515.zip
@ -185,65 +171,45 @@ $ for item in 10947-2527/ITEM@10947-*; do dspace packager -r -f -u -t AIP -e aor
$ dspace packager -s -t AIP -o ignoreHandle=false -e aorth@mjanja.ch -p 10568/83389 10947-1/10947-1.zip $ dspace packager -s -t AIP -o ignoreHandle=false -e aorth@mjanja.ch -p 10568/83389 10947-1/10947-1.zip
$ for collection in 10947-1/COLLECTION@10947-*; do dspace packager -s -o ignoreHandle=false -t AIP -e aorth@mjanja.ch -p 10947/1 $collection; done $ for collection in 10947-1/COLLECTION@10947-*; do dspace packager -s -o ignoreHandle=false -t AIP -e aorth@mjanja.ch -p 10947/1 $collection; done
$ for item in 10947-1/ITEM@10947-*; do dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done $ for item in 10947-1/ITEM@10947-*; do dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done
</code></pre></li> </code></pre><p>This submits AIP hierarchies recursively (-r) and suppresses errors when an item's parent collection hasn't been created yet—for example, if the item is mapped. The large historic archive (10947/1) is created in several steps because it requires a lot of memory and often crashes.</p>
</ul>
<p>This submits AIP hierarchies recursively (-r) and suppresses errors when an item&rsquo;s parent collection hasn&rsquo;t been created yet—for example, if the item is mapped. The large historic archive (<sup>10947</sup>&frasl;<sub>1</sub>) is created in several steps because it requires a lot of memory and often crashes.</p>
<p><strong>Create new subcommunities and collections for content we reorganized into new hierarchies from the original:</strong></p> <p><strong>Create new subcommunities and collections for content we reorganized into new hierarchies from the original:</strong></p>
<ul>
<ul class="task-list"> <li><input checked="" disabled="" type="checkbox">Create <em>CGIAR System Management Board</em> sub-community: <code>10568/83536</code>
<li><p>[x] Create <em>CGIAR System Management Board</em> sub-community: <code>10568/83536</code></p> <ul>
<li><input checked="" disabled="" type="checkbox">Content from <em>CGIAR System Management Board documents</em> collection (<code>10947/4561</code>) goes here</li>
<ul class="task-list"> <li>Import collection hierarchy first and then the items:</li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> Content from <em>CGIAR System Management Board documents</em> collection (<code>10947/4561</code>) goes here</label></li> </ul>
</li>
<li><p>Import collection hierarchy first and then the items:</p> </ul>
<pre><code>$ dspace packager -r -t AIP -o ignoreHandle=false -e aorth@mjanja.ch -p 10568/83536 10568-93760/COLLECTION@10947-4651.zip <pre><code>$ dspace packager -r -t AIP -o ignoreHandle=false -e aorth@mjanja.ch -p 10568/83536 10568-93760/COLLECTION@10947-4651.zip
$ for item in 10568-93760/ITEM@10947-465*; do dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done $ for item in 10568-93760/ITEM@10947-465*; do dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done
</code></pre></li> </code></pre><ul>
</ul></li> <li><input checked="" disabled="" type="checkbox">Create <em>CGIAR System Management Office</em> sub-community: <code>10568/83537</code>
<li><p>[x] Create <em>CGIAR System Management Office</em> sub-community: <code>10568/83537</code></p>
<ul class="task-list">
<li><label><input type="checkbox" checked disabled class="task-list-item"> Create <em>CGIAR System Management Office documents</em> collection: <code>10568/83538</code></label></li>
<li><p>Import items to collection individually in replace mode (-r) while explicitly preserving handles and ignoring parents:</p>
<pre><code>$ for item in 10568-93759/ITEM@10947-46*; do dspace packager -r -t AIP -o ignoreHandle=false -o ignoreParent=true -e aorth@mjanja.ch -p 10568/83538 $item; done
</code></pre></li>
</ul></li>
</ul>
<p><strong>Get the handles for the last few items from CGIAR Library that were created since we did the migration to DSpace Test in May:</strong></p>
<pre><code>dspace=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id in (select item_id from metadatavalue where metadata_field_id=11 and date(text_value) &gt; '2017-05-01T00:00:00Z');
</code></pre>
<ul> <ul>
<li><p>Export them from the CGIAR Library:</p> <li><input checked="" disabled="" type="checkbox">Create <em>CGIAR System Management Office documents</em> collection: <code>10568/83538</code></li>
<li>Import items to collection individually in replace mode (-r) while explicitly preserving handles and ignoring parents:</li>
<pre><code># for handle in 10947/4658 10947/4659 10947/4660 10947/4661 10947/4665 10947/4664 10947/4666 10947/4669; do /usr/local/dspace/bin/dspace packager -d -a -t AIP -e m.marus@cgiar.org -i $handle ${handle}.zip; done </ul>
</code></pre></li> </li>
</ul>
<li><p>Import on CGSpace:</p> <pre><code>$ for item in 10568-93759/ITEM@10947-46*; do dspace packager -r -t AIP -o ignoreHandle=false -o ignoreParent=true -e aorth@mjanja.ch -p 10568/83538 $item; done
</code></pre><p><strong>Get the handles for the last few items from CGIAR Library that were created since we did the migration to DSpace Test in May:</strong></p>
<pre><code>$ for item in 10947-latest/*.zip; do dspace packager -r -u -t AIP -e aorth@mjanja.ch $item; done <pre><code>dspace=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id in (select item_id from metadatavalue where metadata_field_id=11 and date(text_value) &gt; '2017-05-01T00:00:00Z');
</code></pre></li> </code></pre><ul>
<li>Export them from the CGIAR Library:</li>
</ul>
<pre><code># for handle in 10947/4658 10947/4659 10947/4660 10947/4661 10947/4665 10947/4664 10947/4666 10947/4669; do /usr/local/dspace/bin/dspace packager -d -a -t AIP -e m.marus@cgiar.org -i $handle ${handle}.zip; done
</code></pre><ul>
<li>Import on CGSpace:</li>
</ul>
<pre><code>$ for item in 10947-latest/*.zip; do dspace packager -r -u -t AIP -e aorth@mjanja.ch $item; done
</code></pre><h2 id="post-migration">Post Migration</h2>
<ul>
<li><input checked="" disabled="" type="checkbox">Shut down Tomcat and run <code>update-sequences.sql</code> as the system's <code>postgres</code> user</li>
<li><input checked="" disabled="" type="checkbox">Remove ingestion overrides from <code>dspace.cfg</code></li>
<li><input checked="" disabled="" type="checkbox">Reset PostgreSQL <code>max_connections</code> to 183</li>
<li><input checked="" disabled="" type="checkbox">Enable nightly <code>index-discovery</code> cron job</li>
<li><input checked="" disabled="" type="checkbox">Adjust CGSpace's <code>handle-server/config.dct</code> to add the new prefix alongside our existing 10568, ie:</li>
</ul> </ul>
<h2 id="post-migration">Post Migration</h2>
<ul class="task-list">
<li><label><input type="checkbox" checked disabled class="task-list-item"> Shut down Tomcat and run <code>update-sequences.sql</code> as the system&rsquo;s <code>postgres</code> user</label></li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> Remove ingestion overrides from <code>dspace.cfg</code></label></li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> Reset PostgreSQL <code>max_connections</code> to 183</label></li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> Enable nightly <code>index-discovery</code> cron job</label></li>
<li><p>[x] Adjust CGSpace&rsquo;s <code>handle-server/config.dct</code> to add the new prefix alongside our existing 10568, ie:</p>
<pre><code>&quot;server_admins&quot; = ( <pre><code>&quot;server_admins&quot; = (
&quot;300:0.NA/10568&quot; &quot;300:0.NA/10568&quot;
&quot;300:0.NA/10947&quot; &quot;300:0.NA/10947&quot;
@ -258,54 +224,33 @@ $ for item in 10568-93760/ITEM@10947-465*; do dspace packager -r -f -u -t AIP -e
&quot;300:0.NA/10568&quot; &quot;300:0.NA/10568&quot;
&quot;300:0.NA/10947&quot; &quot;300:0.NA/10947&quot;
) )
</code></pre></li> </code></pre><p>I had regenerated the <code>sitebndl.zip</code> file on the CGIAR Library server and sent it to the Handle.net admins but they said that there were mismatches between the public and private keys, which I suspect is due to <code>make-handle-config</code> not being very flexible. After discussing our scenario with the Handle.net admins they said we actually don't need to send an updated <code>sitebndl.zip</code> for this type of change, and the above <code>config.dct</code> edits are all that is required. I guess they just did something on their end by setting the authoritative IP address for the 10947 prefix to be the same as ours&hellip;</p>
</ul> <ul>
<li><input checked="" disabled="" type="checkbox">Update DNS records:
<p>I had regenerated the <code>sitebndl.zip</code> file on the CGIAR Library server and sent it to the Handle.net admins, but they said that there were mismatches between the public and private keys, which I suspect is due to <code>make-handle-config</code> not being very flexible. After discussing our scenario with the Handle.net admins they said we actually don&rsquo;t need to send an updated <code>sitebndl.zip</code> for this type of change, and the above <code>config.dct</code> edits are all that is required. I guess they just did something on their end by setting the authoritative IP address for the 10947 prefix to be the same as ours&hellip;</p>
<ul class="task-list">
<li><label><input type="checkbox" checked disabled class="task-list-item"> Update DNS records:
<ul> <ul>
<li>CNAME: cgspace.cgiar.org</li> <li>CNAME: cgspace.cgiar.org</li>
</ul></label></li> </ul>
<li><label><input type="checkbox" checked disabled class="task-list-item"> Re-deploy DSpace from freshly built <code>5_x-prod</code> branch</label></li> </li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> Merge <code>cgiar-library</code> branch to <code>master</code> and re-run ansible nginx templates</label></li> <li><input checked="" disabled="" type="checkbox">Re-deploy DSpace from freshly built <code>5_x-prod</code> branch</li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> Run system updates and reboot server</label></li> <li><input checked="" disabled="" type="checkbox">Merge <code>cgiar-library</code> branch to <code>master</code> and re-run ansible nginx templates</li>
<li><input checked="" disabled="" type="checkbox">Run system updates and reboot server</li>
<li><p>[x] Switch to Let&rsquo;s Encrypt HTTPS certificates (after DNS is updated and server isn&rsquo;t busy):</p> <li><input checked="" disabled="" type="checkbox">Switch to Let's Encrypt HTTPS certificates (after DNS is updated and server isn't busy):</li>
</ul>
<pre><code>$ sudo systemctl stop nginx <pre><code>$ sudo systemctl stop nginx
$ /opt/certbot-auto certonly --standalone -d library.cgiar.org $ /opt/certbot-auto certonly --standalone -d library.cgiar.org
$ sudo systemctl start nginx $ sudo systemctl start nginx
</code></pre></li> </code></pre><h2 id="troubleshooting">Troubleshooting</h2>
</ul>
<h2 id="troubleshooting">Troubleshooting</h2>
<h3 id="foreign-key-error-in-dspace-cleanup">Foreign Key Error in <code>dspace cleanup</code></h3> <h3 id="foreign-key-error-in-dspace-cleanup">Foreign Key Error in <code>dspace cleanup</code></h3>
<p>The cleanup script is sometimes used during import processes to clean the database and assetstore after failed AIP imports. If you see the following error with <code>dspace cleanup -v</code>:</p> <p>The cleanup script is sometimes used during import processes to clean the database and assetstore after failed AIP imports. If you see the following error with <code>dspace cleanup -v</code>:</p>
<pre><code>Error: ERROR: update or delete on table &quot;bitstream&quot; violates foreign key constraint &quot;bundle_primary_bitstream_id_fkey&quot; on table &quot;bundle&quot; <pre><code>Error: ERROR: update or delete on table &quot;bitstream&quot; violates foreign key constraint &quot;bundle_primary_bitstream_id_fkey&quot; on table &quot;bundle&quot;
Detail: Key (bitstream_id)=(119841) is still referenced from table &quot;bundle&quot;. Detail: Key (bitstream_id)=(119841) is still referenced from table &quot;bundle&quot;.
</code></pre> </code></pre><p>The solution is to set the <code>primary_bitstream_id</code> to NULL in PostgreSQL:</p>
<p>The solution is to set the <code>primary_bitstream_id</code> to NULL in PostgreSQL:</p>
<pre><code>dspace=# update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (119841); <pre><code>dspace=# update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (119841);
</code></pre> </code></pre><h3 id="psqlexception-during-aip-ingest">PSQLException During AIP Ingest</h3>
<h3 id="psqlexception-during-aip-ingest">PSQLException During AIP Ingest</h3>
<p>After a few rounds of ingesting—possibly with failures—you might end up with inconsistent IDs in the database. In this case, during AIP ingest of a single collection in submit mode (-s):</p> <p>After a few rounds of ingesting—possibly with failures—you might end up with inconsistent IDs in the database. In this case, during AIP ingest of a single collection in submit mode (-s):</p>
<pre><code>org.dspace.content.packager.PackageValidationException: Exception while ingesting 10947-2527/10947-2527.zip, Reason: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint &quot;handle_pkey&quot; <pre><code>org.dspace.content.packager.PackageValidationException: Exception while ingesting 10947-2527/10947-2527.zip, Reason: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint &quot;handle_pkey&quot;
Detail: Key (handle_id)=(86227) already exists. Detail: Key (handle_id)=(86227) already exists.
</code></pre> </code></pre><p>The normal solution is to run the <code>update-sequences.sql</code> script (with Tomcat shut down) but it doesn't seem to work in this case. Finding the maximum <code>handle_id</code> and manually updating the sequence seems to work:</p>
<p>The normal solution is to run the <code>update-sequences.sql</code> script (with Tomcat shut down) but it doesn&rsquo;t seem to work in this case. Finding the maximum <code>handle_id</code> and manually updating the sequence seems to work:</p>
<pre><code>dspace=# select * from handle where handle_id=(select max(handle_id) from handle); <pre><code>dspace=# select * from handle where handle_id=(select max(handle_id) from handle);
dspace=# select setval('handle_seq',86873); dspace=# select setval('handle_seq',86873);
</code></pre> </code></pre>
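<p>The two manual steps above could probably be folded into one statement, so the maximum <code>handle_id</code> never has to be copied by hand. This is only a sketch: it assumes a local database named <code>dspace</code> that the current user can reach with <code>psql</code>, and that Tomcat is shut down, as for <code>update-sequences.sql</code>.</p>

```shell
# Assumption: local database "dspace", reachable by the current user via psql.
# Sets handle_seq to the current maximum handle_id so the next ingest
# gets an unused ID instead of colliding on handle_pkey.
psql -d dspace -c "SELECT setval('handle_seq', (SELECT max(handle_id) FROM handle));"
```

<p>Running the <code>select max(handle_id)</code> query first, as in the notes above, is still a reasonable sanity check before changing the sequence.</p>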


@ -15,7 +15,7 @@
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace CG Core v2 Migration"/> <meta name="twitter:title" content="CGSpace CG Core v2 Migration"/>
<meta name="twitter:description" content="Possible changes to CGSpace metadata fields to align more with DC, QDC, and DCTERMS as well as CG Core v2."/> <meta name="twitter:description" content="Possible changes to CGSpace metadata fields to align more with DC, QDC, and DCTERMS as well as CG Core v2."/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -100,71 +100,64 @@
</p> </p>
</header> </header>
<p>Possible changes to CGSpace metadata fields to align more with DC, QDC, and DCTERMS as well as CG Core v2.</p> <p>Possible changes to CGSpace metadata fields to align more with DC, QDC, and DCTERMS as well as CG Core v2.</p>
<p>With reference to <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2 draft standard</a> by Marie-Angélique as well as <a href="http://www.dublincore.org/specifications/dublin-core/dcmi-terms/">DCMI DCTERMS</a>.</p> <p>With reference to <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2 draft standard</a> by Marie-Angélique as well as <a href="http://www.dublincore.org/specifications/dublin-core/dcmi-terms/">DCMI DCTERMS</a>.</p>
<ul> <ul>
<li><a href="#proposed-changes">Proposed Changes</a></li> <li><a href="#proposed-changes">Proposed Changes</a></li>
<li><a href="#fields-to-create">Fields to Create</a></li> <li><a href="#fields-to-create">Fields to Create</a></li>
<li><a href="#fields-to-delete">Fields to Delete</a></li> <li><a href="#fields-to-delete">Fields to Delete</a></li>
<li><a href="#implementation-progress">Implementation Progress</a></li> <li><a href="#implementation-progress">Implementation Progress</a></li>
</ul> </ul>
<h2 id="proposed-changes">Proposed Changes</h2> <h2 id="proposed-changes">Proposed Changes</h2>
<p>As of 2019-11-17 the scope of the changes includes the following fields:</p> <p>As of 2019-11-17 the scope of the changes includes the following fields:</p>
<ul> <ul>
<li>cg.creator.id→cg.creator.identifier <li>cg.creator.id→cg.creator.identifier
<ul> <ul>
<li>ORCID identifiers</li> <li>ORCID identifiers</li>
</ul></li> </ul>
</li>
<li>dc.format.extent→dcterms.extent</li> <li>dc.format.extent→dcterms.extent</li>
<li>dc.date.issued→dcterms.issued</li> <li>dc.date.issued→dcterms.issued</li>
<li>dc.description.abstract→dcterms.abstract</li> <li>dc.description.abstract→dcterms.abstract</li>
<li>dc.description→dcterms.description</li> <li>dc.description→dcterms.description</li>
<li>dc.description.sponsorship→cg.contributor.donor <li>dc.description.sponsorship→cg.contributor.donor
<ul> <ul>
<li>values from CrossRef or Grid.ac if possible</li> <li>values from CrossRef or Grid.ac if possible</li>
</ul></li> </ul>
</li>
<li>dc.description.version→cg.peer-reviewed</li> <li>dc.description.version→cg.peer-reviewed</li>
<li>cg.fulltextstatus→cg.howpublished <li>cg.fulltextstatus→cg.howpublished
<ul> <ul>
<li>CGSpace uses values like &ldquo;Formally Published&rdquo; or &ldquo;Grey Literature&rdquo;</li> <li>CGSpace uses values like &ldquo;Formally Published&rdquo; or &ldquo;Grey Literature&rdquo;</li>
</ul></li> </ul>
</li>
<li>dc.identifier.citation→dcterms.bibliographicCitation</li> <li>dc.identifier.citation→dcterms.bibliographicCitation</li>
<li>cg.identifier.status→dcterms.accessRights <li>cg.identifier.status→dcterms.accessRights
<ul> <ul>
<li>current values are &ldquo;Open Access&rdquo; and &ldquo;Limited Access&rdquo;</li> <li>current values are &ldquo;Open Access&rdquo; and &ldquo;Limited Access&rdquo;</li>
<li>future values are possibly &ldquo;Open&rdquo; and &ldquo;Restricted&rdquo;?</li> <li>future values are possibly &ldquo;Open&rdquo; and &ldquo;Restricted&rdquo;?</li>
</ul></li> </ul>
</li>
<li>dc.language.iso→dcterms.language <li>dc.language.iso→dcterms.language
<ul> <ul>
<li>current values are ISO 639-1 (aka Alpha 2)</li> <li>current values are ISO 639-1 (aka Alpha 2)</li>
<li>future values are possibly ISO 639-3 (aka Alpha 3)?</li> <li>future values are possibly ISO 639-3 (aka Alpha 3)?</li>
</ul></li> </ul>
</li>
<li>cg.link.reference→dcterms.relation</li> <li>cg.link.reference→dcterms.relation</li>
<li>dc.publisher→dcterms.publisher</li> <li>dc.publisher→dcterms.publisher</li>
<li>dc.relation.ispartofseries→dcterms.isPartOf</li> <li>dc.relation.ispartofseries→dcterms.isPartOf</li>
<li>dc.rights→dcterms.license <li>dc.rights→dcterms.license
<ul> <ul>
<li>Using <a href="https://spdx.org/licenses/">SPDX license identifiers</a> if possible</li> <li>Using <a href="https://spdx.org/licenses/">SPDX license identifiers</a> if possible</li>
</ul></li> </ul>
</li>
<li>dc.source→cg.journal</li> <li>dc.source→cg.journal</li>
<li>dc.subject→dcterms.subject</li> <li>dc.subject→dcterms.subject</li>
<li>dc.type→dcterms.type</li> <li>dc.type→dcterms.type</li>
<li>dc.identifier.isbn→cg.isbn</li> <li>dc.identifier.isbn→cg.isbn</li>
<li>dc.identifier.issn→cg.issn</li> <li>dc.identifier.issn→cg.issn</li>
</ul> </ul>
<p>The following fields are currently out of the scope of this migration because they are used internally by DSpace 5.x/6.x and would be difficult to change without significant modifications to the core of the code:</p> <p>The following fields are currently out of the scope of this migration because they are used internally by DSpace 5.x/6.x and would be difficult to change without significant modifications to the core of the code:</p>
<ul> <ul>
<li>dc.title (<code>IncludePageMeta.java</code> only considers DC when building pageMeta, which we rely on in XMLUI because of XSLT from DRI)</li> <li>dc.title (<code>IncludePageMeta.java</code> only considers DC when building pageMeta, which we rely on in XMLUI because of XSLT from DRI)</li>
<li>dc.title.alternative</li> <li>dc.title.alternative</li>
@ -174,36 +167,27 @@
<li>dc.description.provenance</li> <li>dc.description.provenance</li>
<li>dc.contributor.author (<code>IncludePageMeta.java</code> only considers DC when building pageMeta, which we rely on in XMLUI because of XSLT from DRI)</li> <li>dc.contributor.author (<code>IncludePageMeta.java</code> only considers DC when building pageMeta, which we rely on in XMLUI because of XSLT from DRI)</li>
</ul> </ul>
<h2 id="fields-to-create">Fields to Create</h2> <h2 id="fields-to-create">Fields to Create</h2>
<p>Make sure the following fields exist:</p> <p>Make sure the following fields exist:</p>
<ul>
<ul class="task-list"> <li><input checked="" disabled="" type="checkbox">cg.creator.identifier (242)</li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> cg.creator.identifier (242)</label></li> <li><input checked="" disabled="" type="checkbox">cg.contributor.donor (243)</li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> cg.contributor.donor (243)</label></li> <li><input checked="" disabled="" type="checkbox">cg.peer-reviewed (244)</li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> cg.peer-reviewed (244)</label></li> <li><input checked="" disabled="" type="checkbox">cg.howpublished (245)</li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> cg.howpublished (245)</label></li> <li><input checked="" disabled="" type="checkbox">cg.journal (246)</li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> cg.journal (246)</label></li> <li><input checked="" disabled="" type="checkbox">cg.isbn (247)</li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> cg.isbn (247)</label></li> <li><input checked="" disabled="" type="checkbox">cg.issn (248)</li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> cg.issn (248)</label></li>
</ul> </ul>
<h2 id="fields-to-delete">Fields to Delete</h2> <h2 id="fields-to-delete">Fields to Delete</h2>
<p>Fields to delete after migration:</p> <p>Fields to delete after migration:</p>
<ul>
<ul class="task-list"> <li><input disabled="" type="checkbox">cg.creator.id</li>
<li><label><input type="checkbox" disabled class="task-list-item"> cg.creator.id</label></li> <li><input disabled="" type="checkbox">cg.fulltextstatus</li>
<li><label><input type="checkbox" disabled class="task-list-item"> cg.fulltextstatus</label></li> <li><input disabled="" type="checkbox">cg.identifier.status</li>
<li><label><input type="checkbox" disabled class="task-list-item"> cg.identifier.status</label></li> <li><input disabled="" type="checkbox">cg.link.reference</li>
<li><label><input type="checkbox" disabled class="task-list-item"> cg.link.reference</label></li>
</ul> </ul>
<h2 id="implementation-progress">Implementation Progress</h2> <h2 id="implementation-progress">Implementation Progress</h2>
<p>Tally of the status of the implementation of the new fields in the CGSpace <code>5_x-cgcorev2</code> branch.</p> <p>Tally of the status of the implementation of the new fields in the CGSpace <code>5_x-cgcorev2</code> branch.</p>
<table> <table>
<thead> <thead>
<tr> <tr>
@ -217,7 +201,6 @@
<th align="center">Crosswalks</th> <th align="center">Crosswalks</th>
</tr> </tr>
</thead> </thead>
<tbody> <tbody>
<tr> <tr>
<td>cg.creator.identifier</td> <td>cg.creator.identifier</td>
@ -229,7 +212,6 @@
<td align="center"></td> <td align="center"></td>
<td align="center"></td> <td align="center"></td>
</tr> </tr>
<tr> <tr>
<td>dcterms.extent</td> <td>dcterms.extent</td>
<td align="center"></td> <td align="center"></td>
@ -240,7 +222,6 @@
<td align="center">-</td> <td align="center">-</td>
<td align="center"></td> <td align="center"></td>
</tr> </tr>
<tr> <tr>
<td>dcterms.issued</td> <td>dcterms.issued</td>
<td align="center"></td> <td align="center"></td>
@ -251,7 +232,6 @@
<td align="center"></td> <td align="center"></td>
<td align="center"></td> <td align="center"></td>
</tr> </tr>
<tr> <tr>
<td>dcterms.abstract</td> <td>dcterms.abstract</td>
<td align="center"></td> <td align="center"></td>
@ -262,7 +242,6 @@
<td align="center">-</td> <td align="center">-</td>
<td align="center"></td> <td align="center"></td>
</tr> </tr>
<tr> <tr>
<td>dcterms.description</td> <td>dcterms.description</td>
<td align="center"></td> <td align="center"></td>
@ -273,7 +252,6 @@
<td align="center"></td> <td align="center"></td>
<td align="center"></td> <td align="center"></td>
</tr> </tr>
<tr> <tr>
<td>cg.contributor.donor</td> <td>cg.contributor.donor</td>
<td align="center"></td> <td align="center"></td>
@ -284,7 +262,6 @@
<td align="center"></td> <td align="center"></td>
<td align="center"></td> <td align="center"></td>
</tr> </tr>
<tr> <tr>
<td>cg.peer-reviewed</td> <td>cg.peer-reviewed</td>
<td align="center"></td> <td align="center"></td>
@ -295,7 +272,6 @@
<td align="center">-</td> <td align="center">-</td>
<td align="center"></td> <td align="center"></td>
</tr> </tr>
<tr> <tr>
<td>cg.howpublished</td> <td>cg.howpublished</td>
<td align="center"></td> <td align="center"></td>
@ -306,7 +282,6 @@
<td align="center">-</td> <td align="center">-</td>
<td align="center"></td> <td align="center"></td>
</tr> </tr>
<tr> <tr>
<td>dcterms.bibliographicCitation</td> <td>dcterms.bibliographicCitation</td>
<td align="center"></td> <td align="center"></td>
@ -317,7 +292,6 @@
<td align="center"></td> <td align="center"></td>
<td align="center"></td> <td align="center"></td>
</tr> </tr>
<tr> <tr>
<td>dcterms.accessRights</td> <td>dcterms.accessRights</td>
<td align="center"></td> <td align="center"></td>
@ -328,7 +302,6 @@
<td align="center"></td> <td align="center"></td>
<td align="center"></td> <td align="center"></td>
</tr> </tr>
<tr> <tr>
<td>dcterms.language</td> <td>dcterms.language</td>
<td align="center"></td> <td align="center"></td>
@ -339,7 +312,6 @@
<td align="center"></td> <td align="center"></td>
<td align="center"></td> <td align="center"></td>
</tr> </tr>
<tr> <tr>
<td>dcterms.relation</td> <td>dcterms.relation</td>
<td align="center"></td> <td align="center"></td>
@ -350,7 +322,6 @@
<td align="center">-</td> <td align="center">-</td>
<td align="center"></td> <td align="center"></td>
</tr> </tr>
<tr> <tr>
<td>dcterms.publisher</td> <td>dcterms.publisher</td>
<td align="center"></td> <td align="center"></td>
@ -361,7 +332,6 @@
<td align="center"></td> <td align="center"></td>
<td align="center"></td> <td align="center"></td>
</tr> </tr>
<tr> <tr>
<td>dcterms.isPartOf</td> <td>dcterms.isPartOf</td>
<td align="center"></td> <td align="center"></td>
@ -372,7 +342,6 @@
<td align="center"></td> <td align="center"></td>
<td align="center"></td> <td align="center"></td>
</tr> </tr>
<tr> <tr>
<td>dcterms.license</td> <td>dcterms.license</td>
<td align="center"></td> <td align="center"></td>
@ -383,7 +352,6 @@
<td align="center"></td> <td align="center"></td>
<td align="center"></td> <td align="center"></td>
</tr> </tr>
<tr> <tr>
<td>cg.journal</td> <td>cg.journal</td>
<td align="center"></td> <td align="center"></td>
@ -394,7 +362,6 @@
<td align="center"></td> <td align="center"></td>
<td align="center"></td> <td align="center"></td>
</tr> </tr>
<tr> <tr>
<td>dcterms.subject</td> <td>dcterms.subject</td>
<td align="center"></td> <td align="center"></td>
@ -405,7 +372,6 @@
<td align="center"></td> <td align="center"></td>
<td align="center"></td> <td align="center"></td>
</tr> </tr>
<tr> <tr>
<td>dcterms.type</td> <td>dcterms.type</td>
<td align="center"></td> <td align="center"></td>
@ -416,7 +382,6 @@
<td align="center"></td> <td align="center"></td>
<td align="center"></td> <td align="center"></td>
</tr> </tr>
<tr> <tr>
<td>cg.isbn</td> <td>cg.isbn</td>
<td align="center"></td> <td align="center"></td>
@ -427,7 +392,6 @@
<td align="center"></td> <td align="center"></td>
<td align="center"></td> <td align="center"></td>
</tr> </tr>
<tr> <tr>
<td>cg.issn</td> <td>cg.issn</td>
<td align="center"></td> <td align="center"></td>
@ -440,19 +404,14 @@
</tr> </tr>
</tbody> </tbody>
</table> </table>
<p>There are a few things that I need to check once I get a deployment of this code up and running:</p> <p>There are a few things that I need to check once I get a deployment of this code up and running:</p>
<ul> <ul>
<li>Assess the XSL changes to see if things like <code>not(@qualifier)]</code> still make sense after we move fields from DC to DCTERMS, as some fields will no longer have qualifiers</li> <li>Assess the XSL changes to see if things like <code>not(@qualifier)]</code> still make sense after we move fields from DC to DCTERMS, as some fields will no longer have qualifiers</li>
<li>Do I need to edit crosswalks that we are not using, like <a href="https://wiki.duraspace.org/display/DSDOC5x/DSpace+AIP+Format#DSpaceAIPFormat-MODSSchema">MODS</a>?</li> <li>Do I need to edit crosswalks that we are not using, like <a href="https://wiki.duraspace.org/display/DSDOC5x/DSpace+AIP+Format#DSpaceAIPFormat-MODSSchema">MODS</a>?</li>
<li>There is potentially a lot of work in the OAI metadata formats like DIM, METS, and QDC (see <code>dspace/config/crosswalks/oai/*.xsl</code>)</li> <li>There is potentially a lot of work in the OAI metadata formats like DIM, METS, and QDC (see <code>dspace/config/crosswalks/oai/*.xsl</code>)</li>
</ul> </ul>
<hr>
<hr /> <p>¹ Not committed yet because I don't want to have to make minor adjustments in multiple commits. Re-apply the gauntlet of fixes with the sed script:</p>
<p>¹ Not committed yet because I don&rsquo;t want to have to make minor adjustments in multiple commits. Re-apply the gauntlet of fixes with the sed script:</p>
<pre><code>$ find dspace/modules/xmlui-mirage2/src/main/webapp/themes -iname &quot;*.xsl&quot; -exec sed -i -f ./cgcore-xsl-replacements.sed {} \; <pre><code>$ find dspace/modules/xmlui-mirage2/src/main/webapp/themes -iname &quot;*.xsl&quot; -exec sed -i -f ./cgcore-xsl-replacements.sed {} \;
</code></pre> </code></pre>


@ -9,13 +9,12 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" /> <meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/> <meta name="twitter:title" content="CGSpace Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/> <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -100,31 +99,27 @@
</p> </p>
</header> </header>
<h2 id="2019-11-04">2019-11-04</h2> <h2 id="20191104">2019-11-04</h2>
<ul> <ul>
<li><p>Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics</p> <li>Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics
<ul> <ul>
<li><p>I looked in the nginx logs and see 4.6 million in the access logs, and 1.2 million in the API logs:</p> <li>I looked in the nginx logs and see 4.6 million in the access logs, and 1.2 million in the API logs:</li>
</ul>
</li>
</ul>
<pre><code># zcat --force /var/log/nginx/*access.log.*.gz | grep -cE &quot;[0-9]{1,2}/Oct/2019&quot; <pre><code># zcat --force /var/log/nginx/*access.log.*.gz | grep -cE &quot;[0-9]{1,2}/Oct/2019&quot;
4671942 4671942
# zcat --force /var/log/nginx/{rest,oai,statistics}.log.*.gz | grep -cE &quot;[0-9]{1,2}/Oct/2019&quot; # zcat --force /var/log/nginx/{rest,oai,statistics}.log.*.gz | grep -cE &quot;[0-9]{1,2}/Oct/2019&quot;
1277694 1277694
</code></pre></li> </code></pre><ul>
</ul></li> <li>So 4.6 million from XMLUI and another 1.2 million from API requests</li>
<li>Let's see how many of the REST API requests were for bitstreams (because they are counted in Solr stats):</li>
<li><p>So 4.6 million from XMLUI and another 1.2 million from API requests</p></li> </ul>
<li><p>Let&rsquo;s see how many of the REST API requests were for bitstreams (because they are counted in Solr stats):</p>
<pre><code># zcat --force /var/log/nginx/rest.log.*.gz | grep -c -E &quot;[0-9]{1,2}/Oct/2019&quot; <pre><code># zcat --force /var/log/nginx/rest.log.*.gz | grep -c -E &quot;[0-9]{1,2}/Oct/2019&quot;
1183456 1183456
# zcat --force /var/log/nginx/rest.log.*.gz | grep -E &quot;[0-9]{1,2}/Oct/2019&quot; | grep -c -E &quot;/rest/bitstreams&quot; # zcat --force /var/log/nginx/rest.log.*.gz | grep -E &quot;[0-9]{1,2}/Oct/2019&quot; | grep -c -E &quot;/rest/bitstreams&quot;
106781 106781
</code></pre></li> </code></pre>
</ul>
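<p>To put the two REST counts above in proportion, the share of bitstream requests can be worked out with a little arithmetic. A minimal sketch, using the counts from the log output above; the variable names are only illustrative:</p>

```shell
# Counts copied from the grep output above
rest_total=1183456
rest_bitstreams=106781

# awk handles the floating point division that plain shell arithmetic cannot
share=$(awk -v b="$rest_bitstreams" -v t="$rest_total" 'BEGIN { printf "%.1f", b / t * 100 }')
echo "${share}% of REST API requests were for bitstreams"
```

<p>That works out to roughly nine percent of the REST API requests in 2019-10.</p>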
<a href='https://alanorth.github.io/cgspace-notes/2019-11/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-11/'>Read more →</a>
</article> </article>
@ -145,7 +140,6 @@
</p> </p>
</header> </header>
<p>Possible changes to CGSpace metadata fields to align more with DC, QDC, and DCTERMS as well as CG Core v2.</p> <p>Possible changes to CGSpace metadata fields to align more with DC, QDC, and DCTERMS as well as CG Core v2.</p>
<p>With reference to <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2 draft standard</a> by Marie-Angélique as well as <a href="http://www.dublincore.org/specifications/dublin-core/dcmi-terms/">DCMI DCTERMS</a>.</p> <p>With reference to <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2 draft standard</a> by Marie-Angélique as well as <a href="http://www.dublincore.org/specifications/dublin-core/dcmi-terms/">DCMI DCTERMS</a>.</p>
<a href='https://alanorth.github.io/cgspace-notes/cgspace-cgcorev2-migration/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/cgspace-cgcorev2-migration/'>Read more →</a>
</article> </article>
@ -164,8 +158,7 @@
</p> </p>
</header> </header>
2019-10-01 Udana from IWMI asked me for a CSV export of their community on CGSpace 2019-10-01 Udana from IWMI asked me for a CSV export of their community on CGSpace. I exported it, but a quick run through the csv-metadata-quality tool shows that there is some low-hanging fruit we can fix before I send him the data. I will limit the scope to the titles, regions, subregions, and river basins for now to manually fix some non-breaking spaces (U+00A0) there that would otherwise be removed by the csv-metadata-quality script's &ldquo;unnecessary Unicode&rdquo; fix: $ csvcut -c 'id,dc.
I exported it, but a quick run through the csv-metadata-quality tool shows that there is some low-hanging fruit we can fix before I send him the data. I will limit the scope to the titles, regions, subregions, and river basins for now to manually fix some non-breaking spaces (U+00A0) there that would otherwise be removed by the csv-metadata-quality script&rsquo;s &ldquo;unnecessary Unicode&rdquo; fix:
<a href='https://alanorth.github.io/cgspace-notes/2019-10/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-10/'>Read more →</a>
</article> </article>
@ -183,37 +176,34 @@
</p> </p>
</header> </header>
<h2 id="2019-09-01">2019-09-01</h2> <h2 id="20190901">2019-09-01</h2>
<ul> <ul>
<li>Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning</li> <li>Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning</li>
<li>Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:</li>
<li><p>Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:</p>
<pre><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
440 17.58.101.255
441 157.55.39.101
485 207.46.13.43
728 169.60.128.125
730 207.46.13.108
758 157.55.39.9
808 66.160.140.179
814 207.46.13.212
2472 163.172.71.23
6092 3.94.211.189
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
33 2a01:7e00::f03c:91ff:fe16:fcb
57 3.83.192.124
57 3.87.77.25
57 54.82.1.8
822 2a01:9cc0:47:1:1a:4:0:2
1223 45.5.184.72
1633 172.104.229.92
5112 205.186.128.185
7249 2a01:7e00::f03c:91ff:fe18:7396
9124 45.5.186.2
</code></pre></li>
</ul> </ul>
<pre><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
440 17.58.101.255
441 157.55.39.101
485 207.46.13.43
728 169.60.128.125
730 207.46.13.108
758 157.55.39.9
808 66.160.140.179
814 207.46.13.212
2472 163.172.71.23
6092 3.94.211.189
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
33 2a01:7e00::f03c:91ff:fe16:fcb
57 3.83.192.124
57 3.87.77.25
57 54.82.1.8
822 2a01:9cc0:47:1:1a:4:0:2
1223 45.5.184.72
1633 172.104.229.92
5112 205.186.128.185
7249 2a01:7e00::f03c:91ff:fe18:7396
9124 45.5.186.2
</code></pre>
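<p>The counting idiom used in both commands above (<code>awk</code>, <code>sort</code>, <code>uniq -c</code>, <code>sort -n</code>, <code>tail</code>) can be tried without access to the server. A small sketch, assuming the client IP is the first whitespace-separated field of each log line; the <code>top_ips</code> helper and the sample lines are hypothetical and only stand in for the real nginx logs:</p>

```shell
# Count requests per client IP and show the busiest ones (default: top ten).
# Assumes the IP is the first whitespace-separated field of each line.
top_ips() {
    awk '{print $1}' | sort | uniq -c | sort -n | tail -n "${1:-10}"
}

# Hypothetical sample lines standing in for /var/log/nginx/access.log
sample_log='203.0.113.5 - - [01/Sep/2019:01:02:03 +0200] "GET / HTTP/1.1" 200
198.51.100.7 - - [01/Sep/2019:02:03:04 +0200] "GET /rest/items HTTP/1.1" 200
203.0.113.5 - - [01/Sep/2019:03:04:05 +0200] "GET /handle/10568/1 HTTP/1.1" 200
203.0.113.5 - - [02/Sep/2019:00:00:01 +0200] "GET / HTTP/1.1" 200'

# Keep only requests from the early hours of 1 September, then rank the IPs
printf '%s\n' "$sample_log" | grep -E "01/Sep/2019:0" | top_ips 2
```

<p>Because <code>sort -n</code> sorts ascending, the busiest client ends up on the last line, as in the real output above.</p>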
<a href='https://alanorth.github.io/cgspace-notes/2019-09/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-09/'>Read more →</a>
</article> </article>
@ -231,22 +221,19 @@
</p> </p>
</header> </header>
<h2 id="2019-08-03">2019-08-03</h2> <h2 id="20190803">2019-08-03</h2>
<ul> <ul>
<li>Look at Bioversity&rsquo;s latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name&hellip;</li> <li>Look at Bioversity's latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name&hellip;</li>
</ul> </ul>
<h2 id="20190804">2019-08-04</h2>
<h2 id="2019-08-04">2019-08-04</h2>
<ul> <ul>
<li>Deploy ORCID identifier updates requested by Bioversity to CGSpace</li> <li>Deploy ORCID identifier updates requested by Bioversity to CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it <li>Run system updates on CGSpace (linode18) and reboot it
<ul> <ul>
<li>Before updating it I checked Solr and verified that all statistics cores were loaded properly&hellip;</li> <li>Before updating it I checked Solr and verified that all statistics cores were loaded properly&hellip;</li>
<li>After rebooting, all statistics cores were loaded&hellip; wow, that&rsquo;s lucky.</li> <li>After rebooting, all statistics cores were loaded&hellip; wow, that's lucky.</li>
</ul></li> </ul>
</li>
<li>Run system updates on DSpace Test (linode19) and reboot it</li> <li>Run system updates on DSpace Test (linode19) and reboot it</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-08/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-08/'>Read more →</a>
@ -266,16 +253,15 @@
</p> </p>
</header> </header>
<h2 id="2019-07-01">2019-07-01</h2> <h2 id="20190701">2019-07-01</h2>
<ul> <ul>
<li>Create an &ldquo;AfricaRice books and book chapters&rdquo; collection on CGSpace for AfricaRice</li> <li>Create an &ldquo;AfricaRice books and book chapters&rdquo; collection on CGSpace for AfricaRice</li>
<li>Last month Sisay asked why the following &ldquo;most popular&rdquo; statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace: <li>Last month Sisay asked why the following &ldquo;most popular&rdquo; statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace:
<ul> <ul>
<li><a href="https://dspacetest.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&amp;time_filter_end_date=01%2F12%2F2018">DSpace Test</a></li> <li><a href="https://dspacetest.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&amp;time_filter_end_date=01%2F12%2F2018">DSpace Test</a></li>
<li><a href="https://cgspace.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&amp;time_filter_end_date=01%2F12%2F2018">CGSpace</a></li> <li><a href="https://cgspace.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&amp;time_filter_end_date=01%2F12%2F2018">CGSpace</a></li>
</ul></li> </ul>
</li>
<li>Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community</li> <li>Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-07/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-07/'>Read more →</a>
@ -295,15 +281,12 @@
</p> </p>
</header> </header>
<h2 id="2019-06-02">2019-06-02</h2> <h2 id="20190602">2019-06-02</h2>
<ul> <ul>
<li>Merge the <a href="https://github.com/ilri/DSpace/pull/425">Solr filterCache</a> and <a href="https://github.com/ilri/DSpace/pull/426">XMLUI ISI journal</a> changes to the <code>5_x-prod</code> branch and deploy on CGSpace</li> <li>Merge the <a href="https://github.com/ilri/DSpace/pull/425">Solr filterCache</a> and <a href="https://github.com/ilri/DSpace/pull/426">XMLUI ISI journal</a> changes to the <code>5_x-prod</code> branch and deploy on CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it</li> <li>Run system updates on CGSpace (linode18) and reboot it</li>
</ul> </ul>
<h2 id="20190603">2019-06-03</h2>
<h2 id="2019-06-03">2019-06-03</h2>
<ul> <ul>
<li>Skype with Marie-Angélique and Abenet about <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2</a></li> <li>Skype with Marie-Angélique and Abenet about <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2</a></li>
</ul> </ul>
@ -324,24 +307,21 @@
</p> </p>
</header> </header>
<h2 id="2019-05-01">2019-05-01</h2> <h2 id="20190501">2019-05-01</h2>
<ul> <ul>
<li>Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace</li> <li>Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace</li>
<li>A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items <li>A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items
<ul> <ul>
<li>Apparently if the item is in the <code>workflowitem</code> table it is submitted to a workflow</li> <li>Apparently if the item is in the <code>workflowitem</code> table it is submitted to a workflow</li>
<li>And if it is in the <code>workspaceitem</code> table it is in the pre-submitted state</li> <li>And if it is in the <code>workspaceitem</code> table it is in the pre-submitted state</li>
</ul></li> </ul>
</li>
<li><p>The item seems to be in a pre-submitted state, so I tried to delete it from there:</p> <li>The item seems to be in a pre-submitted state, so I tried to delete it from there:</li>
</ul>
<pre><code>dspace=# DELETE FROM workspaceitem WHERE item_id=74648; <pre><code>dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
DELETE 1 DELETE 1
</code></pre></li> </code></pre><ul>
<li>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present&hellip;</li>
<li><p>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present&hellip;</p></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-05/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-05/'>Read more →</a>
</article> </article>
@ -360,35 +340,30 @@ DELETE 1
</p> </p>
</header> </header>
<h2 id="2019-04-01">2019-04-01</h2> <h2 id="20190401">2019-04-01</h2>
<ul> <ul>
<li>Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc <li>Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc
<ul> <ul>
<li>They asked if we had plans to enable RDF support in CGSpace</li> <li>They asked if we had plans to enable RDF support in CGSpace</li>
</ul></li> </ul>
</li>
<li><p>There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today</p> <li>There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today
<ul> <ul>
<li><p>I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!</p> <li>I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!</li>
</ul>
</li>
</ul>
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18.196.196.108|18.195.78.144|18.195.218.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5 <pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18.196.196.108|18.195.78.144|18.195.218.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5
4432 200 4432 200
</code></pre></li> </code></pre><ul>
</ul></li> <li>In the last two weeks there have been 47,000 downloads of this <em>same exact PDF</em> by these three IP addresses</li>
<li>Apply country and region corrections and deletions on DSpace Test and CGSpace:</li>
<li><p>In the last two weeks there have been 47,000 downloads of this <em>same exact PDF</em> by these three IP addresses</p></li> </ul>
<li><p>Apply country and region corrections and deletions on DSpace Test and CGSpace:</p>
<pre><code>$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d <pre><code>$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d $ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-04/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-04/'>Read more →</a>
</article> </article>
@ -406,20 +381,19 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
</p> </p>
</header> </header>
<h2 id="2019-03-01">2019-03-01</h2> <h2 id="20190301">2019-03-01</h2>
<ul> <ul>
<li>I checked IITA&rsquo;s 259 Feb 14 records from last month for duplicates using Atmire&rsquo;s Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good</li> <li>I checked IITA's 259 Feb 14 records from last month for duplicates using Atmire's Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good</li>
<li>I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc&hellip;</li> <li>I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc&hellip;</li>
<li>Looking at the other half of Udana&rsquo;s WLE records from 2018-11 <li>Looking at the other half of Udana's WLE records from 2018-11
<ul> <ul>
<li>I finished the ones for Restoring Degraded Landscapes (RDL), but these are for Variability, Risks and Competing Uses (VRC)</li> <li>I finished the ones for Restoring Degraded Landscapes (RDL), but these are for Variability, Risks and Competing Uses (VRC)</li>
<li>I did the usual cleanups for whitespace, added regions where they made sense for certain countries, cleaned up the DOI link formats, added rights information based on the publications page for a few items</li> <li>I did the usual cleanups for whitespace, added regions where they made sense for certain countries, cleaned up the DOI link formats, added rights information based on the publications page for a few items</li>
<li>Most worryingly, there are encoding errors in the abstracts for eleven items, for example:</li> <li>Most worryingly, there are encoding errors in the abstracts for eleven items, for example:</li>
<li>68.15% <20> 9.45 instead of 68.15% ± 9.45</li> <li>68.15% <20> 9.45 instead of 68.15% ± 9.45</li>
<li>2003<EFBFBD>2013 instead of 20032013</li> <li>2003<EFBFBD>2013 instead of 20032013</li>
</ul></li> </ul>
</li>
<li>I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs</li> <li>I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-03/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-03/'>Read more →</a>
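The encoding errors noted in the 2019-03-01 entry show up as the Unicode replacement character (U+FFFD) where a symbol such as ± or an en dash was lost. A minimal sketch of how such records could be flagged before import, assuming the metadata has been exported to a UTF-8 text or CSV file; `count_replacement_chars` is a hypothetical helper name, not part of DSpace or the scripts used in these notes:

```shell
# Sketch: count the lines in a file that contain the Unicode replacement
# character U+FFFD. count_replacement_chars is a hypothetical helper.
count_replacement_chars() {
    # U+FFFD is encoded in UTF-8 as the bytes EF BF BD (octal 357 277 275)
    pattern=$(printf '\357\277\275')
    # LC_ALL=C makes grep compare raw bytes, -F treats the pattern literally,
    # and -c prints the number of matching lines. grep exits non-zero when
    # nothing matches, so "|| true" keeps the helper usable under "set -e".
    LC_ALL=C grep -c -F "$pattern" "$1" || true
}
```

Calling `count_replacement_chars /path/to/export.csv` prints the number of affected lines. Swapping `-c` for `-n` would list the line numbers instead, which may be more useful when asking someone to re-enter specific abstracts.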

File diff suppressed because it is too large


@ -9,13 +9,12 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" /> <meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/> <meta name="twitter:title" content="CGSpace Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/> <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -100,40 +99,34 @@
</p> </p>
</header> </header>
<h2 id="2019-02-01">2019-02-01</h2> <h2 id="20190201">2019-02-01</h2>
<ul> <ul>
<li>Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!</li> <li>Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!</li>
<li>The top IPs before, during, and after this latest alert tonight were:</li>
<li><p>The top IPs before, during, and after this latest alert tonight were:</p> </ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;01/Feb/2019:(17|18|19|20|21)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 <pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;01/Feb/2019:(17|18|19|20|21)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
245 207.46.13.5 245 207.46.13.5
332 54.70.40.11 332 54.70.40.11
385 5.143.231.38 385 5.143.231.38
405 207.46.13.173 405 207.46.13.173
405 207.46.13.75 405 207.46.13.75
1117 66.249.66.219 1117 66.249.66.219
1121 35.237.175.180 1121 35.237.175.180
1546 5.9.6.51 1546 5.9.6.51
2474 45.5.186.2 2474 45.5.186.2
5490 85.25.237.71 5490 85.25.237.71
</code></pre></li> </code></pre><ul>
<li><code>85.25.237.71</code> is the &ldquo;Linguee Bot&rdquo; that I first saw last month</li>
<li><p><code>85.25.237.71</code> is the &ldquo;Linguee Bot&rdquo; that I first saw last month</p></li> <li>The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase</li>
<li>There were just over 3 million accesses in the nginx logs last month:</li>
<li><p>The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase</p></li> </ul>
<li><p>There were just over 3 million accesses in the nginx logs last month:</p>
<pre><code># time zcat --force /var/log/nginx/* | grep -cE &quot;[0-9]{1,2}/Jan/2019&quot; <pre><code># time zcat --force /var/log/nginx/* | grep -cE &quot;[0-9]{1,2}/Jan/2019&quot;
3018243 3018243
real 0m19.873s real 0m19.873s
user 0m22.203s user 0m22.203s
sys 0m1.979s sys 0m1.979s
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-02/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-02/'>Read more →</a>
</article> </article>
@ -151,26 +144,23 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2019-01-02">2019-01-02</h2> <h2 id="20190102">2019-01-02</h2>
<ul> <ul>
<li>Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning</li> <li>Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning</li>
<li>I don't see anything interesting in the web server logs around that time though:</li>
<li><p>I don&rsquo;t see anything interesting in the web server logs around that time though:</p>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;02/Jan/2019:0(1|2|3)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
92 40.77.167.4
99 210.7.29.100
120 38.126.157.45
177 35.237.175.180
177 40.77.167.32
216 66.249.75.219
225 18.203.76.93
261 46.101.86.248
357 207.46.13.1
903 54.70.40.11
</code></pre></li>
</ul> </ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;02/Jan/2019:0(1|2|3)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
92 40.77.167.4
99 210.7.29.100
120 38.126.157.45
177 35.237.175.180
177 40.77.167.32
216 66.249.75.219
225 18.203.76.93
261 46.101.86.248
357 207.46.13.1
903 54.70.40.11
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2019-01/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-01/'>Read more →</a>
</article> </article>
@ -188,16 +178,13 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2018-12-01">2018-12-01</h2> <h2 id="20181201">2018-12-01</h2>
<ul> <ul>
<li>Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK</li> <li>Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK</li>
<li>I manually installed OpenJDK, then removed Oracle JDK, then re-ran the <a href="http://github.com/ilri/rmg-ansible-public">Ansible playbook</a> to update all configuration files, etc</li> <li>I manually installed OpenJDK, then removed Oracle JDK, then re-ran the <a href="http://github.com/ilri/rmg-ansible-public">Ansible playbook</a> to update all configuration files, etc</li>
<li>Then I ran all system updates and restarted the server</li> <li>Then I ran all system updates and restarted the server</li>
</ul> </ul>
<h2 id="20181202">2018-12-02</h2>
<h2 id="2018-12-02">2018-12-02</h2>
<ul> <ul>
<li>I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another <a href="https://usn.ubuntu.com/3831-1/">Ghostscript vulnerability last week</a></li> <li>I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another <a href="https://usn.ubuntu.com/3831-1/">Ghostscript vulnerability last week</a></li>
</ul> </ul>
@ -218,15 +205,12 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2018-11-01">2018-11-01</h2> <h2 id="20181101">2018-11-01</h2>
<ul> <ul>
<li>Finalize AReS Phase I and Phase II ToRs</li> <li>Finalize AReS Phase I and Phase II ToRs</li>
<li>Send a note about my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to the dspace-tech mailing list</li> <li>Send a note about my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to the dspace-tech mailing list</li>
</ul> </ul>
<h2 id="20181103">2018-11-03</h2>
<h2 id="2018-11-03">2018-11-03</h2>
<ul> <ul>
<li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li> <li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li>
<li>Today these are the top 10 IPs:</li> <li>Today these are the top 10 IPs:</li>
@ -248,11 +232,10 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2018-10-01">2018-10-01</h2> <h2 id="20181001">2018-10-01</h2>
<ul> <ul>
<li>Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items</li> <li>Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items</li>
<li>I created a GitHub issue to track this <a href="https://github.com/ilri/DSpace/issues/389">#389</a>, because I&rsquo;m super busy in Nairobi right now</li> <li>I created a GitHub issue to track this <a href="https://github.com/ilri/DSpace/issues/389">#389</a>, because I'm super busy in Nairobi right now</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-10/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-10/'>Read more →</a>
</article> </article>
@ -271,13 +254,12 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2018-09-02">2018-09-02</h2> <h2 id="20180902">2018-09-02</h2>
<ul> <ul>
<li>New <a href="https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.5">PostgreSQL JDBC driver version 42.2.5</a></li> <li>New <a href="https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.5">PostgreSQL JDBC driver version 42.2.5</a></li>
<li>I&rsquo;ll update the DSpace role in our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a> and run the updated playbooks on CGSpace and DSpace Test</li> <li>I'll update the DSpace role in our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a> and run the updated playbooks on CGSpace and DSpace Test</li>
<li>Also, I&rsquo;ll re-run the <code>postgresql</code> tasks because the custom PostgreSQL variables are dynamic according to the system&rsquo;s RAM, and we never re-ran them after migrating to larger Linodes last month</li> <li>Also, I'll re-run the <code>postgresql</code> tasks because the custom PostgreSQL variables are dynamic according to the system's RAM, and we never re-ran them after migrating to larger Linodes last month</li>
<li>I&rsquo;m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I&rsquo;m getting those autowire errors in Tomcat 8.5.30 again:</li> <li>I'm testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I'm getting those autowire errors in Tomcat 8.5.30 again:</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-09/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-09/'>Read more →</a>
</article> </article>
@ -296,27 +278,20 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2018-08-01">2018-08-01</h2> <h2 id="20180801">2018-08-01</h2>
<ul> <ul>
<li><p>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</p> <li>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</li>
</ul>
<pre><code>[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child <pre><code>[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child
[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB [Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB
[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
</code></pre></li> </code></pre><ul>
<li>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</li>
<li><p>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</p></li> <li>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat's</li>
<li>I'm not sure why Tomcat didn't crash with an OutOfMemoryError&hellip;</li>
<li><p>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat&rsquo;s</p></li> <li>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</li>
<li>The server only has 8GB of RAM so we'll eventually need to upgrade to a larger one because we'll start starving the OS, PostgreSQL, and command line batch processes</li>
<li><p>I&rsquo;m not sure why Tomcat didn&rsquo;t crash with an OutOfMemoryError&hellip;</p></li> <li>I ran all system updates on DSpace Test and rebooted it</li>
<li><p>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</p></li>
<li><p>The server only has 8GB of RAM so we&rsquo;ll eventually need to upgrade to a larger one because we&rsquo;ll start starving the OS, PostgreSQL, and command line batch processes</p></li>
<li><p>I ran all system updates on DSpace Test and rebooted it</p></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-08/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-08/'>Read more →</a>
</article> </article>
@ -335,19 +310,16 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2018-07-01">2018-07-01</h2> <h2 id="20180701">2018-07-01</h2>
<ul> <ul>
<li><p>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</p> <li>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</li>
<pre><code>$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
</code></pre></li>
<li><p>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</p>
<pre><code>There is insufficient memory for the Java Runtime Environment to continue.
</code></pre></li>
</ul> </ul>
<pre><code>$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
</code></pre><ul>
<li>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</li>
</ul>
<pre><code>There is insufficient memory for the Java Runtime Environment to continue.
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2018-07/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-07/'>Read more →</a>
</article> </article>
@ -365,32 +337,27 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2018-06-04">2018-06-04</h2> <h2 id="20180604">2018-06-04</h2>
<ul> <ul>
<li>Test the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">DSpace 5.8 module upgrades from Atmire</a> (<a href="https://github.com/ilri/DSpace/pull/378">#378</a>) <li>Test the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">DSpace 5.8 module upgrades from Atmire</a> (<a href="https://github.com/ilri/DSpace/pull/378">#378</a>)
<ul> <ul>
<li>There seems to be a problem with the CUA and L&amp;R versions in <code>pom.xml</code> because they are using SNAPSHOT and it doesn&rsquo;t build</li> <li>There seems to be a problem with the CUA and L&amp;R versions in <code>pom.xml</code> because they are using SNAPSHOT and it doesn't build</li>
</ul></li> </ul>
</li>
<li>I added the new CCAFS Phase II Project Tag <code>PII-FP1_PACCA2</code> and merged it into the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/379">#379</a>)</li> <li>I added the new CCAFS Phase II Project Tag <code>PII-FP1_PACCA2</code> and merged it into the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/379">#379</a>)</li>
<li>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</li>
<li><p>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</p> </ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3 -n <pre><code>$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3 -n
</code></pre></li> </code></pre><ul>
<li>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="/cgspace-notes/2018-03/">March, 2018</a></li>
<li><p>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="/cgspace-notes/2018-03/">March, 2018</a></p></li> <li>Time to index ~70,000 items on CGSpace:</li>
</ul>
<li><p>Time to index ~70,000 items on CGSpace:</p>
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b <pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
real 74m42.646s real 74m42.646s
user 8m5.056s user 8m5.056s
sys 2m7.289s sys 2m7.289s
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-06/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-06/'>Read more →</a>
</article> </article>
@ -408,15 +375,14 @@ sys 2m7.289s
</p> </p>
</header> </header>
<h2 id="2018-05-01">2018-05-01</h2> <h2 id="20180501">2018-05-01</h2>
<ul> <ul>
<li>I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface: <li>I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface:
<ul> <ul>
<li><a href="http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E">http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E</a></li> <li>http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E</li>
<li><a href="http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E">http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E</a></li> <li>http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E</li>
</ul></li> </ul>
</li>
<li>Then I reduced the JVM heap size from 6144 back to 5120m</li> <li>Then I reduced the JVM heap size from 6144 back to 5120m</li>
<li>Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> to support hosts choosing which distribution they want to use</li> <li>Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> to support hosts choosing which distribution they want to use</li>
</ul> </ul>


@ -9,13 +9,12 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" /> <meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/> <meta name="twitter:title" content="CGSpace Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/> <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -100,10 +99,9 @@
</p> </p>
</header> </header>
<h2 id="2018-04-01">2018-04-01</h2> <h2 id="20180401">2018-04-01</h2>
<ul> <ul>
<li>I tried to test something on DSpace Test but noticed that it&rsquo;s down since god knows when</li> <li>I tried to test something on DSpace Test but noticed that it's down since god knows when</li>
<li>Catalina logs at least show some memory errors yesterday:</li> <li>Catalina logs at least show some memory errors yesterday:</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-04/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-04/'>Read more →</a>
@ -123,8 +121,7 @@
</p> </p>
</header> </header>
<h2 id="2018-03-02">2018-03-02</h2> <h2 id="20180302">2018-03-02</h2>
<ul> <ul>
<li>Export a CSV of the IITA community metadata for Martin Mueller</li> <li>Export a CSV of the IITA community metadata for Martin Mueller</li>
</ul> </ul>
@ -145,13 +142,12 @@
</p> </p>
</header> </header>
<h2 id="2018-02-01">2018-02-01</h2> <h2 id="20180201">2018-02-01</h2>
<ul> <ul>
<li>Peter gave feedback on the <code>dc.rights</code> proof of concept that I had sent him last week</li> <li>Peter gave feedback on the <code>dc.rights</code> proof of concept that I had sent him last week</li>
<li>We don&rsquo;t need to distinguish between internal and external works, so that makes it just a simple list</li> <li>We don't need to distinguish between internal and external works, so that makes it just a simple list</li>
<li>Yesterday I figured out how to monitor DSpace sessions using JMX</li> <li>Yesterday I figured out how to monitor DSpace sessions using JMX</li>
<li>I copied the logic in the <code>jmx_tomcat_dbpools</code> provided by Ubuntu&rsquo;s <code>munin-plugins-java</code> package and used the stuff I discovered about JMX <a href="/cgspace-notes/2018-01/">in 2018-01</a></li> <li>I copied the logic in the <code>jmx_tomcat_dbpools</code> provided by Ubuntu's <code>munin-plugins-java</code> package and used the stuff I discovered about JMX <a href="/cgspace-notes/2018-01/">in 2018-01</a></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-02/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-02/'>Read more →</a>
</article> </article>
@ -170,33 +166,26 @@
</p> </p>
</header> </header>
<h2 id="2018-01-02">2018-01-02</h2> <h2 id="20180102">2018-01-02</h2>
<ul> <ul>
<li>Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time</li> <li>Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time</li>
<li>I didn&rsquo;t get any load alerts from Linode and the REST and XMLUI logs don&rsquo;t show anything out of the ordinary</li> <li>I didn't get any load alerts from Linode and the REST and XMLUI logs don't show anything out of the ordinary</li>
<li>The nginx logs show HTTP 200s until <code>02/Jan/2018:11:27:17 +0000</code> when Uptime Robot got an HTTP 500</li> <li>The nginx logs show HTTP 200s until <code>02/Jan/2018:11:27:17 +0000</code> when Uptime Robot got an HTTP 500</li>
<li>In dspace.log around that time I see many errors like &ldquo;Client closed the connection before file download was complete&rdquo;</li> <li>In dspace.log around that time I see many errors like &ldquo;Client closed the connection before file download was complete&rdquo;</li>
<li>And just before that I see this:</li>
<li><p>And just before that I see this:</p> </ul>
<pre><code>Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000]. <pre><code>Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000].
</code></pre></li> </code></pre><ul>
<li>Ah hah! So the pool was actually empty!</li>
<li><p>Ah hah! So the pool was actually empty!</p></li> <li>I need to increase that, let's try to bump it up from 50 to 75</li>
<li>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don't know what the hell Uptime Robot saw</li>
<li><p>I need to increase that, let&rsquo;s try to bump it up from 50 to 75</p></li> <li>I notice this error quite a few times in dspace.log:</li>
</ul>
<li><p>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don&rsquo;t know what the hell Uptime Robot saw</p></li>
<li><p>I notice this error quite a few times in dspace.log:</p>
<pre><code>2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets <pre><code>2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered &quot; &quot;]&quot; &quot;] &quot;&quot; at line 1, column 32. org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered &quot; &quot;]&quot; &quot;] &quot;&quot; at line 1, column 32.
</code></pre></li> </code></pre><ul>
<li>And there are many of these errors every day for the past month:</li>
<li><p>And there are many of these errors every day for the past month:</p> </ul>
<pre><code>$ grep -c &quot;Error while searching for sidebar facets&quot; dspace.log.* <pre><code>$ grep -c &quot;Error while searching for sidebar facets&quot; dspace.log.*
dspace.log.2017-11-21:4 dspace.log.2017-11-21:4
dspace.log.2017-11-22:1 dspace.log.2017-11-22:1
@ -241,9 +230,8 @@ dspace.log.2017-12-30:89
dspace.log.2017-12-31:53 dspace.log.2017-12-31:53
dspace.log.2018-01-01:45 dspace.log.2018-01-01:45
dspace.log.2018-01-02:34 dspace.log.2018-01-02:34
</code></pre></li> </code></pre><ul>
<li>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let's Encrypt if it's just a handful of domains</li>
<li><p>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let&rsquo;s Encrypt if it&rsquo;s just a handful of domains</p></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-01/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-01/'>Read more →</a>
</article> </article>
@ -262,8 +250,7 @@ dspace.log.2018-01-02:34
</p> </p>
</header> </header>
<h2 id="2017-12-01">2017-12-01</h2> <h2 id="20171201">2017-12-01</h2>
<ul> <ul>
<li>Uptime Robot noticed that CGSpace went down</li> <li>Uptime Robot noticed that CGSpace went down</li>
<li>The logs say &ldquo;Timeout waiting for idle object&rdquo;</li> <li>The logs say &ldquo;Timeout waiting for idle object&rdquo;</li>
@ -287,27 +274,22 @@ dspace.log.2018-01-02:34
</p> </p>
</header> </header>
<h2 id="2017-11-01">2017-11-01</h2> <h2 id="20171101">2017-11-01</h2>
<ul> <ul>
<li>The CORE developers responded to say they are looking into their bot not respecting our robots.txt</li> <li>The CORE developers responded to say they are looking into their bot not respecting our robots.txt</li>
</ul> </ul>
<h2 id="20171102">2017-11-02</h2>
<h2 id="2017-11-02">2017-11-02</h2>
<ul> <ul>
<li><p>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</p> <li>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</li>
</ul>
<pre><code># grep -c &quot;CORE&quot; /var/log/nginx/access.log <pre><code># grep -c &quot;CORE&quot; /var/log/nginx/access.log
0 0
</code></pre></li> </code></pre><ul>
<li>Generate list of authors on CGSpace for Peter to go through and correct:</li>
<li><p>Generate list of authors on CGSpace for Peter to go through and correct:</p> </ul>
<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv; <pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv;
COPY 54701 COPY 54701
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-11/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-11/'>Read more →</a>
</article> </article>
@ -325,17 +307,14 @@ COPY 54701
</p> </p>
</header> </header>
<h2 id="2017-10-01">2017-10-01</h2> <h2 id="20171001">2017-10-01</h2>
<ul> <ul>
<li><p>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</p> <li>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</li>
</ul>
<pre><code>http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336 <pre><code>http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
</code></pre></li> </code></pre><ul>
<li>There appears to be a pattern but I'll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</li>
<li><p>There appears to be a pattern but I&rsquo;ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</p></li> <li>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</li>
<li><p>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</p></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-10/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-10/'>Read more →</a>
</article> </article>
@ -374,16 +353,13 @@ COPY 54701
</p> </p>
</header> </header>
<h2 id="2017-09-06">2017-09-06</h2> <h2 id="20170906">2017-09-06</h2>
<ul> <ul>
<li>Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours</li> <li>Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours</li>
</ul> </ul>
<h2 id="20170907">2017-09-07</h2>
<h2 id="2017-09-07">2017-09-07</h2>
<ul> <ul>
<li>Ask Sisay to clean up the WLE approvers a bit, as Marianne&rsquo;s user account is both in the approvers step as well as the group</li> <li>Ask Sisay to clean up the WLE approvers a bit, as Marianne's user account is both in the approvers step as well as the group</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-09/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-09/'>Read more →</a>
</article> </article>
@ -402,22 +378,21 @@ COPY 54701
</p> </p>
</header> </header>
<h2 id="2017-08-01">2017-08-01</h2> <h2 id="20170801">2017-08-01</h2>
<ul> <ul>
<li>Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours</li> <li>Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours</li>
<li>I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)</li> <li>I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)</li>
<li>The good thing is that, according to <code>dspace.log.2017-08-01</code>, they are all using the same Tomcat session</li> <li>The good thing is that, according to <code>dspace.log.2017-08-01</code>, they are all using the same Tomcat session</li>
<li>This means our Tomcat Crawler Session Valve is working</li> <li>This means our Tomcat Crawler Session Valve is working</li>
<li>But many of the bots are browsing dynamic URLs like: <li>But many of the bots are browsing dynamic URLs like:
<ul> <ul>
<li>/handle/10568/3353/discover</li> <li>/handle/10568/3353/discover</li>
<li>/handle/10568/16510/browse</li> <li>/handle/10568/16510/browse</li>
</ul></li> </ul>
</li>
<li>The <code>robots.txt</code> only blocks the top-level <code>/discover</code> and <code>/browse</code> URLs&hellip; we will need to find a way to forbid them from accessing these!</li> <li>The <code>robots.txt</code> only blocks the top-level <code>/discover</code> and <code>/browse</code> URLs&hellip; we will need to find a way to forbid them from accessing these!</li>
<li>Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li> <li>Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
<li>It turns out that we&rsquo;re already adding the <code>X-Robots-Tag &quot;none&quot;</code> HTTP header, but this only forbids the search engine from <em>indexing</em> the page, not crawling it!</li> <li>It turns out that we're already adding the <code>X-Robots-Tag &quot;none&quot;</code> HTTP header, but this only forbids the search engine from <em>indexing</em> the page, not crawling it!</li>
<li>Also, the bot has to successfully browse the page first so it can receive the HTTP header&hellip;</li> <li>Also, the bot has to successfully browse the page first so it can receive the HTTP header&hellip;</li>
<li>We might actually have to <em>block</em> these requests with HTTP 403 depending on the user agent</li> <li>We might actually have to <em>block</em> these requests with HTTP 403 depending on the user agent</li>
<li>Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415</li> <li>Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415</li>
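The idea of answering bots with HTTP 403 on the nested `/discover` and `/browse` URLs can be sketched as a small decision function. This is only an illustration under stated assumptions: `should_block`, the path pattern, and the list of user agent markers are hypothetical, and the real rule would more likely live in the nginx or Tomcat configuration.

```python
import re

# Dynamic pages nested under a handle, e.g. /handle/10568/3353/discover
# (the handle prefix may contain dots, e.g. 20.500.11766)
DYNAMIC_PATH = re.compile(r"^/handle/[\d.]+/\d+/(?:discover|browse)(?:[/?]|$)")

# Hypothetical crawler markers; adjust to the user agents actually seen in the logs
BOT_MARKERS = ("googlebot", "baiduspider", "bingbot", "slurp")


def should_block(path, user_agent):
    """Return True when a known bot requests a dynamic URL under a handle."""
    if not DYNAMIC_PATH.search(path):
        return False
    agent = user_agent.lower()
    return any(marker in agent for marker in BOT_MARKERS)
```

A request for `/handle/10568/3353/discover` from a Baiduspider user agent would be refused, while the same URL from a regular browser, or a plain item page from a bot, would pass through.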


@ -9,13 +9,12 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" /> <meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/> <meta name="twitter:title" content="CGSpace Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/> <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -100,18 +99,15 @@
</p> </p>
</header> </header>
<h2 id="2017-07-01">2017-07-01</h2> <h2 id="20170701">2017-07-01</h2>
<ul> <ul>
<li>Run system updates and reboot DSpace Test</li> <li>Run system updates and reboot DSpace Test</li>
</ul> </ul>
<h2 id="20170704">2017-07-04</h2>
<h2 id="2017-07-04">2017-07-04</h2>
<ul> <ul>
<li>Merge changes for WLE Phase II theme rename (<a href="https://github.com/ilri/DSpace/pull/329">#329</a>)</li> <li>Merge changes for WLE Phase II theme rename (<a href="https://github.com/ilri/DSpace/pull/329">#329</a>)</li>
<li>Looking at extracting the metadata registries from ICARDA&rsquo;s MEL DSpace database so we can compare fields with CGSpace</li> <li>Looking at extracting the metadata registries from ICARDA's MEL DSpace database so we can compare fields with CGSpace</li>
<li>We can use PostgreSQL&rsquo;s extended output format (<code>-x</code>) plus <code>sed</code> to format the output into quasi XML:</li> <li>We can use PostgreSQL's extended output format (<code>-x</code>) plus <code>sed</code> to format the output into quasi XML:</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-07/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-07/'>Read more →</a>
</article> </article>
@ -130,7 +126,7 @@
</p> </p>
</header> </header>
2017-06-01 After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes The cg.identifier.wletheme field will be used for both Phase I and Phase II Research Themes Then we&rsquo;ll create a new sub-community for Phase II and create collections for the research themes there The current &ldquo;Research Themes&rdquo; community will be renamed to &ldquo;WLE Phase I Research Themes&rdquo; Tagged all items in the current Phase I collections with their appropriate themes Create pull request to add Phase II research themes to the submission form: #328 Add cg. 2017-06-01 After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes The cg.identifier.wletheme field will be used for both Phase I and Phase II Research Themes Then we'll create a new sub-community for Phase II and create collections for the research themes there The current &ldquo;Research Themes&rdquo; community will be renamed to &ldquo;WLE Phase I Research Themes&rdquo; Tagged all items in the current Phase I collections with their appropriate themes Create pull request to add Phase II research themes to the submission form: #328 Add cg.
<a href='https://alanorth.github.io/cgspace-notes/2017-06/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-06/'>Read more →</a>
</article> </article>
@ -148,7 +144,7 @@
</p> </p>
</header> </header>
2017-05-01 ICARDA apparently started working on CG Core on their MEL repository They have done a few cg.* fields, but not very consistent and even copy some of CGSpace items: https://mel.cgiar.org/xmlui/handle/20.500.11766/6911?show=full https://cgspace.cgiar.org/handle/10568/73683 2017-05-02 Atmire got back about the Workflow Statistics issue, and apparently it&rsquo;s a bug in the CUA module so they will send us a pull request 2017-05-04 Sync DSpace Test with database and assetstore from CGSpace Re-deploy DSpace Test with Atmire&rsquo;s CUA patch for workflow statistics, run system updates, and restart the server Now I can see the workflow statistics and am able to select users, but everything returns 0 items Megan says there are still some mapped items that are not appearing since last week, so I forced a full index-discovery -b Need to remember to check if the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSpace Test) tomorrow: https://cgspace. 2017-05-01 ICARDA apparently started working on CG Core on their MEL repository They have done a few cg.* fields, but not very consistent and even copy some of CGSpace items: https://mel.cgiar.org/xmlui/handle/20.500.11766/6911?show=full https://cgspace.cgiar.org/handle/10568/73683 2017-05-02 Atmire got back about the Workflow Statistics issue, and apparently it's a bug in the CUA module so they will send us a pull request 2017-05-04 Sync DSpace Test with database and assetstore from CGSpace Re-deploy DSpace Test with Atmire's CUA patch for workflow statistics, run system updates, and restart the server Now I can see the workflow statistics and am able to select users, but everything returns 0 items Megan says there are still some mapped items that are not appearing since last week, so I forced a full index-discovery -b Need to remember to check if the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSpace Test) tomorrow: https://cgspace.
<a href='https://alanorth.github.io/cgspace-notes/2017-05/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-05/'>Read more →</a>
</article> </article>
@ -166,23 +162,18 @@
</p> </p>
</header> </header>
<h2 id="2017-04-02">2017-04-02</h2> <h2 id="20170402">2017-04-02</h2>
<ul> <ul>
<li>Merge one change to CCAFS flagships that I had forgotten to remove last month (&ldquo;MANAGING CLIMATE RISK&rdquo;): <a href="https://github.com/ilri/DSpace/pull/317">https://github.com/ilri/DSpace/pull/317</a></li> <li>Merge one change to CCAFS flagships that I had forgotten to remove last month (&ldquo;MANAGING CLIMATE RISK&rdquo;): <a href="https://github.com/ilri/DSpace/pull/317">https://github.com/ilri/DSpace/pull/317</a></li>
<li>Quick proof-of-concept hack to add <code>dc.rights</code> to the input form, including some inline instructions/hints:</li> <li>Quick proof-of-concept hack to add <code>dc.rights</code> to the input form, including some inline instructions/hints:</li>
</ul> </ul>
<p><img src="/cgspace-notes/2017/04/dc-rights.png" alt="dc.rights in the submission form"></p>
<p><img src="/cgspace-notes/2017/04/dc-rights.png" alt="dc.rights in the submission form" /></p>
<ul> <ul>
<li>Remove redundant/duplicate text in the DSpace submission license</li> <li>Remove redundant/duplicate text in the DSpace submission license</li>
<li>Testing the CMYK patch on a collection with 650 items:</li>
<li><p>Testing the CMYK patch on a collection with 650 items:</p>
<pre><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p &quot;ImageMagick PDF Thumbnail&quot; -v &gt;&amp; /tmp/filter-media-cmyk.txt
</code></pre></li>
</ul> </ul>
<pre><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p &quot;ImageMagick PDF Thumbnail&quot; -v &gt;&amp; /tmp/filter-media-cmyk.txt
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2017-04/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-04/'>Read more →</a>
</article> </article>
@ -200,14 +191,11 @@
</p> </p>
</header> </header>
<h2 id="2017-03-01">2017-03-01</h2> <h2 id="20170301">2017-03-01</h2>
<ul> <ul>
<li>Run the 279 CIAT author corrections on CGSpace</li> <li>Run the 279 CIAT author corrections on CGSpace</li>
</ul> </ul>
<h2 id="20170302">2017-03-02</h2>
<h2 id="2017-03-02">2017-03-02</h2>
<ul> <ul>
<li>Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace</li> <li>Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace</li>
<li>CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles</li> <li>CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles</li>
@ -217,13 +205,11 @@
<li>Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI</li> <li>Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI</li>
<li>Filed an issue on DSpace issue tracker for the <code>filter-media</code> bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: <a href="https://jira.duraspace.org/browse/DS-3516">DS-3516</a></li> <li>Filed an issue on DSpace issue tracker for the <code>filter-media</code> bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: <a href="https://jira.duraspace.org/browse/DS-3516">DS-3516</a></li>
<li>Discovered that the ImageMagick <code>filter-media</code> plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK</li> <li>Discovered that the ImageMagick <code>filter-media</code> plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK</li>
<li>Interestingly, it seems DSpace 4.x's thumbnails were sRGB, but forcing regeneration using DSpace 5.x's ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999">10568/51999</a>):</li>
<li><p>Interestingly, it seems DSpace 4.x&rsquo;s thumbnails were sRGB, but forcing regeneration using DSpace 5.x&rsquo;s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999"><sup>10568</sup>&frasl;<sub>51999</sub></a>):</p> </ul>
<pre><code>$ identify ~/Desktop/alc_contrastes_desafios.jpg <pre><code>$ identify ~/Desktop/alc_contrastes_desafios.jpg
/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000 /Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-03/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-03/'>Read more →</a>
</article> </article>
@ -241,25 +227,22 @@
</p> </p>
</header> </header>
<h2 id="2017-02-07">2017-02-07</h2> <h2 id="20170207">2017-02-07</h2>
<ul> <ul>
<li><p>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</p> <li>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</li>
</ul>
<pre><code>dspace=# select * from collection2item where item_id = '80278'; <pre><code>dspace=# select * from collection2item where item_id = '80278';
id | collection_id | item_id id | collection_id | item_id
-------+---------------+--------- -------+---------------+---------
92551 | 313 | 80278 92551 | 313 | 80278
92550 | 313 | 80278 92550 | 313 | 80278
90774 | 1051 | 80278 90774 | 1051 | 80278
(3 rows) (3 rows)
dspace=# delete from collection2item where id = 92551 and item_id = 80278; dspace=# delete from collection2item where id = 92551 and item_id = 80278;
DELETE 1 DELETE 1
</code></pre></li> </code></pre><ul>
<li>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</li>
<li><p>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</p></li> <li>Looks like we'll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</li>
<li><p>Looks like we&rsquo;ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</p></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-02/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-02/'>Read more →</a>
</article> </article>
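The manual check for an item mapped twice to the same collection could be automated along these lines. This is a sketch, assuming the rows have already been fetched from `collection2item` as `(id, collection_id, item_id)` tuples; the function name `find_duplicate_mappings` is hypothetical.

```python
from collections import defaultdict


def find_duplicate_mappings(rows):
    """Group mapping ids by (collection_id, item_id) and keep pairs mapped more than once.

    rows is an iterable of (id, collection_id, item_id) tuples.
    """
    seen = defaultdict(list)
    for mapping_id, collection_id, item_id in rows:
        seen[(collection_id, item_id)].append(mapping_id)
    # Only pairs with more than one mapping row are duplicates
    return {pair: sorted(ids) for pair, ids in seen.items() if len(ids) > 1}
```

For the three rows shown for item 80278, this reports collection 313 with mapping ids 92550 and 92551, one of which is the row that was deleted by hand.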
@ -278,12 +261,11 @@ DELETE 1
</p> </p>
</header> </header>
<h2 id="2017-01-02">2017-01-02</h2> <h2 id="20170102">2017-01-02</h2>
<ul> <ul>
<li>I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error</li> <li>I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error</li>
<li>I tested on DSpace Test as well and it doesn&rsquo;t work there either</li> <li>I tested on DSpace Test as well and it doesn't work there either</li>
<li>I asked on the dspace-tech mailing list because it seems to be broken, and actually now I&rsquo;m not sure if we&rsquo;ve ever had the sharding task run successfully over all these years</li> <li>I asked on the dspace-tech mailing list because it seems to be broken, and actually now I'm not sure if we've ever had the sharding task run successfully over all these years</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-01/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-01/'>Read more →</a>
</article> </article>
@ -302,25 +284,20 @@ DELETE 1
</p> </p>
</header> </header>
<h2 id="2016-12-02">2016-12-02</h2> <h2 id="20161202">2016-12-02</h2>
<ul> <ul>
<li>CGSpace was down for five hours in the morning while I was sleeping</li> <li>CGSpace was down for five hours in the morning while I was sleeping</li>
<li>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</li>
<li><p>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</p> </ul>
<pre><code>2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;) <pre><code>2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail=&quot;dc.title&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;) 2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail=&quot;dc.title&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, ObjectType=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail=&quot;THUMBNAIL&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;) 2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, ObjectType=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail=&quot;THUMBNAIL&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, ObjectType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail=&quot;-1&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;) 2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, ObjectType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail=&quot;-1&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;) 2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
</code></pre></li> </code></pre><ul>
<li>I see thousands of them in the logs for the last few months, so it's not related to the DSpace 5.5 upgrade</li>
<li><p>I see thousands of them in the logs for the last few months, so it&rsquo;s not related to the DSpace 5.5 upgrade</p></li> <li>I've raised a ticket with Atmire to ask</li>
<li>Another worrying error from dspace.log is:</li>
<li><p>I&rsquo;ve raised a ticket with Atmire to ask</p></li>
<li><p>Another worrying error from dspace.log is:</p></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-12/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-12/'>Read more →</a>
</article> </article>
@ -339,13 +316,11 @@ DELETE 1
</p> </p>
</header> </header>
<h2 id="2016-11-01">2016-11-01</h2> <h2 id="20161101">2016-11-01</h2>
<ul> <ul>
<li>Add <code>dc.type</code> to the output options for Atmire&rsquo;s Listings and Reports module (<a href="https://github.com/ilri/DSpace/pull/286">#286</a>)</li> <li>Add <code>dc.type</code> to the output options for Atmire's Listings and Reports module (<a href="https://github.com/ilri/DSpace/pull/286">#286</a>)</li>
</ul> </ul>
<p><img src="/cgspace-notes/2016/11/listings-and-reports.png" alt="Listings and Reports with output type"></p>
<p><img src="/cgspace-notes/2016/11/listings-and-reports.png" alt="Listings and Reports with output type" /></p>
<a href='https://alanorth.github.io/cgspace-notes/2016-11/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-11/'>Read more →</a>
</article> </article>
@ -363,22 +338,19 @@ DELETE 1
</p> </p>
</header> </header>
<h2 id="2016-10-03">2016-10-03</h2> <h2 id="20161003">2016-10-03</h2>
<ul> <ul>
<li>Testing adding <a href="https://wiki.duraspace.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing">ORCIDs to a CSV</a> file for a single item to see if the author orders get messed up</li> <li>Testing adding <a href="https://wiki.duraspace.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing">ORCIDs to a CSV</a> file for a single item to see if the author orders get messed up</li>
<li>Need to test the following scenarios to see how author order is affected: <li>Need to test the following scenarios to see how author order is affected:
<ul> <ul>
<li>ORCIDs only</li> <li>ORCIDs only</li>
<li>ORCIDs plus normal authors</li> <li>ORCIDs plus normal authors</li>
</ul></li>
<li><p>I exported a random item&rsquo;s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new column called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</p>
<pre><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
</code></pre></li>
</ul> </ul>
</li>
<li>I exported a random item's metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new column called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</li>
</ul>
<pre><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2016-10/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-10/'>Read more →</a>
</article> </article>


@ -9,13 +9,12 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" /> <meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/> <meta name="twitter:title" content="CGSpace Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/> <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -100,18 +99,15 @@
</p> </p>
</header> </header>
<h2 id="2016-09-01">2016-09-01</h2> <h2 id="20160901">2016-09-01</h2>
<ul> <ul>
<li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li> <li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li>
<li>Discuss how the migration of CGIAR&rsquo;s Active Directory to a flat structure will break our LDAP groups in DSpace</li> <li>Discuss how the migration of CGIAR's Active Directory to a flat structure will break our LDAP groups in DSpace</li>
<li>We had been using <code>DC=ILRI</code> to determine whether a user was ILRI or not</li> <li>We had been using <code>DC=ILRI</code> to determine whether a user was ILRI or not</li>
<li>It looks like we might be able to use OUs now, instead of DCs:</li>
<li><p>It looks like we might be able to use OUs now, instead of DCs:</p>
<pre><code>$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b &quot;dc=cgiarad,dc=org&quot; -D &quot;admigration1@cgiarad.org&quot; -W &quot;(sAMAccountName=admigration1)&quot;
</code></pre></li>
</ul> </ul>
<pre><code>$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b &quot;dc=cgiarad,dc=org&quot; -D &quot;admigration1@cgiarad.org&quot; -W &quot;(sAMAccountName=admigration1)&quot;
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2016-09/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-09/'>Read more →</a>
</article> </article>
@ -129,22 +125,19 @@
</p> </p>
</header> </header>
<h2 id="2016-08-01">2016-08-01</h2> <h2 id="20160801">2016-08-01</h2>
<ul> <ul>
<li>Add updated distribution license from Sisay (<a href="https://github.com/ilri/DSpace/issues/259">#259</a>)</li> <li>Add updated distribution license from Sisay (<a href="https://github.com/ilri/DSpace/issues/259">#259</a>)</li>
<li>Play with upgrading Mirage 2 dependencies in <code>bower.json</code> because most are several versions out of date</li> <li>Play with upgrading Mirage 2 dependencies in <code>bower.json</code> because most are several versions out of date</li>
<li>Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more</li> <li>Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more</li>
<li>bower stuff is a dead end, waste of time, too many issues</li> <li>bower stuff is a dead end, waste of time, too many issues</li>
<li>Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of <code>fonts</code>)</li> <li>Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of <code>fonts</code>)</li>
<li>Start working on DSpace 5.1 → 5.5 port:</li>
<li><p>Start working on DSpace 5.1 → 5.5 port:</p> </ul>
<pre><code>$ git checkout -b 55new 5_x-prod <pre><code>$ git checkout -b 55new 5_x-prod
$ git reset --hard ilri/5_x-prod $ git reset --hard ilri/5_x-prod
$ git rebase -i dspace-5.5 $ git rebase -i dspace-5.5
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-08/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-08/'>Read more →</a>
</article> </article>
@ -162,22 +155,19 @@ $ git rebase -i dspace-5.5
</p> </p>
</header> </header>
<h2 id="2016-07-01">2016-07-01</h2> <h2 id="20160701">2016-07-01</h2>
<ul> <ul>
<li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li> <li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li>
<li>I think this query should find and replace all authors that have &ldquo;,&rdquo; at the end of their names:</li>
<li><p>I think this query should find and replace all authors that have &ldquo;,&rdquo; at the end of their names:</p> </ul>
<pre><code>dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$'; <pre><code>dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
UPDATE 95 UPDATE 95
dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$'; dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
text_value text_value
------------ ------------
(0 rows) (0 rows)
</code></pre></li> </code></pre><ul>
<li>In this case the select query was showing 95 results before the update</li>
<li><p>In this case the select query was showing 95 results before the update</p></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-07/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-07/'>Read more →</a>
</article> </article>
@ -196,11 +186,10 @@ text_value
</p> </p>
</header> </header>
<h2 id="2016-06-01">2016-06-01</h2> <h2 id="20160601">2016-06-01</h2>
<ul> <ul>
<li>Experimenting with IFPRI OAI (we want to harvest their publications)</li> <li>Experimenting with IFPRI OAI (we want to harvest their publications)</li>
<li>After reading the <a href="https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html">ContentDM documentation</a> I found IFPRI&rsquo;s OAI endpoint: <a href="http://ebrary.ifpri.org/oai/oai.php">http://ebrary.ifpri.org/oai/oai.php</a></li> <li>After reading the <a href="https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html">ContentDM documentation</a> I found IFPRI's OAI endpoint: <a href="http://ebrary.ifpri.org/oai/oai.php">http://ebrary.ifpri.org/oai/oai.php</a></li>
<li>After reading the <a href="https://www.openarchives.org/OAI/openarchivesprotocol.html">OAI documentation</a> and testing with an <a href="http://validator.oaipmh.com/">OAI validator</a> I found out how to get their publications</li> <li>After reading the <a href="https://www.openarchives.org/OAI/openarchivesprotocol.html">OAI documentation</a> and testing with an <a href="http://validator.oaipmh.com/">OAI validator</a> I found out how to get their publications</li>
<li>This is their publications set: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc">http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc</a></li> <li>This is their publications set: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc">http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc</a></li>
<li>You can see the others by using the OAI <code>ListSets</code> verb: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListSets">http://ebrary.ifpri.org/oai/oai.php?verb=ListSets</a></li> <li>You can see the others by using the OAI <code>ListSets</code> verb: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListSets">http://ebrary.ifpri.org/oai/oai.php?verb=ListSets</a></li>
@ -223,18 +212,15 @@ text_value
</p> </p>
</header> </header>
<h2 id="2016-05-01">2016-05-01</h2> <h2 id="20160501">2016-05-01</h2>
<ul> <ul>
<li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li> <li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li>
<li>I have blocked access to the API now</li> <li>I have blocked access to the API now</li>
<li>There are 3,000 IPs accessing the REST API in a 24-hour period!</li>
<li><p>There are 3,000 IPs accessing the REST API in a 24-hour period!</p> </ul>
<pre><code># awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l <pre><code># awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l
3168 3168
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-05/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-05/'>Read more →</a>
</article> </article>
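A note on the IP count in the 2016-05 entry above: `uniq` only collapses adjacent duplicate lines, so piping an unsorted log through `uniq | wc -l` counts runs of identical lines rather than distinct values and can overstate the number of IPs. A minimal sketch of the difference, assuming the client IP is the first field of each line; the `count_unique_ips` helper and the sample log lines are made up for illustration and are not from the server:

```shell
# Hypothetical helper: count distinct client IPs in an nginx-style log,
# assuming the IP is the first whitespace-separated field of each line.
count_unique_ips() {
    awk '{print $1}' "$1" | sort -u | wc -l | tr -d ' '
}

# Made-up sample: the same IP appears on two non-adjacent lines.
sample=$(mktemp)
printf '%s\n' \
    '10.0.0.1 - GET /rest/items' \
    '10.0.0.2 - GET /rest/items' \
    '10.0.0.1 - GET /rest/collections' > "$sample"

# sort -u deduplicates across the whole file: 2 distinct IPs
count_unique_ips "$sample"

# uniq without sort only merges adjacent repeats: reports 3
awk '{print $1}' "$sample" | uniq | wc -l | tr -d ' '
```

The figure of 3168 recorded in the notes is what the original command printed; the true number of distinct IPs may be lower.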
@ -252,13 +238,12 @@ text_value
</p> </p>
</header> </header>
<h2 id="2016-04-04">2016-04-04</h2> <h2 id="20160404">2016-04-04</h2>
<ul> <ul>
<li>Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit</li> <li>Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit</li>
<li>We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc</li> <li>We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc</li>
<li>After running DSpace for over five years I&rsquo;ve never needed to look in any other log file than dspace.log, let alone one from last year!</li> <li>After running DSpace for over five years I've never needed to look in any other log file than dspace.log, let alone one from last year!</li>
<li>This will save us a few gigs of backup space we&rsquo;re paying for on S3</li> <li>This will save us a few gigs of backup space we're paying for on S3</li>
<li>Also, I noticed the <code>checker</code> log has some errors we should pay attention to:</li> <li>Also, I noticed the <code>checker</code> log has some errors we should pay attention to:</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-04/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-04/'>Read more →</a>
@ -278,11 +263,10 @@ text_value
</p> </p>
</header> </header>
<h2 id="2016-03-02">2016-03-02</h2> <h2 id="20160302">2016-03-02</h2>
<ul> <ul>
<li>Looking at issues with author authorities on CGSpace</li> <li>Looking at issues with author authorities on CGSpace</li>
<li>For some reason we still have the <code>index-lucene-update</code> cron job active on CGSpace, but I&rsquo;m pretty sure we don&rsquo;t need it as of the latest few versions of Atmire&rsquo;s Listings and Reports module</li> <li>For some reason we still have the <code>index-lucene-update</code> cron job active on CGSpace, but I'm pretty sure we don't need it as of the latest few versions of Atmire's Listings and Reports module</li>
<li>Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server</li> <li>Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-03/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-03/'>Read more →</a>
@ -302,16 +286,13 @@ text_value
</p> </p>
</header> </header>
<h2 id="2016-02-05">2016-02-05</h2> <h2 id="20160205">2016-02-05</h2>
<ul> <ul>
<li>Looking at some DAGRIS data for Abenet Yabowork</li> <li>Looking at some DAGRIS data for Abenet Yabowork</li>
<li>Lots of issues with spaces, newlines, etc causing the import to fail</li> <li>Lots of issues with spaces, newlines, etc causing the import to fail</li>
<li>I noticed we have a very <em>interesting</em> list of countries on CGSpace:</li> <li>I noticed we have a very <em>interesting</em> list of countries on CGSpace:</li>
</ul> </ul>
<p><img src="/cgspace-notes/2016/02/cgspace-countries.png" alt="CGSpace country list"></p>
<p><img src="/cgspace-notes/2016/02/cgspace-countries.png" alt="CGSpace country list" /></p>
<ul> <ul>
<li>Not only are there 49,000 countries, we have some blanks (25)&hellip;</li> <li>Not only are there 49,000 countries, we have some blanks (25)&hellip;</li>
<li>Also, lots of things like &ldquo;COTE D`LVOIRE&rdquo; and &ldquo;COTE D IVOIRE&rdquo;</li> <li>Also, lots of things like &ldquo;COTE D`LVOIRE&rdquo; and &ldquo;COTE D IVOIRE&rdquo;</li>
@ -333,8 +314,7 @@ text_value
</p> </p>
</header> </header>
<h2 id="2016-01-13">2016-01-13</h2> <h2 id="20160113">2016-01-13</h2>
<ul> <ul>
<li>Move ILRI collection <code>10568/12503</code> from <code>10568/27869</code> to <code>10568/27629</code> using the <a href="https://gist.github.com/alanorth/392c4660e8b022d99dfa">move_collections.sh</a> script I wrote last year.</li> <li>Move ILRI collection <code>10568/12503</code> from <code>10568/27869</code> to <code>10568/27629</code> using the <a href="https://gist.github.com/alanorth/392c4660e8b022d99dfa">move_collections.sh</a> script I wrote last year.</li>
<li>I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.</li> <li>I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.</li>
@ -357,18 +337,16 @@ text_value
</p> </p>
</header> </header>
<h2 id="2015-12-02">2015-12-02</h2> <h2 id="20151202">2015-12-02</h2>
<ul> <ul>
<li><p>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</p> <li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li>
</ul>
<pre><code># cd /home/dspacetest.cgiar.org/log <pre><code># cd /home/dspacetest.cgiar.org/log
# ls -lh dspace.log.2015-11-18* # ls -lh dspace.log.2015-11-18*
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18 -rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo -rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz -rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2015-12/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2015-12/'>Read more →</a>
</article> </article>


@ -9,13 +9,12 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" /> <meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/> <meta name="twitter:title" content="CGSpace Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/> <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -100,18 +99,15 @@
</p> </p>
</header> </header>
<h2 id="2015-11-22">2015-11-22</h2> <h2 id="20151122">2015-11-22</h2>
<ul> <ul>
<li>CGSpace went down</li> <li>CGSpace went down</li>
<li>Looks like DSpace exhausted its PostgreSQL connection pool</li> <li>Looks like DSpace exhausted its PostgreSQL connection pool</li>
<li>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</li>
<li><p>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</p> </ul>
<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace <pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
78 78
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2015-11/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2015-11/'>Read more →</a>
</article> </article>


@ -9,13 +9,12 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" /> <meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/> <meta name="twitter:title" content="Posts"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/> <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -100,31 +99,27 @@
</p> </p>
</header> </header>
<h2 id="2019-11-04">2019-11-04</h2> <h2 id="20191104">2019-11-04</h2>
<ul> <ul>
<li><p>Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics</p> <li>Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics
<ul> <ul>
<li><p>I looked in the nginx logs and see 4.6 million in the access logs, and 1.2 million in the API logs:</p> <li>I looked in the nginx logs and see 4.6 million in the access logs, and 1.2 million in the API logs:</li>
</ul>
</li>
</ul>
<pre><code># zcat --force /var/log/nginx/*access.log.*.gz | grep -cE &quot;[0-9]{1,2}/Oct/2019&quot; <pre><code># zcat --force /var/log/nginx/*access.log.*.gz | grep -cE &quot;[0-9]{1,2}/Oct/2019&quot;
4671942 4671942
# zcat --force /var/log/nginx/{rest,oai,statistics}.log.*.gz | grep -cE &quot;[0-9]{1,2}/Oct/2019&quot; # zcat --force /var/log/nginx/{rest,oai,statistics}.log.*.gz | grep -cE &quot;[0-9]{1,2}/Oct/2019&quot;
1277694 1277694
</code></pre></li> </code></pre><ul>
</ul></li> <li>So 4.6 million from XMLUI and another 1.2 million from API requests</li>
<li>Let's see how many of the REST API requests were for bitstreams (because they are counted in Solr stats):</li>
<li><p>So 4.6 million from XMLUI and another 1.2 million from API requests</p></li> </ul>
<li><p>Let&rsquo;s see how many of the REST API requests were for bitstreams (because they are counted in Solr stats):</p>
<pre><code># zcat --force /var/log/nginx/rest.log.*.gz | grep -c -E &quot;[0-9]{1,2}/Oct/2019&quot; <pre><code># zcat --force /var/log/nginx/rest.log.*.gz | grep -c -E &quot;[0-9]{1,2}/Oct/2019&quot;
1183456 1183456
# zcat --force /var/log/nginx/rest.log.*.gz | grep -E &quot;[0-9]{1,2}/Oct/2019&quot; | grep -c -E &quot;/rest/bitstreams&quot; # zcat --force /var/log/nginx/rest.log.*.gz | grep -E &quot;[0-9]{1,2}/Oct/2019&quot; | grep -c -E &quot;/rest/bitstreams&quot;
106781 106781
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-11/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-11/'>Read more →</a>
</article> </article>
@ -145,7 +140,6 @@
</p> </p>
</header> </header>
<p>Possible changes to CGSpace metadata fields to align more with DC, QDC, and DCTERMS as well as CG Core v2.</p> <p>Possible changes to CGSpace metadata fields to align more with DC, QDC, and DCTERMS as well as CG Core v2.</p>
<p>With reference to <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2 draft standard</a> by Marie-Angélique as well as <a href="http://www.dublincore.org/specifications/dublin-core/dcmi-terms/">DCMI DCTERMS</a>.</p> <p>With reference to <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2 draft standard</a> by Marie-Angélique as well as <a href="http://www.dublincore.org/specifications/dublin-core/dcmi-terms/">DCMI DCTERMS</a>.</p>
<a href='https://alanorth.github.io/cgspace-notes/cgspace-cgcorev2-migration/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/cgspace-cgcorev2-migration/'>Read more →</a>
</article> </article>
@ -164,8 +158,7 @@
</p> </p>
</header> </header>
2019-10-01 Udana from IWMI asked me for a CSV export of their community on CGSpace 2019-10-01 Udana from IWMI asked me for a CSV export of their community on CGSpace I exported it, but a quick run through the csv-metadata-quality tool shows that there are some low-hanging fruits we can fix before I send him the data I will limit the scope to the titles, regions, subregions, and river basins for now to manually fix some non-breaking spaces (U+00A0) there that would otherwise be removed by the csv-metadata-quality script's &ldquo;unnecessary Unicode&rdquo; fix: $ csvcut -c 'id,dc.
I exported it, but a quick run through the csv-metadata-quality tool shows that there are some low-hanging fruits we can fix before I send him the data I will limit the scope to the titles, regions, subregions, and river basins for now to manually fix some non-breaking spaces (U+00A0) there that would otherwise be removed by the csv-metadata-quality script&rsquo;s &ldquo;unnecessary Unicode&rdquo; fix:
<a href='https://alanorth.github.io/cgspace-notes/2019-10/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-10/'>Read more →</a>
</article> </article>
@ -183,37 +176,34 @@
</p> </p>
</header> </header>
<h2 id="2019-09-01">2019-09-01</h2> <h2 id="20190901">2019-09-01</h2>
<ul> <ul>
<li>Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning</li> <li>Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning</li>
<li>Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:</li>
<li><p>Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:</p>
<pre><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
440 17.58.101.255
441 157.55.39.101
485 207.46.13.43
728 169.60.128.125
730 207.46.13.108
758 157.55.39.9
808 66.160.140.179
814 207.46.13.212
2472 163.172.71.23
6092 3.94.211.189
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
33 2a01:7e00::f03c:91ff:fe16:fcb
57 3.83.192.124
57 3.87.77.25
57 54.82.1.8
822 2a01:9cc0:47:1:1a:4:0:2
1223 45.5.184.72
1633 172.104.229.92
5112 205.186.128.185
7249 2a01:7e00::f03c:91ff:fe18:7396
9124 45.5.186.2
</code></pre></li>
</ul> </ul>
<pre><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
440 17.58.101.255
441 157.55.39.101
485 207.46.13.43
728 169.60.128.125
730 207.46.13.108
758 157.55.39.9
808 66.160.140.179
814 207.46.13.212
2472 163.172.71.23
6092 3.94.211.189
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
33 2a01:7e00::f03c:91ff:fe16:fcb
57 3.83.192.124
57 3.87.77.25
57 54.82.1.8
822 2a01:9cc0:47:1:1a:4:0:2
1223 45.5.184.72
1633 172.104.229.92
5112 205.186.128.185
7249 2a01:7e00::f03c:91ff:fe18:7396
9124 45.5.186.2
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2019-09/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-09/'>Read more →</a>
</article> </article>
@ -231,22 +221,19 @@
</p> </p>
</header> </header>
<h2 id="2019-08-03">2019-08-03</h2> <h2 id="20190803">2019-08-03</h2>
<ul> <ul>
<li>Look at Bioversity&rsquo;s latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name&hellip;</li> <li>Look at Bioversity's latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name&hellip;</li>
</ul> </ul>
<h2 id="20190804">2019-08-04</h2>
<h2 id="2019-08-04">2019-08-04</h2>
<ul> <ul>
<li>Deploy ORCID identifier updates requested by Bioversity to CGSpace</li> <li>Deploy ORCID identifier updates requested by Bioversity to CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it <li>Run system updates on CGSpace (linode18) and reboot it
<ul> <ul>
<li>Before updating it I checked Solr and verified that all statistics cores were loaded properly&hellip;</li> <li>Before updating it I checked Solr and verified that all statistics cores were loaded properly&hellip;</li>
<li>After rebooting, all statistics cores were loaded&hellip; wow, that&rsquo;s lucky.</li> <li>After rebooting, all statistics cores were loaded&hellip; wow, that's lucky.</li>
</ul></li> </ul>
</li>
<li>Run system updates on DSpace Test (linode19) and reboot it</li> <li>Run system updates on DSpace Test (linode19) and reboot it</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-08/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-08/'>Read more →</a>
@ -266,16 +253,15 @@
</p> </p>
</header> </header>
<h2 id="2019-07-01">2019-07-01</h2> <h2 id="20190701">2019-07-01</h2>
<ul> <ul>
<li>Create an &ldquo;AfricaRice books and book chapters&rdquo; collection on CGSpace for AfricaRice</li> <li>Create an &ldquo;AfricaRice books and book chapters&rdquo; collection on CGSpace for AfricaRice</li>
<li>Last month Sisay asked why the following &ldquo;most popular&rdquo; statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace: <li>Last month Sisay asked why the following &ldquo;most popular&rdquo; statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace:
<ul> <ul>
<li><a href="https://dspacetest.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&amp;time_filter_end_date=01%2F12%2F2018">DSpace Test</a></li> <li><a href="https://dspacetest.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&amp;time_filter_end_date=01%2F12%2F2018">DSpace Test</a></li>
<li><a href="https://cgspace.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&amp;time_filter_end_date=01%2F12%2F2018">CGSpace</a></li> <li><a href="https://cgspace.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&amp;time_filter_end_date=01%2F12%2F2018">CGSpace</a></li>
</ul></li> </ul>
</li>
<li>Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community</li> <li>Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-07/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-07/'>Read more →</a>
@ -295,15 +281,12 @@
</p> </p>
</header> </header>
<h2 id="2019-06-02">2019-06-02</h2> <h2 id="20190602">2019-06-02</h2>
<ul> <ul>
<li>Merge the <a href="https://github.com/ilri/DSpace/pull/425">Solr filterCache</a> and <a href="https://github.com/ilri/DSpace/pull/426">XMLUI ISI journal</a> changes to the <code>5_x-prod</code> branch and deploy on CGSpace</li> <li>Merge the <a href="https://github.com/ilri/DSpace/pull/425">Solr filterCache</a> and <a href="https://github.com/ilri/DSpace/pull/426">XMLUI ISI journal</a> changes to the <code>5_x-prod</code> branch and deploy on CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it</li> <li>Run system updates on CGSpace (linode18) and reboot it</li>
</ul> </ul>
<h2 id="20190603">2019-06-03</h2>
<h2 id="2019-06-03">2019-06-03</h2>
<ul> <ul>
<li>Skype with Marie-Angélique and Abenet about <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2</a></li> <li>Skype with Marie-Angélique and Abenet about <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2</a></li>
</ul> </ul>
@ -324,24 +307,21 @@
</p> </p>
</header> </header>
<h2 id="2019-05-01">2019-05-01</h2> <h2 id="20190501">2019-05-01</h2>
<ul> <ul>
<li>Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace</li> <li>Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace</li>
<li>A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items <li>A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items
<ul> <ul>
<li>Apparently if the item is in the <code>workflowitem</code> table it is submitted to a workflow</li> <li>Apparently if the item is in the <code>workflowitem</code> table it is submitted to a workflow</li>
<li>And if it is in the <code>workspaceitem</code> table it is in the pre-submitted state</li> <li>And if it is in the <code>workspaceitem</code> table it is in the pre-submitted state</li>
</ul></li> </ul>
</li>
<li><p>The item seems to be in a pre-submitted state, so I tried to delete it from there:</p> <li>The item seems to be in a pre-submitted state, so I tried to delete it from there:</li>
</ul>
<pre><code>dspace=# DELETE FROM workspaceitem WHERE item_id=74648; <pre><code>dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
DELETE 1 DELETE 1
</code></pre></li> </code></pre><ul>
<li>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present&hellip;</li>
<li><p>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present&hellip;</p></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-05/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-05/'>Read more →</a>
</article> </article>
@ -360,35 +340,30 @@ DELETE 1
</p> </p>
</header> </header>
<h2 id="2019-04-01">2019-04-01</h2> <h2 id="20190401">2019-04-01</h2>
<ul> <ul>
<li>Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc <li>Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc
<ul> <ul>
<li>They asked if we had plans to enable RDF support in CGSpace</li> <li>They asked if we had plans to enable RDF support in CGSpace</li>
</ul></li> </ul>
</li>
<li><p>There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today</p> <li>There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today
<ul> <ul>
<li><p>I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!</p> <li>I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!</li>
</ul>
</li>
</ul>
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18.196.196.108|18.195.78.144|18.195.218.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5 <pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18.196.196.108|18.195.78.144|18.195.218.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5
4432 200 4432 200
</code></pre></li> </code></pre><ul>
</ul></li> <li>In the last two weeks there have been 47,000 downloads of this <em>same exact PDF</em> by these three IP addresses</li>
<li>Apply country and region corrections and deletions on DSpace Test and CGSpace:</li>
<li><p>In the last two weeks there have been 47,000 downloads of this <em>same exact PDF</em> by these three IP addresses</p></li> </ul>
<li><p>Apply country and region corrections and deletions on DSpace Test and CGSpace:</p>
<pre><code>$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d <pre><code>$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d $ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-04/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-04/'>Read more →</a>
</article> </article>
@ -406,20 +381,19 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
</p> </p>
</header> </header>
<h2 id="2019-03-01">2019-03-01</h2> <h2 id="20190301">2019-03-01</h2>
<ul> <ul>
<li>I checked IITA&rsquo;s 259 Feb 14 records from last month for duplicates using Atmire&rsquo;s Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good</li> <li>I checked IITA's 259 Feb 14 records from last month for duplicates using Atmire's Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good</li>
<li>I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc&hellip;</li> <li>I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc&hellip;</li>
<li>Looking at the other half of Udana&rsquo;s WLE records from 2018-11 <li>Looking at the other half of Udana's WLE records from 2018-11
<ul> <ul>
<li>I finished the ones for Restoring Degraded Landscapes (RDL), but these are for Variability, Risks and Competing Uses (VRC)</li> <li>I finished the ones for Restoring Degraded Landscapes (RDL), but these are for Variability, Risks and Competing Uses (VRC)</li>
<li>I did the usual cleanups for whitespace, added regions where they made sense for certain countries, cleaned up the DOI link formats, added rights information based on the publications page for a few items</li> <li>I did the usual cleanups for whitespace, added regions where they made sense for certain countries, cleaned up the DOI link formats, added rights information based on the publications page for a few items</li>
<li>Most worryingly, there are encoding errors in the abstracts for eleven items, for example:</li> <li>Most worryingly, there are encoding errors in the abstracts for eleven items, for example:</li>
<li>68.15% <20> 9.45 instead of 68.15% ± 9.45</li> <li>68.15% <20> 9.45 instead of 68.15% ± 9.45</li>
<li>2003<EFBFBD>2013 instead of 2003–2013</li> <li>2003<EFBFBD>2013 instead of 2003–2013</li>
</ul></li> </ul>
</li>
<li>I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs</li> <li>I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-03/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-03/'>Read more →</a>
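The encoding errors described in the 2019-03-01 entry can be spotted mechanically, since a badly decoded byte usually ends up as the Unicode replacement character (U+FFFD). The following is a minimal sketch, assuming the abstracts have already been loaded into a dictionary keyed by item identifier; the function name `find_encoding_damage` and the sample data are hypothetical and not part of the original notes.

```python
# Hypothetical helper: flag abstracts that contain the Unicode
# replacement character, which is what a mis-decoded byte becomes.
REPLACEMENT_CHAR = "\ufffd"


def find_encoding_damage(abstracts):
    """Return (item_id, position) for each abstract containing U+FFFD.

    `abstracts` maps an item identifier to its abstract text. Only the
    first occurrence per abstract is reported.
    """
    damaged = []
    for item_id, text in abstracts.items():
        position = text.find(REPLACEMENT_CHAR)
        if position != -1:
            damaged.append((item_id, position))
    return damaged


# Illustrative data only; the identifiers are made up.
sample = {
    "item-1": "68.15% \ufffd 9.45",  # damaged: the ± was lost
    "item-2": "68.15% ± 9.45",       # intact
}
print(find_encoding_damage(sample))
```

A check like this only finds characters that were already replaced during decoding; it cannot recover the original character, so the abstracts would still need to be re-entered from the source.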

File diff suppressed because it is too large


@ -9,13 +9,12 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" /> <meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/> <meta name="twitter:title" content="Posts"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/> <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -100,40 +99,34 @@
</p> </p>
</header> </header>
<h2 id="2019-02-01">2019-02-01</h2> <h2 id="20190201">2019-02-01</h2>
<ul> <ul>
<li>Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!</li> <li>Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!</li>
<li>The top IPs before, during, and after this latest alert tonight were:</li>
<li><p>The top IPs before, during, and after this latest alert tonight were:</p> </ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;01/Feb/2019:(17|18|19|20|21)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 <pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;01/Feb/2019:(17|18|19|20|21)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
245 207.46.13.5 245 207.46.13.5
332 54.70.40.11 332 54.70.40.11
385 5.143.231.38 385 5.143.231.38
405 207.46.13.173 405 207.46.13.173
405 207.46.13.75 405 207.46.13.75
1117 66.249.66.219 1117 66.249.66.219
1121 35.237.175.180 1121 35.237.175.180
1546 5.9.6.51 1546 5.9.6.51
2474 45.5.186.2 2474 45.5.186.2
5490 85.25.237.71 5490 85.25.237.71
</code></pre></li> </code></pre><ul>
<li><code>85.25.237.71</code> is the &ldquo;Linguee Bot&rdquo; that I first saw last month</li>
<li><p><code>85.25.237.71</code> is the &ldquo;Linguee Bot&rdquo; that I first saw last month</p></li> <li>The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase</li>
<li>There were just over 3 million accesses in the nginx logs last month:</li>
<li><p>The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase</p></li> </ul>
<li><p>There were just over 3 million accesses in the nginx logs last month:</p>
<pre><code># time zcat --force /var/log/nginx/* | grep -cE &quot;[0-9]{1,2}/Jan/2019&quot; <pre><code># time zcat --force /var/log/nginx/* | grep -cE &quot;[0-9]{1,2}/Jan/2019&quot;
3018243 3018243
real 0m19.873s real 0m19.873s
user 0m22.203s user 0m22.203s
sys 0m1.979s sys 0m1.979s
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-02/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-02/'>Read more →</a>
</article> </article>
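The `zcat | grep | awk | sort | uniq -c | sort -n | tail` pipeline in the 2019-02-01 entry counts requests per client IP within a time window. A rough Python equivalent might look like the sketch below, assuming each log line starts with the client IP followed by whitespace (as `awk '{print $1}'` implies); the function name `top_clients` and the sample log lines are hypothetical.

```python
import re
from collections import Counter


def top_clients(lines, date_pattern, limit=10):
    """Count requests per client IP for lines matching `date_pattern`.

    Assumes the client IP is the first whitespace-separated field of
    each line. Returns (ip, count) pairs, busiest client first, which
    is the reverse of the ascending `sort -n | tail` output.
    """
    regex = re.compile(date_pattern)
    counts = Counter(
        line.split()[0]
        for line in lines
        if line.strip() and regex.search(line)
    )
    return counts.most_common(limit)


# Illustrative log lines only; the requests are made up.
sample_lines = [
    '85.25.237.71 - - [01/Feb/2019:17:00:01 +0000] "GET / HTTP/1.1" 200 512',
    '85.25.237.71 - - [01/Feb/2019:18:10:00 +0000] "GET /a HTTP/1.1" 200 512',
    '45.5.186.2 - - [01/Feb/2019:19:00:00 +0000] "GET /b HTTP/1.1" 200 512',
    '45.5.186.2 - - [01/Feb/2019:23:00:00 +0000] "GET /c HTTP/1.1" 200 512',
]
print(top_clients(sample_lines, r"01/Feb/2019:(17|18|19|20|21)"))
```

The shell pipeline remains the quicker tool on the server itself; a version like this is mainly useful when the counts need further processing, for example comparing them against the Solr statistics.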
@ -151,26 +144,23 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2019-01-02">2019-01-02</h2> <h2 id="20190102">2019-01-02</h2>
<ul> <ul>
<li>Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning</li> <li>Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning</li>
<li>I don't see anything interesting in the web server logs around that time though:</li>
<li><p>I don&rsquo;t see anything interesting in the web server logs around that time though:</p>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;02/Jan/2019:0(1|2|3)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
92 40.77.167.4
99 210.7.29.100
120 38.126.157.45
177 35.237.175.180
177 40.77.167.32
216 66.249.75.219
225 18.203.76.93
261 46.101.86.248
357 207.46.13.1
903 54.70.40.11
</code></pre></li>
</ul> </ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;02/Jan/2019:0(1|2|3)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
92 40.77.167.4
99 210.7.29.100
120 38.126.157.45
177 35.237.175.180
177 40.77.167.32
216 66.249.75.219
225 18.203.76.93
261 46.101.86.248
357 207.46.13.1
903 54.70.40.11
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2019-01/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-01/'>Read more →</a>
</article> </article>
@ -188,16 +178,13 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2018-12-01">2018-12-01</h2> <h2 id="20181201">2018-12-01</h2>
<ul> <ul>
<li>Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK</li> <li>Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK</li>
<li>I manually installed OpenJDK, then removed Oracle JDK, then re-ran the <a href="http://github.com/ilri/rmg-ansible-public">Ansible playbook</a> to update all configuration files, etc</li> <li>I manually installed OpenJDK, then removed Oracle JDK, then re-ran the <a href="http://github.com/ilri/rmg-ansible-public">Ansible playbook</a> to update all configuration files, etc</li>
<li>Then I ran all system updates and restarted the server</li> <li>Then I ran all system updates and restarted the server</li>
</ul> </ul>
<h2 id="20181202">2018-12-02</h2>
<h2 id="2018-12-02">2018-12-02</h2>
<ul> <ul>
<li>I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another <a href="https://usn.ubuntu.com/3831-1/">Ghostscript vulnerability last week</a></li> <li>I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another <a href="https://usn.ubuntu.com/3831-1/">Ghostscript vulnerability last week</a></li>
</ul> </ul>
@ -218,15 +205,12 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2018-11-01">2018-11-01</h2> <h2 id="20181101">2018-11-01</h2>
<ul> <ul>
<li>Finalize AReS Phase I and Phase II ToRs</li> <li>Finalize AReS Phase I and Phase II ToRs</li>
<li>Send a note about my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to the dspace-tech mailing list</li> <li>Send a note about my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to the dspace-tech mailing list</li>
</ul> </ul>
<h2 id="20181103">2018-11-03</h2>
<h2 id="2018-11-03">2018-11-03</h2>
<ul> <ul>
<li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li> <li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li>
<li>Today these are the top 10 IPs:</li> <li>Today these are the top 10 IPs:</li>
@ -248,11 +232,10 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2018-10-01">2018-10-01</h2> <h2 id="20181001">2018-10-01</h2>
<ul> <ul>
<li>Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items</li> <li>Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items</li>
<li>I created a GitHub issue to track this <a href="https://github.com/ilri/DSpace/issues/389">#389</a>, because I&rsquo;m super busy in Nairobi right now</li> <li>I created a GitHub issue to track this <a href="https://github.com/ilri/DSpace/issues/389">#389</a>, because I'm super busy in Nairobi right now</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-10/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-10/'>Read more →</a>
</article> </article>
@ -271,13 +254,12 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2018-09-02">2018-09-02</h2> <h2 id="20180902">2018-09-02</h2>
<ul> <ul>
<li>New <a href="https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.5">PostgreSQL JDBC driver version 42.2.5</a></li> <li>New <a href="https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.5">PostgreSQL JDBC driver version 42.2.5</a></li>
<li>I&rsquo;ll update the DSpace role in our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a> and run the updated playbooks on CGSpace and DSpace Test</li> <li>I'll update the DSpace role in our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a> and run the updated playbooks on CGSpace and DSpace Test</li>
<li>Also, I&rsquo;ll re-run the <code>postgresql</code> tasks because the custom PostgreSQL variables are dynamic according to the system&rsquo;s RAM, and we never re-ran them after migrating to larger Linodes last month</li> <li>Also, I'll re-run the <code>postgresql</code> tasks because the custom PostgreSQL variables are dynamic according to the system's RAM, and we never re-ran them after migrating to larger Linodes last month</li>
<li>I&rsquo;m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I&rsquo;m getting those autowire errors in Tomcat 8.5.30 again:</li> <li>I'm testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I'm getting those autowire errors in Tomcat 8.5.30 again:</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-09/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-09/'>Read more →</a>
</article> </article>
@ -296,27 +278,20 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2018-08-01">2018-08-01</h2> <h2 id="20180801">2018-08-01</h2>
<ul> <ul>
<li><p>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</p> <li>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</li>
</ul>
<pre><code>[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child <pre><code>[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child
[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB [Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB
[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
</code></pre></li> </code></pre><ul>
<li>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</li>
<li><p>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</p></li> <li>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat's</li>
<li>I'm not sure why Tomcat didn't crash with an OutOfMemoryError&hellip;</li>
<li><p>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat&rsquo;s</p></li> <li>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</li>
<li>The server only has 8GB of RAM so we'll eventually need to upgrade to a larger one because we'll start starving the OS, PostgreSQL, and command line batch processes</li>
<li><p>I&rsquo;m not sure why Tomcat didn&rsquo;t crash with an OutOfMemoryError&hellip;</p></li> <li>I ran all system updates on DSpace Test and rebooted it</li>
<li><p>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</p></li>
<li><p>The server only has 8GB of RAM so we&rsquo;ll eventually need to upgrade to a larger one because we&rsquo;ll start starving the OS, PostgreSQL, and command line batch processes</p></li>
<li><p>I ran all system updates on DSpace Test and rebooted it</p></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-08/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-08/'>Read more →</a>
</article> </article>
@ -335,19 +310,16 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2018-07-01">2018-07-01</h2> <h2 id="20180701">2018-07-01</h2>
<ul> <ul>
<li><p>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</p> <li>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</li>
<pre><code>$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
</code></pre></li>
<li><p>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</p>
<pre><code>There is insufficient memory for the Java Runtime Environment to continue.
</code></pre></li>
</ul> </ul>
<pre><code>$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
</code></pre><ul>
<li>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</li>
</ul>
<pre><code>There is insufficient memory for the Java Runtime Environment to continue.
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2018-07/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-07/'>Read more →</a>
</article> </article>
@ -365,32 +337,27 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2018-06-04">2018-06-04</h2> <h2 id="20180604">2018-06-04</h2>
<ul> <ul>
<li>Test the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">DSpace 5.8 module upgrades from Atmire</a> (<a href="https://github.com/ilri/DSpace/pull/378">#378</a>) <li>Test the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">DSpace 5.8 module upgrades from Atmire</a> (<a href="https://github.com/ilri/DSpace/pull/378">#378</a>)
<ul> <ul>
<li>There seems to be a problem with the CUA and L&amp;R versions in <code>pom.xml</code> because they are using SNAPSHOT and it doesn&rsquo;t build</li> <li>There seems to be a problem with the CUA and L&amp;R versions in <code>pom.xml</code> because they are using SNAPSHOT and it doesn't build</li>
</ul></li> </ul>
</li>
<li>I added the new CCAFS Phase II Project Tag <code>PII-FP1_PACCA2</code> and merged it into the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/379">#379</a>)</li> <li>I added the new CCAFS Phase II Project Tag <code>PII-FP1_PACCA2</code> and merged it into the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/379">#379</a>)</li>
<li>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</li>
<li><p>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</p> </ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3 -n <pre><code>$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3 -n
</code></pre></li> </code></pre><ul>
<li>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="/cgspace-notes/2018-03/">March, 2018</a></li>
<li><p>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="/cgspace-notes/2018-03/">March, 2018</a></p></li> <li>Time to index ~70,000 items on CGSpace:</li>
</ul>
<li><p>Time to index ~70,000 items on CGSpace:</p>
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b <pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
real 74m42.646s real 74m42.646s
user 8m5.056s user 8m5.056s
sys 2m7.289s sys 2m7.289s
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-06/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-06/'>Read more →</a>
</article> </article>
@ -408,15 +375,14 @@ sys 2m7.289s
</p> </p>
</header> </header>
<h2 id="2018-05-01">2018-05-01</h2> <h2 id="20180501">2018-05-01</h2>
<ul> <ul>
<li>I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface: <li>I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface:
<ul> <ul>
<li><a href="http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E">http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E</a></li> <li>http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E</li>
<li><a href="http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E">http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E</a></li> <li>http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E</li>
</ul></li> </ul>
</li>
<li>Then I reduced the JVM heap size from 6144 back to 5120m</li> <li>Then I reduced the JVM heap size from 6144 back to 5120m</li>
<li>Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> to support hosts choosing which distribution they want to use</li> <li>Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> to support hosts choosing which distribution they want to use</li>
</ul> </ul>


@ -9,13 +9,12 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" /> <meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/> <meta name="twitter:title" content="Posts"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/> <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -100,10 +99,9 @@
</p> </p>
</header> </header>
<h2 id="2018-04-01">2018-04-01</h2> <h2 id="20180401">2018-04-01</h2>
<ul> <ul>
<li>I tried to test something on DSpace Test but noticed that it&rsquo;s down since god knows when</li> <li>I tried to test something on DSpace Test but noticed that it's down since god knows when</li>
<li>Catalina logs at least show some memory errors yesterday:</li> <li>Catalina logs at least show some memory errors yesterday:</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-04/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-04/'>Read more →</a>
@ -123,8 +121,7 @@
</p> </p>
</header> </header>
<h2 id="2018-03-02">2018-03-02</h2> <h2 id="20180302">2018-03-02</h2>
<ul> <ul>
<li>Export a CSV of the IITA community metadata for Martin Mueller</li> <li>Export a CSV of the IITA community metadata for Martin Mueller</li>
</ul> </ul>
@ -145,13 +142,12 @@
</p> </p>
</header> </header>
<h2 id="2018-02-01">2018-02-01</h2> <h2 id="20180201">2018-02-01</h2>
<ul> <ul>
<li>Peter gave feedback on the <code>dc.rights</code> proof of concept that I had sent him last week</li> <li>Peter gave feedback on the <code>dc.rights</code> proof of concept that I had sent him last week</li>
<li>We don&rsquo;t need to distinguish between internal and external works, so that makes it just a simple list</li> <li>We don't need to distinguish between internal and external works, so that makes it just a simple list</li>
<li>Yesterday I figured out how to monitor DSpace sessions using JMX</li> <li>Yesterday I figured out how to monitor DSpace sessions using JMX</li>
<li>I copied the logic in the <code>jmx_tomcat_dbpools</code> provided by Ubuntu&rsquo;s <code>munin-plugins-java</code> package and used the stuff I discovered about JMX <a href="/cgspace-notes/2018-01/">in 2018-01</a></li> <li>I copied the logic in the <code>jmx_tomcat_dbpools</code> provided by Ubuntu's <code>munin-plugins-java</code> package and used the stuff I discovered about JMX <a href="/cgspace-notes/2018-01/">in 2018-01</a></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-02/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-02/'>Read more →</a>
</article> </article>
@ -170,33 +166,26 @@
</p> </p>
</header> </header>
<h2 id="2018-01-02">2018-01-02</h2> <h2 id="20180102">2018-01-02</h2>
<ul> <ul>
<li>Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time</li> <li>Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time</li>
<li>I didn&rsquo;t get any load alerts from Linode and the REST and XMLUI logs don&rsquo;t show anything out of the ordinary</li> <li>I didn't get any load alerts from Linode and the REST and XMLUI logs don't show anything out of the ordinary</li>
<li>The nginx logs show HTTP 200s until <code>02/Jan/2018:11:27:17 +0000</code> when Uptime Robot got an HTTP 500</li> <li>The nginx logs show HTTP 200s until <code>02/Jan/2018:11:27:17 +0000</code> when Uptime Robot got an HTTP 500</li>
<li>In dspace.log around that time I see many errors like &ldquo;Client closed the connection before file download was complete&rdquo;</li> <li>In dspace.log around that time I see many errors like &ldquo;Client closed the connection before file download was complete&rdquo;</li>
<li>And just before that I see this:</li>
<li><p>And just before that I see this:</p> </ul>
<pre><code>Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000]. <pre><code>Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000].
</code></pre></li> </code></pre><ul>
<li>Ah hah! So the pool was actually empty!</li>
<li><p>Ah hah! So the pool was actually empty!</p></li> <li>I need to increase that, let's try to bump it up from 50 to 75</li>
<li>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don't know what the hell Uptime Robot saw</li>
<li><p>I need to increase that, let&rsquo;s try to bump it up from 50 to 75</p></li> <li>I notice this error quite a few times in dspace.log:</li>
</ul>
<li><p>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don&rsquo;t know what the hell Uptime Robot saw</p></li>
<li><p>I notice this error quite a few times in dspace.log:</p>
<pre><code>2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets <pre><code>2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered &quot; &quot;]&quot; &quot;] &quot;&quot; at line 1, column 32. org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered &quot; &quot;]&quot; &quot;] &quot;&quot; at line 1, column 32.
</code></pre></li> </code></pre><ul>
<li>And there are many of these errors every day for the past month:</li>
<li><p>And there are many of these errors every day for the past month:</p> </ul>
<pre><code>$ grep -c &quot;Error while searching for sidebar facets&quot; dspace.log.* <pre><code>$ grep -c &quot;Error while searching for sidebar facets&quot; dspace.log.*
dspace.log.2017-11-21:4 dspace.log.2017-11-21:4
dspace.log.2017-11-22:1 dspace.log.2017-11-22:1
@ -241,9 +230,8 @@ dspace.log.2017-12-30:89
dspace.log.2017-12-31:53 dspace.log.2017-12-31:53
dspace.log.2018-01-01:45 dspace.log.2018-01-01:45
dspace.log.2018-01-02:34 dspace.log.2018-01-02:34
</code></pre></li> </code></pre><ul>
<li>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let's Encrypt if it's just a handful of domains</li>
<li><p>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let&rsquo;s Encrypt if it&rsquo;s just a handful of domains</p></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-01/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-01/'>Read more →</a>
</article> </article>
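The per-day counts printed by `grep -c` above can be totalled with a small helper. This is a minimal sketch, not part of the original notes: the function name `sum_grep_counts` is hypothetical, and it assumes input in the `filename:count` form shown above, with no colons in the file names.

```shell
# Hypothetical helper: sum the per-file counts printed by `grep -c`,
# which emits one "filename:count" line per file.
sum_grep_counts() {
  # Split on ":" and add up the last field; "+ 0" prints 0 for empty input.
  awk -F: '{ total += $NF } END { print total + 0 }'
}

# Example using the last three days listed above (53 + 45 + 34).
printf '%s\n' \
  'dspace.log.2017-12-31:53' \
  'dspace.log.2018-01-01:45' \
  'dspace.log.2018-01-02:34' | sum_grep_counts
# prints 132
```

In practice the input would presumably be piped straight from `grep -c "Error while searching for sidebar facets" dspace.log.*`.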
@ -262,8 +250,7 @@ dspace.log.2018-01-02:34
</p> </p>
</header> </header>
<h2 id="2017-12-01">2017-12-01</h2> <h2 id="20171201">2017-12-01</h2>
<ul> <ul>
<li>Uptime Robot noticed that CGSpace went down</li> <li>Uptime Robot noticed that CGSpace went down</li>
<li>The logs say &ldquo;Timeout waiting for idle object&rdquo;</li> <li>The logs say &ldquo;Timeout waiting for idle object&rdquo;</li>
@ -287,27 +274,22 @@ dspace.log.2018-01-02:34
</p> </p>
</header> </header>
<h2 id="2017-11-01">2017-11-01</h2> <h2 id="20171101">2017-11-01</h2>
<ul> <ul>
<li>The CORE developers responded to say they are looking into their bot not respecting our robots.txt</li> <li>The CORE developers responded to say they are looking into their bot not respecting our robots.txt</li>
</ul> </ul>
<h2 id="20171102">2017-11-02</h2>
<h2 id="2017-11-02">2017-11-02</h2>
<ul> <ul>
<li><p>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</p> <li>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</li>
</ul>
<pre><code># grep -c &quot;CORE&quot; /var/log/nginx/access.log <pre><code># grep -c &quot;CORE&quot; /var/log/nginx/access.log
0 0
</code></pre></li> </code></pre><ul>
<li>Generate list of authors on CGSpace for Peter to go through and correct:</li>
<li><p>Generate list of authors on CGSpace for Peter to go through and correct:</p> </ul>
<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv; <pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv;
COPY 54701 COPY 54701
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-11/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-11/'>Read more →</a>
</article> </article>
@ -325,17 +307,14 @@ COPY 54701
</p> </p>
</header> </header>
<h2 id="2017-10-01">2017-10-01</h2> <h2 id="20171001">2017-10-01</h2>
<ul> <ul>
<li><p>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</p> <li>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</li>
</ul>
<pre><code>http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336 <pre><code>http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
</code></pre></li> </code></pre><ul>
<li>There appears to be a pattern but I'll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</li>
<li><p>There appears to be a pattern but I&rsquo;ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</p></li> <li>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</li>
<li><p>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</p></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-10/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-10/'>Read more →</a>
</article> </article>
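To inspect items with multiple handles like the one above, it may help to split the `||`-separated value into one handle per line before looking for a pattern. A rough sketch, assuming the values are available as plain strings; the function name `split_handles` is hypothetical and not from the notes.

```shell
# Hypothetical helper: split a "||"-separated multi-value field
# into one value per line.
split_handles() {
  # Use a regular expression as the field separator so "||" is matched literally.
  printf '%s\n' "$1" | awk -F'[|][|]' '{ for (i = 1; i <= NF; i++) print $i }'
}

# Example with the value quoted above; prints each handle on its own line.
split_handles 'http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336'
```

The actual cleanup was planned for SQL or OpenRefine; this only illustrates the shape of the data.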
@ -374,16 +353,13 @@ COPY 54701
</p> </p>
</header> </header>
<h2 id="2017-09-06">2017-09-06</h2> <h2 id="20170906">2017-09-06</h2>
<ul> <ul>
<li>Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours</li> <li>Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours</li>
</ul> </ul>
<h2 id="20170907">2017-09-07</h2>
<h2 id="2017-09-07">2017-09-07</h2>
<ul> <ul>
<li>Ask Sisay to clean up the WLE approvers a bit, as Marianne&rsquo;s user account is both in the approvers step as well as the group</li> <li>Ask Sisay to clean up the WLE approvers a bit, as Marianne's user account is both in the approvers step as well as the group</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-09/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-09/'>Read more →</a>
</article> </article>
@ -402,22 +378,21 @@ COPY 54701
</p> </p>
</header> </header>
<h2 id="2017-08-01">2017-08-01</h2> <h2 id="20170801">2017-08-01</h2>
<ul> <ul>
<li>Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours</li> <li>Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours</li>
<li>I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)</li> <li>I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)</li>
<li>The good thing is that, according to <code>dspace.log.2017-08-01</code>, they are all using the same Tomcat session</li> <li>The good thing is that, according to <code>dspace.log.2017-08-01</code>, they are all using the same Tomcat session</li>
<li>This means our Tomcat Crawler Session Valve is working</li> <li>This means our Tomcat Crawler Session Valve is working</li>
<li>But many of the bots are browsing dynamic URLs like: <li>But many of the bots are browsing dynamic URLs like:
<ul> <ul>
<li>/handle/10568/3353/discover</li> <li>/handle/10568/3353/discover</li>
<li>/handle/10568/16510/browse</li> <li>/handle/10568/16510/browse</li>
</ul></li> </ul>
</li>
<li>The <code>robots.txt</code> only blocks the top-level <code>/discover</code> and <code>/browse</code> URLs&hellip; we will need to find a way to forbid them from accessing these!</li> <li>The <code>robots.txt</code> only blocks the top-level <code>/discover</code> and <code>/browse</code> URLs&hellip; we will need to find a way to forbid them from accessing these!</li>
<li>Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li> <li>Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
<li>It turns out that we&rsquo;re already adding the <code>X-Robots-Tag &quot;none&quot;</code> HTTP header, but this only forbids the search engine from <em>indexing</em> the page, not crawling it!</li> <li>It turns out that we're already adding the <code>X-Robots-Tag &quot;none&quot;</code> HTTP header, but this only forbids the search engine from <em>indexing</em> the page, not crawling it!</li>
<li>Also, the bot has to successfully browse the page first so it can receive the HTTP header&hellip;</li> <li>Also, the bot has to successfully browse the page first so it can receive the HTTP header&hellip;</li>
<li>We might actually have to <em>block</em> these requests with HTTP 403 depending on the user agent</li> <li>We might actually have to <em>block</em> these requests with HTTP 403 depending on the user agent</li>
<li>Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415</li> <li>Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415</li>


@ -9,13 +9,12 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" /> <meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/> <meta name="twitter:title" content="Posts"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/> <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -100,18 +99,15 @@
</p> </p>
</header> </header>
<h2 id="2017-07-01">2017-07-01</h2> <h2 id="20170701">2017-07-01</h2>
<ul> <ul>
<li>Run system updates and reboot DSpace Test</li> <li>Run system updates and reboot DSpace Test</li>
</ul> </ul>
<h2 id="20170704">2017-07-04</h2>
<h2 id="2017-07-04">2017-07-04</h2>
<ul> <ul>
<li>Merge changes for WLE Phase II theme rename (<a href="https://github.com/ilri/DSpace/pull/329">#329</a>)</li> <li>Merge changes for WLE Phase II theme rename (<a href="https://github.com/ilri/DSpace/pull/329">#329</a>)</li>
<li>Looking at extracting the metadata registries from ICARDA&rsquo;s MEL DSpace database so we can compare fields with CGSpace</li> <li>Looking at extracting the metadata registries from ICARDA's MEL DSpace database so we can compare fields with CGSpace</li>
<li>We can use PostgreSQL&rsquo;s extended output format (<code>-x</code>) plus <code>sed</code> to format the output into quasi XML:</li> <li>We can use PostgreSQL's extended output format (<code>-x</code>) plus <code>sed</code> to format the output into quasi XML:</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-07/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-07/'>Read more →</a>
</article> </article>
@ -130,7 +126,7 @@
</p> </p>
</header> </header>
2017-06-01 After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes The cg.identifier.wletheme field will be used for both Phase I and Phase II Research Themes Then we&rsquo;ll create a new sub-community for Phase II and create collections for the research themes there The current &ldquo;Research Themes&rdquo; community will be renamed to &ldquo;WLE Phase I Research Themes&rdquo; Tagged all items in the current Phase I collections with their appropriate themes Create pull request to add Phase II research themes to the submission form: #328 Add cg. 2017-06-01 After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes The cg.identifier.wletheme field will be used for both Phase I and Phase II Research Themes Then we'll create a new sub-community for Phase II and create collections for the research themes there The current &ldquo;Research Themes&rdquo; community will be renamed to &ldquo;WLE Phase I Research Themes&rdquo; Tagged all items in the current Phase I collections with their appropriate themes Create pull request to add Phase II research themes to the submission form: #328 Add cg.
<a href='https://alanorth.github.io/cgspace-notes/2017-06/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-06/'>Read more →</a>
</article> </article>
@ -148,7 +144,7 @@
</p> </p>
</header> </header>
2017-05-01 ICARDA apparently started working on CG Core on their MEL repository They have done a few cg.* fields, but not very consistent and even copy some of CGSpace items: https://mel.cgiar.org/xmlui/handle/20.500.11766/6911?show=full https://cgspace.cgiar.org/handle/10568/73683 2017-05-02 Atmire got back about the Workflow Statistics issue, and apparently it&rsquo;s a bug in the CUA module so they will send us a pull request 2017-05-04 Sync DSpace Test with database and assetstore from CGSpace Re-deploy DSpace Test with Atmire&rsquo;s CUA patch for workflow statistics, run system updates, and restart the server Now I can see the workflow statistics and am able to select users, but everything returns 0 items Megan says there are still some mapped items that are not appearing since last week, so I forced a full index-discovery -b Need to remember to check if the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSpace Test) tomorrow: https://cgspace. 2017-05-01 ICARDA apparently started working on CG Core on their MEL repository They have done a few cg.* fields, but not very consistent and even copy some of CGSpace items: https://mel.cgiar.org/xmlui/handle/20.500.11766/6911?show=full https://cgspace.cgiar.org/handle/10568/73683 2017-05-02 Atmire got back about the Workflow Statistics issue, and apparently it's a bug in the CUA module so they will send us a pull request 2017-05-04 Sync DSpace Test with database and assetstore from CGSpace Re-deploy DSpace Test with Atmire's CUA patch for workflow statistics, run system updates, and restart the server Now I can see the workflow statistics and am able to select users, but everything returns 0 items Megan says there are still some mapped items that are not appearing since last week, so I forced a full index-discovery -b Need to remember to check if the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSpace Test) tomorrow: https://cgspace.
<a href='https://alanorth.github.io/cgspace-notes/2017-05/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-05/'>Read more →</a>
</article> </article>
@ -166,23 +162,18 @@
</p> </p>
</header> </header>
<h2 id="2017-04-02">2017-04-02</h2> <h2 id="20170402">2017-04-02</h2>
<ul> <ul>
<li>Merge one change to CCAFS flagships that I had forgotten to remove last month (&ldquo;MANAGING CLIMATE RISK&rdquo;): <a href="https://github.com/ilri/DSpace/pull/317">https://github.com/ilri/DSpace/pull/317</a></li> <li>Merge one change to CCAFS flagships that I had forgotten to remove last month (&ldquo;MANAGING CLIMATE RISK&rdquo;): <a href="https://github.com/ilri/DSpace/pull/317">https://github.com/ilri/DSpace/pull/317</a></li>
<li>Quick proof-of-concept hack to add <code>dc.rights</code> to the input form, including some inline instructions/hints:</li> <li>Quick proof-of-concept hack to add <code>dc.rights</code> to the input form, including some inline instructions/hints:</li>
</ul> </ul>
<p><img src="/cgspace-notes/2017/04/dc-rights.png" alt="dc.rights in the submission form"></p>
<p><img src="/cgspace-notes/2017/04/dc-rights.png" alt="dc.rights in the submission form" /></p>
<ul> <ul>
<li>Remove redundant/duplicate text in the DSpace submission license</li> <li>Remove redundant/duplicate text in the DSpace submission license</li>
<li>Testing the CMYK patch on a collection with 650 items:</li>
<li><p>Testing the CMYK patch on a collection with 650 items:</p>
<pre><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p &quot;ImageMagick PDF Thumbnail&quot; -v &gt;&amp; /tmp/filter-media-cmyk.txt
</code></pre></li>
</ul> </ul>
<pre><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p &quot;ImageMagick PDF Thumbnail&quot; -v &gt;&amp; /tmp/filter-media-cmyk.txt
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2017-04/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-04/'>Read more →</a>
</article> </article>
@ -200,14 +191,11 @@
</p> </p>
</header> </header>
<h2 id="2017-03-01">2017-03-01</h2> <h2 id="20170301">2017-03-01</h2>
<ul> <ul>
<li>Run the 279 CIAT author corrections on CGSpace</li> <li>Run the 279 CIAT author corrections on CGSpace</li>
</ul> </ul>
<h2 id="20170302">2017-03-02</h2>
<h2 id="2017-03-02">2017-03-02</h2>
<ul> <ul>
<li>Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace</li> <li>Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace</li>
<li>CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles</li> <li>CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles</li>
@ -217,13 +205,11 @@
<li>Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI</li> <li>Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI</li>
<li>Filed an issue on DSpace issue tracker for the <code>filter-media</code> bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: <a href="https://jira.duraspace.org/browse/DS-3516">DS-3516</a></li> <li>Filed an issue on DSpace issue tracker for the <code>filter-media</code> bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: <a href="https://jira.duraspace.org/browse/DS-3516">DS-3516</a></li>
<li>Discovered that the ImageMagick <code>filter-media</code> plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK</li> <li>Discovered that the ImageMagick <code>filter-media</code> plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK</li>
<li>Interestingly, it seems DSpace 4.x's thumbnails were sRGB, but forcing regeneration using DSpace 5.x's ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999">10568/51999</a>):</li>
<li><p>Interestingly, it seems DSpace 4.x&rsquo;s thumbnails were sRGB, but forcing regeneration using DSpace 5.x&rsquo;s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999"><sup>10568</sup>&frasl;<sub>51999</sub></a>):</p> </ul>
<pre><code>$ identify ~/Desktop/alc_contrastes_desafios.jpg <pre><code>$ identify ~/Desktop/alc_contrastes_desafios.jpg
/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000 /Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-03/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-03/'>Read more →</a>
</article> </article>
@ -241,25 +227,22 @@
</p> </p>
</header> </header>
<h2 id="2017-02-07">2017-02-07</h2> <h2 id="20170207">2017-02-07</h2>
<ul> <ul>
<li><p>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</p> <li>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</li>
</ul>
<pre><code>dspace=# select * from collection2item where item_id = '80278'; <pre><code>dspace=# select * from collection2item where item_id = '80278';
id | collection_id | item_id id | collection_id | item_id
-------+---------------+--------- -------+---------------+---------
92551 | 313 | 80278 92551 | 313 | 80278
92550 | 313 | 80278 92550 | 313 | 80278
90774 | 1051 | 80278 90774 | 1051 | 80278
(3 rows) (3 rows)
dspace=# delete from collection2item where id = 92551 and item_id = 80278; dspace=# delete from collection2item where id = 92551 and item_id = 80278;
DELETE 1 DELETE 1
</code></pre></li> </code></pre><ul>
<li>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</li>
<li><p>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</p></li> <li>Looks like we'll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</li>
<li><p>Looks like we&rsquo;ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</p></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-02/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-02/'>Read more →</a>
</article> </article>
@ -278,12 +261,11 @@ DELETE 1
</p> </p>
</header> </header>
<h2 id="2017-01-02">2017-01-02</h2> <h2 id="20170102">2017-01-02</h2>
<ul> <ul>
<li>I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error</li> <li>I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error</li>
<li>I tested on DSpace Test as well and it doesn&rsquo;t work there either</li> <li>I tested on DSpace Test as well and it doesn't work there either</li>
<li>I asked on the dspace-tech mailing list because it seems to be broken, and actually now I&rsquo;m not sure if we&rsquo;ve ever had the sharding task run successfully over all these years</li> <li>I asked on the dspace-tech mailing list because it seems to be broken, and actually now I'm not sure if we've ever had the sharding task run successfully over all these years</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-01/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-01/'>Read more →</a>
</article> </article>
@ -302,25 +284,20 @@ DELETE 1
</p> </p>
</header> </header>
<h2 id="2016-12-02">2016-12-02</h2> <h2 id="20161202">2016-12-02</h2>
<ul> <ul>
<li>CGSpace was down for five hours in the morning while I was sleeping</li> <li>CGSpace was down for five hours in the morning while I was sleeping</li>
<li>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</li>
<li><p>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</p> </ul>
<pre><code>2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;) <pre><code>2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail=&quot;dc.title&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;) 2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail=&quot;dc.title&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, Object Type=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail=&quot;THUMBNAIL&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;) 2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, Object Type=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail=&quot;THUMBNAIL&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail=&quot;-1&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;) 2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail=&quot;-1&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;) 2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
</code></pre></li> </code></pre><ul>
<li>I see thousands of them in the logs for the last few months, so it's not related to the DSpace 5.5 upgrade</li>
<li><p>I see thousands of them in the logs for the last few months, so it&rsquo;s not related to the DSpace 5.5 upgrade</p></li> <li>I've raised a ticket with Atmire to ask</li>
<li>Another worrying error from dspace.log is:</li>
<li><p>I&rsquo;ve raised a ticket with Atmire to ask</p></li>
<li><p>Another worrying error from dspace.log is:</p></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-12/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-12/'>Read more →</a>
</article> </article>
@ -339,13 +316,11 @@ DELETE 1
</p> </p>
</header> </header>
<h2 id="2016-11-01">2016-11-01</h2> <h2 id="20161101">2016-11-01</h2>
<ul> <ul>
<li>Add <code>dc.type</code> to the output options for Atmire&rsquo;s Listings and Reports module (<a href="https://github.com/ilri/DSpace/pull/286">#286</a>)</li> <li>Add <code>dc.type</code> to the output options for Atmire's Listings and Reports module (<a href="https://github.com/ilri/DSpace/pull/286">#286</a>)</li>
</ul> </ul>
<p><img src="/cgspace-notes/2016/11/listings-and-reports.png" alt="Listings and Reports with output type"></p>
<p><img src="/cgspace-notes/2016/11/listings-and-reports.png" alt="Listings and Reports with output type" /></p>
<a href='https://alanorth.github.io/cgspace-notes/2016-11/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-11/'>Read more →</a>
</article> </article>
@ -363,22 +338,19 @@ DELETE 1
</p> </p>
</header> </header>
<h2 id="2016-10-03">2016-10-03</h2> <h2 id="20161003">2016-10-03</h2>
<ul> <ul>
<li>Testing adding <a href="https://wiki.duraspace.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing">ORCIDs to a CSV</a> file for a single item to see if the author orders get messed up</li> <li>Testing adding <a href="https://wiki.duraspace.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing">ORCIDs to a CSV</a> file for a single item to see if the author orders get messed up</li>
<li>Need to test the following scenarios to see how author order is affected: <li>Need to test the following scenarios to see how author order is affected:
<ul> <ul>
<li>ORCIDs only</li> <li>ORCIDs only</li>
<li>ORCIDs plus normal authors</li> <li>ORCIDs plus normal authors</li>
</ul></li>
<li><p>I exported a random item&rsquo;s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new column called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</p>
<pre><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
</code></pre></li>
</ul> </ul>
</li>
<li>I exported a random item's metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new column called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</li>
</ul>
<pre><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2016-10/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-10/'>Read more →</a>
</article> </article>


@ -9,13 +9,12 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" /> <meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/> <meta name="twitter:title" content="Posts"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/> <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -100,18 +99,15 @@
</p> </p>
</header> </header>
<h2 id="2016-09-01">2016-09-01</h2> <h2 id="20160901">2016-09-01</h2>
<ul> <ul>
<li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li> <li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li>
<li>Discuss how the migration of CGIAR&rsquo;s Active Directory to a flat structure will break our LDAP groups in DSpace</li> <li>Discuss how the migration of CGIAR's Active Directory to a flat structure will break our LDAP groups in DSpace</li>
<li>We had been using <code>DC=ILRI</code> to determine whether a user was ILRI or not</li> <li>We had been using <code>DC=ILRI</code> to determine whether a user was ILRI or not</li>
<li>It looks like we might be able to use OUs now, instead of DCs:</li>
<li><p>It looks like we might be able to use OUs now, instead of DCs:</p>
<pre><code>$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b &quot;dc=cgiarad,dc=org&quot; -D &quot;admigration1@cgiarad.org&quot; -W &quot;(sAMAccountName=admigration1)&quot;
</code></pre></li>
</ul> </ul>
<pre><code>$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b &quot;dc=cgiarad,dc=org&quot; -D &quot;admigration1@cgiarad.org&quot; -W &quot;(sAMAccountName=admigration1)&quot;
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2016-09/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-09/'>Read more →</a>
</article> </article>
@ -129,22 +125,19 @@
</p> </p>
</header> </header>
<h2 id="2016-08-01">2016-08-01</h2> <h2 id="20160801">2016-08-01</h2>
<ul> <ul>
<li>Add updated distribution license from Sisay (<a href="https://github.com/ilri/DSpace/issues/259">#259</a>)</li> <li>Add updated distribution license from Sisay (<a href="https://github.com/ilri/DSpace/issues/259">#259</a>)</li>
<li>Play with upgrading Mirage 2 dependencies in <code>bower.json</code> because most are several versions out of date</li> <li>Play with upgrading Mirage 2 dependencies in <code>bower.json</code> because most are several versions out of date</li>
<li>Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more</li> <li>Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more</li>
<li>bower stuff is a dead end, waste of time, too many issues</li> <li>bower stuff is a dead end, waste of time, too many issues</li>
<li>Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of <code>fonts</code>)</li> <li>Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of <code>fonts</code>)</li>
<li>Start working on DSpace 5.1 → 5.5 port:</li>
<li><p>Start working on DSpace 5.1 → 5.5 port:</p> </ul>
<pre><code>$ git checkout -b 55new 5_x-prod <pre><code>$ git checkout -b 55new 5_x-prod
$ git reset --hard ilri/5_x-prod $ git reset --hard ilri/5_x-prod
$ git rebase -i dspace-5.5 $ git rebase -i dspace-5.5
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-08/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-08/'>Read more →</a>
</article> </article>
@ -162,22 +155,19 @@ $ git rebase -i dspace-5.5
</p> </p>
</header> </header>
<h2 id="2016-07-01">2016-07-01</h2> <h2 id="20160701">2016-07-01</h2>
<ul> <ul>
<li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li> <li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li>
<li>I think this query should find and replace all authors that have &ldquo;,&rdquo; at the end of their names:</li>
<li><p>I think this query should find and replace all authors that have &ldquo;,&rdquo; at the end of their names:</p> </ul>
<pre><code>dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$'; <pre><code>dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
UPDATE 95 UPDATE 95
dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$'; dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
text_value text_value
------------ ------------
(0 rows) (0 rows)
</code></pre></li> </code></pre><ul>
<li>In this case the select query was showing 95 results before the update</li>
<li><p>In this case the select query was showing 95 results before the update</p></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-07/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-07/'>Read more →</a>
</article> </article>
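The trailing-comma cleanup in the 2016-07 entry above can be tried outside the database first. This is a minimal sketch that reuses the same pattern as the `regexp_replace` call; `strip_trailing_comma` is a hypothetical helper name, and Python's `re` differs from PostgreSQL regular expressions in small ways (for example, how `.` treats newlines), so treat it as an approximation.

```python
import re

# Same pattern as the SQL: capture everything before a single trailing comma
TRAILING_COMMA = re.compile(r"(^.+?),$")


def strip_trailing_comma(name):
    """Remove one trailing comma from an author name, if present."""
    return TRAILING_COMMA.sub(r"\1", name)
```

Names without a trailing comma pass through unchanged, which matches the SQL behaviour of only touching rows that match `'^.+?,$'`.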
@ -196,11 +186,10 @@ text_value
</p> </p>
</header> </header>
<h2 id="2016-06-01">2016-06-01</h2> <h2 id="20160601">2016-06-01</h2>
<ul> <ul>
<li>Experimenting with IFPRI OAI (we want to harvest their publications)</li> <li>Experimenting with IFPRI OAI (we want to harvest their publications)</li>
<li>After reading the <a href="https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html">ContentDM documentation</a> I found IFPRI&rsquo;s OAI endpoint: <a href="http://ebrary.ifpri.org/oai/oai.php">http://ebrary.ifpri.org/oai/oai.php</a></li> <li>After reading the <a href="https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html">ContentDM documentation</a> I found IFPRI's OAI endpoint: <a href="http://ebrary.ifpri.org/oai/oai.php">http://ebrary.ifpri.org/oai/oai.php</a></li>
<li>After reading the <a href="https://www.openarchives.org/OAI/openarchivesprotocol.html">OAI documentation</a> and testing with an <a href="http://validator.oaipmh.com/">OAI validator</a> I found out how to get their publications</li> <li>After reading the <a href="https://www.openarchives.org/OAI/openarchivesprotocol.html">OAI documentation</a> and testing with an <a href="http://validator.oaipmh.com/">OAI validator</a> I found out how to get their publications</li>
<li>This is their publications set: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc">http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc</a></li> <li>This is their publications set: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc">http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc</a></li>
<li>You can see the others by using the OAI <code>ListSets</code> verb: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListSets">http://ebrary.ifpri.org/oai/oai.php?verb=ListSets</a></li> <li>You can see the others by using the OAI <code>ListSets</code> verb: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListSets">http://ebrary.ifpri.org/oai/oai.php?verb=ListSets</a></li>
@ -223,18 +212,15 @@ text_value
</p> </p>
</header> </header>
<h2 id="2016-05-01">2016-05-01</h2> <h2 id="20160501">2016-05-01</h2>
<ul> <ul>
<li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li> <li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li>
<li>I have blocked access to the API now</li> <li>I have blocked access to the API now</li>
<li>There are 3,000 IPs accessing the REST API in a 24-hour period!</li>
<li><p>There are 3,000 IPs accessing the REST API in a 24-hour period!</p> </ul>
<pre><code># awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l <pre><code># awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l
3168 3168
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-05/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-05/'>Read more →</a>
</article> </article>
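One caveat on the count in the 2016-05 entry above: `uniq` only collapses adjacent duplicate lines, so without a preceding `sort` the figure may overstate the number of distinct addresses. A minimal sketch of an order-independent count follows, assuming the client address is the first whitespace-separated field of each log line; `count_unique_clients` is a hypothetical helper name.

```python
def count_unique_clients(lines):
    """Count distinct values of the first field, regardless of line order."""
    clients = set()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            clients.add(fields[0])
    return len(clients)
```

It could be fed an open log file directly, since iterating over a text file yields one line at a time.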
@ -252,13 +238,12 @@ text_value
</p> </p>
</header> </header>
<h2 id="2016-04-04">2016-04-04</h2> <h2 id="20160404">2016-04-04</h2>
<ul> <ul>
<li>Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit</li> <li>Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit</li>
<li>We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc</li> <li>We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc</li>
<li>After running DSpace for over five years I&rsquo;ve never needed to look in any other log file than dspace.log, let alone one from last year!</li> <li>After running DSpace for over five years I've never needed to look in any other log file than dspace.log, let alone one from last year!</li>
<li>This will save us a few gigs of backup space we&rsquo;re paying for on S3</li> <li>This will save us a few gigs of backup space we're paying for on S3</li>
<li>Also, I noticed the <code>checker</code> log has some errors we should pay attention to:</li> <li>Also, I noticed the <code>checker</code> log has some errors we should pay attention to:</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-04/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-04/'>Read more →</a>
@ -278,11 +263,10 @@ text_value
</p> </p>
</header> </header>
<h2 id="2016-03-02">2016-03-02</h2> <h2 id="20160302">2016-03-02</h2>
<ul> <ul>
<li>Looking at issues with author authorities on CGSpace</li> <li>Looking at issues with author authorities on CGSpace</li>
<li>For some reason we still have the <code>index-lucene-update</code> cron job active on CGSpace, but I&rsquo;m pretty sure we don&rsquo;t need it as of the latest few versions of Atmire&rsquo;s Listings and Reports module</li> <li>For some reason we still have the <code>index-lucene-update</code> cron job active on CGSpace, but I'm pretty sure we don't need it as of the latest few versions of Atmire's Listings and Reports module</li>
<li>Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server</li> <li>Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-03/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-03/'>Read more →</a>
@ -302,16 +286,13 @@ text_value
</p> </p>
</header> </header>
<h2 id="2016-02-05">2016-02-05</h2> <h2 id="20160205">2016-02-05</h2>
<ul> <ul>
<li>Looking at some DAGRIS data for Abenet Yabowork</li> <li>Looking at some DAGRIS data for Abenet Yabowork</li>
<li>Lots of issues with spaces, newlines, etc causing the import to fail</li> <li>Lots of issues with spaces, newlines, etc causing the import to fail</li>
<li>I noticed we have a very <em>interesting</em> list of countries on CGSpace:</li> <li>I noticed we have a very <em>interesting</em> list of countries on CGSpace:</li>
</ul> </ul>
<p><img src="/cgspace-notes/2016/02/cgspace-countries.png" alt="CGSpace country list"></p>
<p><img src="/cgspace-notes/2016/02/cgspace-countries.png" alt="CGSpace country list" /></p>
<ul> <ul>
<li>Not only are there 49,000 countries, we have some blanks (25)&hellip;</li> <li>Not only are there 49,000 countries, we have some blanks (25)&hellip;</li>
<li>Also, lots of things like &ldquo;COTE D`LVOIRE&rdquo; and &ldquo;COTE D IVOIRE&rdquo;</li> <li>Also, lots of things like &ldquo;COTE D`LVOIRE&rdquo; and &ldquo;COTE D IVOIRE&rdquo;</li>
@ -333,8 +314,7 @@ text_value
</p> </p>
</header> </header>
<h2 id="2016-01-13">2016-01-13</h2> <h2 id="20160113">2016-01-13</h2>
<ul> <ul>
<li>Move ILRI collection <code>10568/12503</code> from <code>10568/27869</code> to <code>10568/27629</code> using the <a href="https://gist.github.com/alanorth/392c4660e8b022d99dfa">move_collections.sh</a> script I wrote last year.</li> <li>Move ILRI collection <code>10568/12503</code> from <code>10568/27869</code> to <code>10568/27629</code> using the <a href="https://gist.github.com/alanorth/392c4660e8b022d99dfa">move_collections.sh</a> script I wrote last year.</li>
<li>I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.</li> <li>I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.</li>
@ -357,18 +337,16 @@ text_value
</p> </p>
</header> </header>
<h2 id="2015-12-02">2015-12-02</h2> <h2 id="20151202">2015-12-02</h2>
<ul> <ul>
<li><p>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</p> <li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li>
</ul>
<pre><code># cd /home/dspacetest.cgiar.org/log <pre><code># cd /home/dspacetest.cgiar.org/log
# ls -lh dspace.log.2015-11-18* # ls -lh dspace.log.2015-11-18*
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18 -rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo -rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz -rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2015-12/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2015-12/'>Read more →</a>
</article> </article>

@ -9,13 +9,12 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" /> <meta property="og:updated_time" content="2019-11-04T12:20:30+02:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/> <meta name="twitter:title" content="Posts"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/> <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -100,18 +99,15 @@
</p> </p>
</header> </header>
<h2 id="2015-11-22">2015-11-22</h2> <h2 id="20151122">2015-11-22</h2>
<ul> <ul>
<li>CGSpace went down</li> <li>CGSpace went down</li>
<li>Looks like DSpace exhausted its PostgreSQL connection pool</li> <li>Looks like DSpace exhausted its PostgreSQL connection pool</li>
<li>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</li>
<li><p>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</p> </ul>
<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace <pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
78 78
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2015-11/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2015-11/'>Read more →</a>
</article> </article>

@ -4,27 +4,27 @@
<url> <url>
<loc>https://alanorth.github.io/cgspace-notes/categories/</loc> <loc>https://alanorth.github.io/cgspace-notes/categories/</loc>
<lastmod>2019-11-26T15:53:57+02:00</lastmod> <lastmod>2019-11-27T14:56:00+02:00</lastmod>
</url> </url>
<url> <url>
<loc>https://alanorth.github.io/cgspace-notes/</loc> <loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2019-11-26T15:53:57+02:00</lastmod> <lastmod>2019-11-27T14:56:00+02:00</lastmod>
</url> </url>
<url> <url>
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc> <loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
<lastmod>2019-11-26T15:53:57+02:00</lastmod> <lastmod>2019-11-27T14:56:00+02:00</lastmod>
</url> </url>
<url> <url>
<loc>https://alanorth.github.io/cgspace-notes/2019-11/</loc> <loc>https://alanorth.github.io/cgspace-notes/2019-11/</loc>
<lastmod>2019-11-26T15:53:57+02:00</lastmod> <lastmod>2019-11-27T14:56:00+02:00</lastmod>
</url> </url>
<url> <url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc> <loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2019-11-26T15:53:57+02:00</lastmod> <lastmod>2019-11-27T14:56:00+02:00</lastmod>
</url> </url>
<url> <url>

@ -9,13 +9,12 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/tags/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/tags/" />
<meta property="og:updated_time" content="2019-10-28T13:27:35+02:00" /> <meta property="og:updated_time" content="2019-10-28T13:27:35+02:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Tags"/> <meta name="twitter:title" content="Tags"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/> <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -100,31 +99,27 @@
</p> </p>
</header> </header>
<h2 id="2019-11-04">2019-11-04</h2> <h2 id="20191104">2019-11-04</h2>
<ul> <ul>
<li><p>Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics</p> <li>Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics
<ul> <ul>
<li><p>I looked in the nginx logs and see 4.6 million in the access logs, and 1.2 million in the API logs:</p> <li>I looked in the nginx logs and see 4.6 million in the access logs, and 1.2 million in the API logs:</li>
</ul>
</li>
</ul>
<pre><code># zcat --force /var/log/nginx/*access.log.*.gz | grep -cE &quot;[0-9]{1,2}/Oct/2019&quot; <pre><code># zcat --force /var/log/nginx/*access.log.*.gz | grep -cE &quot;[0-9]{1,2}/Oct/2019&quot;
4671942 4671942
# zcat --force /var/log/nginx/{rest,oai,statistics}.log.*.gz | grep -cE &quot;[0-9]{1,2}/Oct/2019&quot; # zcat --force /var/log/nginx/{rest,oai,statistics}.log.*.gz | grep -cE &quot;[0-9]{1,2}/Oct/2019&quot;
1277694 1277694
</code></pre></li> </code></pre><ul>
</ul></li> <li>So 4.6 million from XMLUI and another 1.2 million from API requests</li>
<li>Let's see how many of the REST API requests were for bitstreams (because they are counted in Solr stats):</li>
<li><p>So 4.6 million from XMLUI and another 1.2 million from API requests</p></li> </ul>
<li><p>Let&rsquo;s see how many of the REST API requests were for bitstreams (because they are counted in Solr stats):</p>
<pre><code># zcat --force /var/log/nginx/rest.log.*.gz | grep -c -E &quot;[0-9]{1,2}/Oct/2019&quot; <pre><code># zcat --force /var/log/nginx/rest.log.*.gz | grep -c -E &quot;[0-9]{1,2}/Oct/2019&quot;
1183456 1183456
# zcat --force /var/log/nginx/rest.log.*.gz | grep -E &quot;[0-9]{1,2}/Oct/2019&quot; | grep -c -E &quot;/rest/bitstreams&quot; # zcat --force /var/log/nginx/rest.log.*.gz | grep -E &quot;[0-9]{1,2}/Oct/2019&quot; | grep -c -E &quot;/rest/bitstreams&quot;
106781 106781
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-11/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-11/'>Read more →</a>
</article> </article>
@ -145,7 +140,6 @@
</p> </p>
</header> </header>
<p>Possible changes to CGSpace metadata fields to align more with DC, QDC, and DCTERMS as well as CG Core v2.</p> <p>Possible changes to CGSpace metadata fields to align more with DC, QDC, and DCTERMS as well as CG Core v2.</p>
<p>With reference to <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2 draft standard</a> by Marie-Angélique as well as <a href="http://www.dublincore.org/specifications/dublin-core/dcmi-terms/">DCMI DCTERMS</a>.</p> <p>With reference to <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2 draft standard</a> by Marie-Angélique as well as <a href="http://www.dublincore.org/specifications/dublin-core/dcmi-terms/">DCMI DCTERMS</a>.</p>
<a href='https://alanorth.github.io/cgspace-notes/cgspace-cgcorev2-migration/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/cgspace-cgcorev2-migration/'>Read more →</a>
</article> </article>
@ -164,8 +158,7 @@
</p> </p>
</header> </header>
2019-10-01 Udana from IWMI asked me for a CSV export of their community on CGSpace 2019-10-01 Udana from IWMI asked me for a CSV export of their community on CGSpace I exported it, but a quick run through the csv-metadata-quality tool shows that there are some low-hanging fruits we can fix before I send him the data I will limit the scope to the titles, regions, subregions, and river basins for now to manually fix some non-breaking spaces (U+00A0) there that would otherwise be removed by the csv-metadata-quality script's &ldquo;unnecessary Unicode&rdquo; fix: $ csvcut -c 'id,dc.
I exported it, but a quick run through the csv-metadata-quality tool shows that there are some low-hanging fruits we can fix before I send him the data I will limit the scope to the titles, regions, subregions, and river basins for now to manually fix some non-breaking spaces (U+00A0) there that would otherwise be removed by the csv-metadata-quality script&rsquo;s &ldquo;unnecessary Unicode&rdquo; fix:
<a href='https://alanorth.github.io/cgspace-notes/2019-10/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-10/'>Read more →</a>
</article> </article>
@ -183,37 +176,34 @@
</p> </p>
</header> </header>
<h2 id="2019-09-01">2019-09-01</h2> <h2 id="20190901">2019-09-01</h2>
<ul> <ul>
<li>Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning</li> <li>Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning</li>
<li>Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:</li>
<li><p>Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:</p>
<pre><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
440 17.58.101.255
441 157.55.39.101
485 207.46.13.43
728 169.60.128.125
730 207.46.13.108
758 157.55.39.9
808 66.160.140.179
814 207.46.13.212
2472 163.172.71.23
6092 3.94.211.189
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
33 2a01:7e00::f03c:91ff:fe16:fcb
57 3.83.192.124
57 3.87.77.25
57 54.82.1.8
822 2a01:9cc0:47:1:1a:4:0:2
1223 45.5.184.72
1633 172.104.229.92
5112 205.186.128.185
7249 2a01:7e00::f03c:91ff:fe18:7396
9124 45.5.186.2
</code></pre></li>
</ul> </ul>
<pre><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
440 17.58.101.255
441 157.55.39.101
485 207.46.13.43
728 169.60.128.125
730 207.46.13.108
758 157.55.39.9
808 66.160.140.179
814 207.46.13.212
2472 163.172.71.23
6092 3.94.211.189
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
33 2a01:7e00::f03c:91ff:fe16:fcb
57 3.83.192.124
57 3.87.77.25
57 54.82.1.8
822 2a01:9cc0:47:1:1a:4:0:2
1223 45.5.184.72
1633 172.104.229.92
5112 205.186.128.185
7249 2a01:7e00::f03c:91ff:fe18:7396
9124 45.5.186.2
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2019-09/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-09/'>Read more →</a>
</article> </article>
@ -231,22 +221,19 @@
</p> </p>
</header> </header>
<h2 id="2019-08-03">2019-08-03</h2> <h2 id="20190803">2019-08-03</h2>
<ul> <ul>
<li>Look at Bioversity&rsquo;s latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name&hellip;</li> <li>Look at Bioversity's latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name&hellip;</li>
</ul> </ul>
<h2 id="20190804">2019-08-04</h2>
<h2 id="2019-08-04">2019-08-04</h2>
<ul> <ul>
<li>Deploy ORCID identifier updates requested by Bioversity to CGSpace</li> <li>Deploy ORCID identifier updates requested by Bioversity to CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it <li>Run system updates on CGSpace (linode18) and reboot it
<ul> <ul>
<li>Before updating it I checked Solr and verified that all statistics cores were loaded properly&hellip;</li> <li>Before updating it I checked Solr and verified that all statistics cores were loaded properly&hellip;</li>
<li>After rebooting, all statistics cores were loaded&hellip; wow, that&rsquo;s lucky.</li> <li>After rebooting, all statistics cores were loaded&hellip; wow, that's lucky.</li>
</ul></li> </ul>
</li>
<li>Run system updates on DSpace Test (linode19) and reboot it</li> <li>Run system updates on DSpace Test (linode19) and reboot it</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-08/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-08/'>Read more →</a>
@ -266,16 +253,15 @@
</p> </p>
</header> </header>
<h2 id="2019-07-01">2019-07-01</h2> <h2 id="20190701">2019-07-01</h2>
<ul> <ul>
<li>Create an &ldquo;AfricaRice books and book chapters&rdquo; collection on CGSpace for AfricaRice</li> <li>Create an &ldquo;AfricaRice books and book chapters&rdquo; collection on CGSpace for AfricaRice</li>
<li>Last month Sisay asked why the following &ldquo;most popular&rdquo; statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace: <li>Last month Sisay asked why the following &ldquo;most popular&rdquo; statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace:
<ul> <ul>
<li><a href="https://dspacetest.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&amp;time_filter_end_date=01%2F12%2F2018">DSpace Test</a></li> <li><a href="https://dspacetest.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&amp;time_filter_end_date=01%2F12%2F2018">DSpace Test</a></li>
<li><a href="https://cgspace.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&amp;time_filter_end_date=01%2F12%2F2018">CGSpace</a></li> <li><a href="https://cgspace.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&amp;time_filter_end_date=01%2F12%2F2018">CGSpace</a></li>
</ul></li> </ul>
</li>
<li>Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community</li> <li>Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-07/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-07/'>Read more →</a>
@ -295,15 +281,12 @@
</p> </p>
</header> </header>
<h2 id="2019-06-02">2019-06-02</h2> <h2 id="20190602">2019-06-02</h2>
<ul> <ul>
<li>Merge the <a href="https://github.com/ilri/DSpace/pull/425">Solr filterCache</a> and <a href="https://github.com/ilri/DSpace/pull/426">XMLUI ISI journal</a> changes to the <code>5_x-prod</code> branch and deploy on CGSpace</li> <li>Merge the <a href="https://github.com/ilri/DSpace/pull/425">Solr filterCache</a> and <a href="https://github.com/ilri/DSpace/pull/426">XMLUI ISI journal</a> changes to the <code>5_x-prod</code> branch and deploy on CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it</li> <li>Run system updates on CGSpace (linode18) and reboot it</li>
</ul> </ul>
<h2 id="20190603">2019-06-03</h2>
<h2 id="2019-06-03">2019-06-03</h2>
<ul> <ul>
<li>Skype with Marie-Angélique and Abenet about <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2</a></li> <li>Skype with Marie-Angélique and Abenet about <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2</a></li>
</ul> </ul>
@ -324,24 +307,21 @@
</p> </p>
</header> </header>
<h2 id="2019-05-01">2019-05-01</h2> <h2 id="20190501">2019-05-01</h2>
<ul> <ul>
<li>Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace</li> <li>Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace</li>
<li>A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items <li>A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items
<ul> <ul>
<li>Apparently if the item is in the <code>workflowitem</code> table it is submitted to a workflow</li> <li>Apparently if the item is in the <code>workflowitem</code> table it is submitted to a workflow</li>
<li>And if it is in the <code>workspaceitem</code> table it is in the pre-submitted state</li> <li>And if it is in the <code>workspaceitem</code> table it is in the pre-submitted state</li>
</ul></li> </ul>
</li>
<li><p>The item seems to be in a pre-submitted state, so I tried to delete it from there:</p> <li>The item seems to be in a pre-submitted state, so I tried to delete it from there:</li>
</ul>
<pre><code>dspace=# DELETE FROM workspaceitem WHERE item_id=74648; <pre><code>dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
DELETE 1 DELETE 1
</code></pre></li> </code></pre><ul>
<li>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present&hellip;</li>
<li><p>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present&hellip;</p></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-05/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-05/'>Read more →</a>
</article> </article>
@ -360,35 +340,30 @@ DELETE 1
</p> </p>
</header> </header>
<h2 id="2019-04-01">2019-04-01</h2> <h2 id="20190401">2019-04-01</h2>
<ul> <ul>
<li>Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc <li>Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc
<ul> <ul>
<li>They asked if we had plans to enable RDF support in CGSpace</li> <li>They asked if we had plans to enable RDF support in CGSpace</li>
</ul></li> </ul>
</li>
<li><p>There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today</p> <li>There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today
<ul> <ul>
<li><p>I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!</p> <li>I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!</li>
</ul>
</li>
</ul>
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18.196.196.108|18.195.78.144|18.195.218.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5 <pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18.196.196.108|18.195.78.144|18.195.218.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5
4432 200 4432 200
</code></pre></li> </code></pre><ul>
</ul></li> <li>In the last two weeks there have been 47,000 downloads of this <em>same exact PDF</em> by these three IP addresses</li>
<li>Apply country and region corrections and deletions on DSpace Test and CGSpace:</li>
<li><p>In the last two weeks there have been 47,000 downloads of this <em>same exact PDF</em> by these three IP addresses</p></li> </ul>
<li><p>Apply country and region corrections and deletions on DSpace Test and CGSpace:</p>
<pre><code>$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d <pre><code>$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d $ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-04/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-04/'>Read more →</a>
</article> </article>
@ -406,20 +381,19 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
</p> </p>
</header> </header>
<h2 id="2019-03-01">2019-03-01</h2> <h2 id="20190301">2019-03-01</h2>
<ul> <ul>
<li>I checked IITA&rsquo;s 259 Feb 14 records from last month for duplicates using Atmire&rsquo;s Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good</li> <li>I checked IITA's 259 Feb 14 records from last month for duplicates using Atmire's Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good</li>
<li>I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc&hellip;</li> <li>I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc&hellip;</li>
<li>Looking at the other half of Udana&rsquo;s WLE records from 2018-11 <li>Looking at the other half of Udana's WLE records from 2018-11
<ul> <ul>
<li>I finished the ones for Restoring Degraded Landscapes (RDL), but these are for Variability, Risks and Competing Uses (VRC)</li> <li>I finished the ones for Restoring Degraded Landscapes (RDL), but these are for Variability, Risks and Competing Uses (VRC)</li>
<li>I did the usual cleanups for whitespace, added regions where they made sense for certain countries, cleaned up the DOI link formats, added rights information based on the publications page for a few items</li> <li>I did the usual cleanups for whitespace, added regions where they made sense for certain countries, cleaned up the DOI link formats, added rights information based on the publications page for a few items</li>
<li>Most worryingly, there are encoding errors in the abstracts for eleven items, for example:</li> <li>Most worryingly, there are encoding errors in the abstracts for eleven items, for example:</li>
<li>68.15% <20> 9.45 instead of 68.15% ± 9.45</li> <li>68.15% <20> 9.45 instead of 68.15% ± 9.45</li>
<li>2003<EFBFBD>2013 instead of 2003–2013</li> <li>2003<EFBFBD>2013 instead of 2003–2013</li>
</ul></li> </ul>
</li>
<li>I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs</li> <li>I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-03/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-03/'>Read more →</a>


@ -9,13 +9,12 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/tags/migration/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/tags/migration/" />
<meta property="og:updated_time" content="2019-10-28T13:27:35+02:00" /> <meta property="og:updated_time" content="2019-10-28T13:27:35+02:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Migration"/> <meta name="twitter:title" content="Migration"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/> <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -88,7 +87,6 @@
</p> </p>
</header> </header>
<p>Possible changes to CGSpace metadata fields to align more with DC, QDC, and DCTERMS as well as CG Core v2.</p> <p>Possible changes to CGSpace metadata fields to align more with DC, QDC, and DCTERMS as well as CG Core v2.</p>
<p>With reference to <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2 draft standard</a> by Marie-Angélique as well as <a href="http://www.dublincore.org/specifications/dublin-core/dcmi-terms/">DCMI DCTERMS</a>.</p> <p>With reference to <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2 draft standard</a> by Marie-Angélique as well as <a href="http://www.dublincore.org/specifications/dublin-core/dcmi-terms/">DCMI DCTERMS</a>.</p>
<a href='https://alanorth.github.io/cgspace-notes/cgspace-cgcorev2-migration/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/cgspace-cgcorev2-migration/'>Read more →</a>
</article> </article>


@ -18,7 +18,6 @@
<guid>https://alanorth.github.io/cgspace-notes/cgspace-cgcorev2-migration/</guid> <guid>https://alanorth.github.io/cgspace-notes/cgspace-cgcorev2-migration/</guid>
<description>&lt;p&gt;Possible changes to CGSpace metadata fields to align more with DC, QDC, and DCTERMS as well as CG Core v2.&lt;/p&gt; <description>&lt;p&gt;Possible changes to CGSpace metadata fields to align more with DC, QDC, and DCTERMS as well as CG Core v2.&lt;/p&gt;
&lt;p&gt;With reference to &lt;a href=&#34;https://agriculturalsemantics.github.io/cg-core/cgcore.html&#34;&gt;CG Core v2 draft standard&lt;/a&gt; by Marie-Angélique as well as &lt;a href=&#34;http://www.dublincore.org/specifications/dublin-core/dcmi-terms/&#34;&gt;DCMI DCTERMS&lt;/a&gt;.&lt;/p&gt;</description> &lt;p&gt;With reference to &lt;a href=&#34;https://agriculturalsemantics.github.io/cg-core/cgcore.html&#34;&gt;CG Core v2 draft standard&lt;/a&gt; by Marie-Angélique as well as &lt;a href=&#34;http://www.dublincore.org/specifications/dublin-core/dcmi-terms/&#34;&gt;DCMI DCTERMS&lt;/a&gt;.&lt;/p&gt;</description>
</item> </item>


@ -9,13 +9,12 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/tags/notes/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/tags/notes/" />
<meta property="og:updated_time" content="2017-09-07T16:54:52+07:00" /> <meta property="og:updated_time" content="2017-09-07T16:54:52+07:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Notes"/> <meta name="twitter:title" content="Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/> <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -85,16 +84,13 @@
</p> </p>
</header> </header>
<h2 id="2017-09-06">2017-09-06</h2> <h2 id="20170906">2017-09-06</h2>
<ul> <ul>
<li>Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours</li> <li>Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours</li>
</ul> </ul>
<h2 id="20170907">2017-09-07</h2>
<h2 id="2017-09-07">2017-09-07</h2>
<ul> <ul>
<li>Ask Sisay to clean up the WLE approvers a bit, as Marianne&rsquo;s user account is both in the approvers step as well as the group</li> <li>Ask Sisay to clean up the WLE approvers a bit, as Marianne's user account is both in the approvers step as well as the group</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-09/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-09/'>Read more →</a>
</article> </article>
@ -113,22 +109,21 @@
</p> </p>
</header> </header>
<h2 id="2017-08-01">2017-08-01</h2> <h2 id="20170801">2017-08-01</h2>
<ul> <ul>
<li>Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours</li> <li>Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours</li>
<li>I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)</li> <li>I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)</li>
<li>The good thing is that, according to <code>dspace.log.2017-08-01</code>, they are all using the same Tomcat session</li> <li>The good thing is that, according to <code>dspace.log.2017-08-01</code>, they are all using the same Tomcat session</li>
<li>This means our Tomcat Crawler Session Valve is working</li> <li>This means our Tomcat Crawler Session Valve is working</li>
<li>But many of the bots are browsing dynamic URLs like: <li>But many of the bots are browsing dynamic URLs like:
<ul> <ul>
<li>/handle/10568/3353/discover</li> <li>/handle/10568/3353/discover</li>
<li>/handle/10568/16510/browse</li> <li>/handle/10568/16510/browse</li>
</ul></li> </ul>
</li>
<li>The <code>robots.txt</code> only blocks the top-level <code>/discover</code> and <code>/browse</code> URLs&hellip; we will need to find a way to forbid them from accessing these!</li> <li>The <code>robots.txt</code> only blocks the top-level <code>/discover</code> and <code>/browse</code> URLs&hellip; we will need to find a way to forbid them from accessing these!</li>
<li>Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li> <li>Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
<li>It turns out that we&rsquo;re already adding the <code>X-Robots-Tag &quot;none&quot;</code> HTTP header, but this only forbids the search engine from <em>indexing</em> the page, not crawling it!</li> <li>It turns out that we're already adding the <code>X-Robots-Tag &quot;none&quot;</code> HTTP header, but this only forbids the search engine from <em>indexing</em> the page, not crawling it!</li>
<li>Also, the bot has to successfully browse the page first so it can receive the HTTP header&hellip;</li> <li>Also, the bot has to successfully browse the page first so it can receive the HTTP header&hellip;</li>
<li>We might actually have to <em>block</em> these requests with HTTP 403 depending on the user agent</li> <li>We might actually have to <em>block</em> these requests with HTTP 403 depending on the user agent</li>
<li>Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415</li> <li>Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415</li>
@ -153,18 +148,15 @@
</p> </p>
</header> </header>
<h2 id="2017-07-01">2017-07-01</h2> <h2 id="20170701">2017-07-01</h2>
<ul> <ul>
<li>Run system updates and reboot DSpace Test</li> <li>Run system updates and reboot DSpace Test</li>
</ul> </ul>
<h2 id="20170704">2017-07-04</h2>
<h2 id="2017-07-04">2017-07-04</h2>
<ul> <ul>
<li>Merge changes for WLE Phase II theme rename (<a href="https://github.com/ilri/DSpace/pull/329">#329</a>)</li> <li>Merge changes for WLE Phase II theme rename (<a href="https://github.com/ilri/DSpace/pull/329">#329</a>)</li>
<li>Looking at extracting the metadata registries from ICARDA&rsquo;s MEL DSpace database so we can compare fields with CGSpace</li> <li>Looking at extracting the metadata registries from ICARDA's MEL DSpace database so we can compare fields with CGSpace</li>
<li>We can use PostgreSQL&rsquo;s extended output format (<code>-x</code>) plus <code>sed</code> to format the output into quasi XML:</li> <li>We can use PostgreSQL's extended output format (<code>-x</code>) plus <code>sed</code> to format the output into quasi XML:</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-07/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-07/'>Read more →</a>
</article> </article>
@ -183,7 +175,7 @@
</p> </p>
</header> </header>
2017-06-01 After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes The cg.identifier.wletheme field will be used for both Phase I and Phase II Research Themes Then we&rsquo;ll create a new sub-community for Phase II and create collections for the research themes there The current &ldquo;Research Themes&rdquo; community will be renamed to &ldquo;WLE Phase I Research Themes&rdquo; Tagged all items in the current Phase I collections with their appropriate themes Create pull request to add Phase II research themes to the submission form: #328 Add cg. 2017-06-01 After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes The cg.identifier.wletheme field will be used for both Phase I and Phase II Research Themes Then we'll create a new sub-community for Phase II and create collections for the research themes there The current &ldquo;Research Themes&rdquo; community will be renamed to &ldquo;WLE Phase I Research Themes&rdquo; Tagged all items in the current Phase I collections with their appropriate themes Create pull request to add Phase II research themes to the submission form: #328 Add cg.
<a href='https://alanorth.github.io/cgspace-notes/2017-06/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-06/'>Read more →</a>
</article> </article>
@ -201,7 +193,7 @@
</p> </p>
</header> </header>
2017-05-01 ICARDA apparently started working on CG Core on their MEL repository They have done a few cg.* fields, but not very consistent and even copy some of CGSpace items: https://mel.cgiar.org/xmlui/handle/20.500.11766/6911?show=full https://cgspace.cgiar.org/handle/10568/73683 2017-05-02 Atmire got back about the Workflow Statistics issue, and apparently it&rsquo;s a bug in the CUA module so they will send us a pull request 2017-05-04 Sync DSpace Test with database and assetstore from CGSpace Re-deploy DSpace Test with Atmire&rsquo;s CUA patch for workflow statistics, run system updates, and restart the server Now I can see the workflow statistics and am able to select users, but everything returns 0 items Megan says there are still some mapped items that are not appearing since last week, so I forced a full index-discovery -b Need to remember to check if the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSpace Test) tomorrow: https://cgspace. 2017-05-01 ICARDA apparently started working on CG Core on their MEL repository They have done a few cg.* fields, but not very consistent and even copy some of CGSpace items: https://mel.cgiar.org/xmlui/handle/20.500.11766/6911?show=full https://cgspace.cgiar.org/handle/10568/73683 2017-05-02 Atmire got back about the Workflow Statistics issue, and apparently it's a bug in the CUA module so they will send us a pull request 2017-05-04 Sync DSpace Test with database and assetstore from CGSpace Re-deploy DSpace Test with Atmire's CUA patch for workflow statistics, run system updates, and restart the server Now I can see the workflow statistics and am able to select users, but everything returns 0 items Megan says there are still some mapped items that are not appearing since last week, so I forced a full index-discovery -b Need to remember to check if the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSpace Test) tomorrow: https://cgspace.
<a href='https://alanorth.github.io/cgspace-notes/2017-05/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-05/'>Read more →</a>
</article> </article>
@ -219,23 +211,18 @@
</p> </p>
</header> </header>
<h2 id="2017-04-02">2017-04-02</h2> <h2 id="20170402">2017-04-02</h2>
<ul> <ul>
<li>Merge one change to CCAFS flagships that I had forgotten to remove last month (&ldquo;MANAGING CLIMATE RISK&rdquo;): <a href="https://github.com/ilri/DSpace/pull/317">https://github.com/ilri/DSpace/pull/317</a></li> <li>Merge one change to CCAFS flagships that I had forgotten to remove last month (&ldquo;MANAGING CLIMATE RISK&rdquo;): <a href="https://github.com/ilri/DSpace/pull/317">https://github.com/ilri/DSpace/pull/317</a></li>
<li>Quick proof-of-concept hack to add <code>dc.rights</code> to the input form, including some inline instructions/hints:</li> <li>Quick proof-of-concept hack to add <code>dc.rights</code> to the input form, including some inline instructions/hints:</li>
</ul> </ul>
<p><img src="/cgspace-notes/2017/04/dc-rights.png" alt="dc.rights in the submission form"></p>
<p><img src="/cgspace-notes/2017/04/dc-rights.png" alt="dc.rights in the submission form" /></p>
<ul> <ul>
<li>Remove redundant/duplicate text in the DSpace submission license</li> <li>Remove redundant/duplicate text in the DSpace submission license</li>
<li>Testing the CMYK patch on a collection with 650 items:</li>
<li><p>Testing the CMYK patch on a collection with 650 items:</p>
<pre><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p &quot;ImageMagick PDF Thumbnail&quot; -v &gt;&amp; /tmp/filter-media-cmyk.txt
</code></pre></li>
</ul> </ul>
<pre><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p &quot;ImageMagick PDF Thumbnail&quot; -v &gt;&amp; /tmp/filter-media-cmyk.txt
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2017-04/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-04/'>Read more →</a>
</article> </article>
@ -253,14 +240,11 @@
</p> </p>
</header> </header>
<h2 id="2017-03-01">2017-03-01</h2> <h2 id="20170301">2017-03-01</h2>
<ul> <ul>
<li>Run the 279 CIAT author corrections on CGSpace</li> <li>Run the 279 CIAT author corrections on CGSpace</li>
</ul> </ul>
<h2 id="20170302">2017-03-02</h2>
<h2 id="2017-03-02">2017-03-02</h2>
<ul> <ul>
<li>Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace</li> <li>Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace</li>
<li>CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles</li> <li>CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles</li>
@ -270,13 +254,11 @@
<li>Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI</li> <li>Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI</li>
<li>Filed an issue on DSpace issue tracker for the <code>filter-media</code> bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: <a href="https://jira.duraspace.org/browse/DS-3516">DS-3516</a></li> <li>Filed an issue on DSpace issue tracker for the <code>filter-media</code> bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: <a href="https://jira.duraspace.org/browse/DS-3516">DS-3516</a></li>
<li>Discovered that the ImageMagick <code>filter-media</code> plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK</li> <li>Discovered that the ImageMagick <code>filter-media</code> plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK</li>
<li>Interestingly, it seems DSpace 4.x's thumbnails were sRGB, but forcing regeneration using DSpace 5.x's ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999">10568/51999</a>):</li>
<li><p>Interestingly, it seems DSpace 4.x&rsquo;s thumbnails were sRGB, but forcing regeneration using DSpace 5.x&rsquo;s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999"><sup>10568</sup>&frasl;<sub>51999</sub></a>):</p> </ul>
<pre><code>$ identify ~/Desktop/alc_contrastes_desafios.jpg <pre><code>$ identify ~/Desktop/alc_contrastes_desafios.jpg
/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000 /Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-03/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-03/'>Read more →</a>
</article> </article>
@ -294,25 +276,22 @@
</p> </p>
</header> </header>
<h2 id="2017-02-07">2017-02-07</h2> <h2 id="20170207">2017-02-07</h2>
<ul> <ul>
<li><p>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</p> <li>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</li>
</ul>
<pre><code>dspace=# select * from collection2item where item_id = '80278'; <pre><code>dspace=# select * from collection2item where item_id = '80278';
id | collection_id | item_id id | collection_id | item_id
-------+---------------+--------- -------+---------------+---------
92551 | 313 | 80278 92551 | 313 | 80278
92550 | 313 | 80278 92550 | 313 | 80278
90774 | 1051 | 80278 90774 | 1051 | 80278
(3 rows) (3 rows)
dspace=# delete from collection2item where id = 92551 and item_id = 80278; dspace=# delete from collection2item where id = 92551 and item_id = 80278;
DELETE 1 DELETE 1
</code></pre></li> </code></pre><ul>
<li>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</li>
<li><p>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</p></li> <li>Looks like we'll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</li>
<li><p>Looks like we&rsquo;ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</p></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-02/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-02/'>Read more →</a>
</article> </article>
@ -331,12 +310,11 @@ DELETE 1
</p> </p>
</header> </header>
<h2 id="2017-01-02">2017-01-02</h2> <h2 id="20170102">2017-01-02</h2>
<ul> <ul>
<li>I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error</li> <li>I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error</li>
<li>I tested on DSpace Test as well and it doesn&rsquo;t work there either</li> <li>I tested on DSpace Test as well and it doesn't work there either</li>
<li>I asked on the dspace-tech mailing list because it seems to be broken, and actually now I&rsquo;m not sure if we&rsquo;ve ever had the sharding task run successfully over all these years</li> <li>I asked on the dspace-tech mailing list because it seems to be broken, and actually now I'm not sure if we've ever had the sharding task run successfully over all these years</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-01/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-01/'>Read more →</a>
</article> </article>
@ -355,25 +333,20 @@ DELETE 1
</p> </p>
</header> </header>
<h2 id="2016-12-02">2016-12-02</h2> <h2 id="20161202">2016-12-02</h2>
<ul> <ul>
<li>CGSpace was down for five hours in the morning while I was sleeping</li> <li>CGSpace was down for five hours in the morning while I was sleeping</li>
<li>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</li>
<li><p>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</p> </ul>
<pre><code>2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;) <pre><code>2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail=&quot;dc.title&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;) 2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail=&quot;dc.title&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, Object Type=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail=&quot;THUMBNAIL&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;) 2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, Object Type=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail=&quot;THUMBNAIL&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail=&quot;-1&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;) 2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail=&quot;-1&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;) 2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
</code></pre></li> </code></pre><ul>
<li>I see thousands of them in the logs for the last few months, so it's not related to the DSpace 5.5 upgrade</li>
<li><p>I see thousands of them in the logs for the last few months, so it&rsquo;s not related to the DSpace 5.5 upgrade</p></li> <li>I've raised a ticket with Atmire to ask</li>
<li>Another worrying error from dspace.log is:</li>
<li><p>I&rsquo;ve raised a ticket with Atmire to ask</p></li>
<li><p>Another worrying error from dspace.log is:</p></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-12/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-12/'>Read more →</a>
</article> </article>

View File

@ -17,16 +17,13 @@
<pubDate>Thu, 07 Sep 2017 16:54:52 +0700</pubDate> <pubDate>Thu, 07 Sep 2017 16:54:52 +0700</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-09/</guid> <guid>https://alanorth.github.io/cgspace-notes/2017-09/</guid>
<description>&lt;h2 id=&#34;2017-09-06&#34;&gt;2017-09-06&lt;/h2&gt; <description>&lt;h2 id=&#34;20170906&#34;&gt;2017-09-06&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours&lt;/li&gt; &lt;li&gt;Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours&lt;/li&gt;
&lt;/ul&gt; &lt;/ul&gt;
&lt;h2 id=&#34;20170907&#34;&gt;2017-09-07&lt;/h2&gt;
&lt;h2 id=&#34;2017-09-07&#34;&gt;2017-09-07&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Ask Sisay to clean up the WLE approvers a bit, as Marianne&amp;rsquo;s user account is both in the approvers step as well as the group&lt;/li&gt; &lt;li&gt;Ask Sisay to clean up the WLE approvers a bit, as Marianne&#39;s user account is both in the approvers step as well as the group&lt;/li&gt;
&lt;/ul&gt;</description> &lt;/ul&gt;</description>
</item> </item>
@ -36,22 +33,21 @@
<pubDate>Tue, 01 Aug 2017 11:51:52 +0300</pubDate> <pubDate>Tue, 01 Aug 2017 11:51:52 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-08/</guid> <guid>https://alanorth.github.io/cgspace-notes/2017-08/</guid>
<description>&lt;h2 id=&#34;2017-08-01&#34;&gt;2017-08-01&lt;/h2&gt; <description>&lt;h2 id=&#34;20170801&#34;&gt;2017-08-01&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours&lt;/li&gt; &lt;li&gt;Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours&lt;/li&gt;
&lt;li&gt;I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)&lt;/li&gt; &lt;li&gt;I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)&lt;/li&gt;
&lt;li&gt;The good thing is that, according to &lt;code&gt;dspace.log.2017-08-01&lt;/code&gt;, they are all using the same Tomcat session&lt;/li&gt; &lt;li&gt;The good thing is that, according to &lt;code&gt;dspace.log.2017-08-01&lt;/code&gt;, they are all using the same Tomcat session&lt;/li&gt;
&lt;li&gt;This means our Tomcat Crawler Session Valve is working&lt;/li&gt; &lt;li&gt;This means our Tomcat Crawler Session Valve is working&lt;/li&gt;
&lt;li&gt;But many of the bots are browsing dynamic URLs like: &lt;li&gt;But many of the bots are browsing dynamic URLs like:
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;/handle/10568/3353/discover&lt;/li&gt; &lt;li&gt;/handle/10568/3353/discover&lt;/li&gt;
&lt;li&gt;/handle/10568/16510/browse&lt;/li&gt; &lt;li&gt;/handle/10568/16510/browse&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt; &lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;robots.txt&lt;/code&gt; only blocks the top-level &lt;code&gt;/discover&lt;/code&gt; and &lt;code&gt;/browse&lt;/code&gt; URLs&amp;hellip; we will need to find a way to forbid them from accessing these!&lt;/li&gt; &lt;li&gt;The &lt;code&gt;robots.txt&lt;/code&gt; only blocks the top-level &lt;code&gt;/discover&lt;/code&gt; and &lt;code&gt;/browse&lt;/code&gt; URLs&amp;hellip; we will need to find a way to forbid them from accessing these!&lt;/li&gt;
&lt;li&gt;Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): &lt;a href=&#34;https://jira.duraspace.org/browse/DS-2962&#34;&gt;https://jira.duraspace.org/browse/DS-2962&lt;/a&gt;&lt;/li&gt; &lt;li&gt;Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): &lt;a href=&#34;https://jira.duraspace.org/browse/DS-2962&#34;&gt;https://jira.duraspace.org/browse/DS-2962&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;It turns out that we&amp;rsquo;re already adding the &lt;code&gt;X-Robots-Tag &amp;quot;none&amp;quot;&lt;/code&gt; HTTP header, but this only forbids the search engine from &lt;em&gt;indexing&lt;/em&gt; the page, not crawling it!&lt;/li&gt; &lt;li&gt;It turns out that we&#39;re already adding the &lt;code&gt;X-Robots-Tag &amp;quot;none&amp;quot;&lt;/code&gt; HTTP header, but this only forbids the search engine from &lt;em&gt;indexing&lt;/em&gt; the page, not crawling it!&lt;/li&gt;
&lt;li&gt;Also, the bot has to successfully browse the page first so it can receive the HTTP header&amp;hellip;&lt;/li&gt; &lt;li&gt;Also, the bot has to successfully browse the page first so it can receive the HTTP header&amp;hellip;&lt;/li&gt;
&lt;li&gt;We might actually have to &lt;em&gt;block&lt;/em&gt; these requests with HTTP 403 depending on the user agent&lt;/li&gt; &lt;li&gt;We might actually have to &lt;em&gt;block&lt;/em&gt; these requests with HTTP 403 depending on the user agent&lt;/li&gt;
&lt;li&gt;Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415&lt;/li&gt; &lt;li&gt;Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415&lt;/li&gt;
@ -67,18 +63,15 @@
<pubDate>Sat, 01 Jul 2017 18:03:52 +0300</pubDate> <pubDate>Sat, 01 Jul 2017 18:03:52 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-07/</guid> <guid>https://alanorth.github.io/cgspace-notes/2017-07/</guid>
<description>&lt;h2 id=&#34;2017-07-01&#34;&gt;2017-07-01&lt;/h2&gt; <description>&lt;h2 id=&#34;20170701&#34;&gt;2017-07-01&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Run system updates and reboot DSpace Test&lt;/li&gt; &lt;li&gt;Run system updates and reboot DSpace Test&lt;/li&gt;
&lt;/ul&gt; &lt;/ul&gt;
&lt;h2 id=&#34;20170704&#34;&gt;2017-07-04&lt;/h2&gt;
&lt;h2 id=&#34;2017-07-04&#34;&gt;2017-07-04&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Merge changes for WLE Phase II theme rename (&lt;a href=&#34;https://github.com/ilri/DSpace/pull/329&#34;&gt;#329&lt;/a&gt;)&lt;/li&gt; &lt;li&gt;Merge changes for WLE Phase II theme rename (&lt;a href=&#34;https://github.com/ilri/DSpace/pull/329&#34;&gt;#329&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Looking at extracting the metadata registries from ICARDA&amp;rsquo;s MEL DSpace database so we can compare fields with CGSpace&lt;/li&gt; &lt;li&gt;Looking at extracting the metadata registries from ICARDA&#39;s MEL DSpace database so we can compare fields with CGSpace&lt;/li&gt;
&lt;li&gt;We can use PostgreSQL&amp;rsquo;s extended output format (&lt;code&gt;-x&lt;/code&gt;) plus &lt;code&gt;sed&lt;/code&gt; to format the output into quasi XML:&lt;/li&gt; &lt;li&gt;We can use PostgreSQL&#39;s extended output format (&lt;code&gt;-x&lt;/code&gt;) plus &lt;code&gt;sed&lt;/code&gt; to format the output into quasi XML:&lt;/li&gt;
&lt;/ul&gt;</description> &lt;/ul&gt;</description>
</item> </item>
@ -88,7 +81,7 @@
<pubDate>Thu, 01 Jun 2017 10:14:52 +0300</pubDate> <pubDate>Thu, 01 Jun 2017 10:14:52 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-06/</guid> <guid>https://alanorth.github.io/cgspace-notes/2017-06/</guid>
<description>2017-06-01 After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes The cg.identifier.wletheme field will be used for both Phase I and Phase II Research Themes Then we&amp;rsquo;ll create a new sub-community for Phase II and create collections for the research themes there The current &amp;ldquo;Research Themes&amp;rdquo; community will be renamed to &amp;ldquo;WLE Phase I Research Themes&amp;rdquo; Tagged all items in the current Phase I collections with their appropriate themes Create pull request to add Phase II research themes to the submission form: #328 Add cg.</description> <description>2017-06-01 After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes The cg.identifier.wletheme field will be used for both Phase I and Phase II Research Themes Then we&#39;ll create a new sub-community for Phase II and create collections for the research themes there The current &amp;ldquo;Research Themes&amp;rdquo; community will be renamed to &amp;ldquo;WLE Phase I Research Themes&amp;rdquo; Tagged all items in the current Phase I collections with their appropriate themes Create pull request to add Phase II research themes to the submission form: #328 Add cg.</description>
</item> </item>
<item> <item>
@ -97,7 +90,7 @@
<pubDate>Mon, 01 May 2017 16:21:52 +0200</pubDate> <pubDate>Mon, 01 May 2017 16:21:52 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-05/</guid> <guid>https://alanorth.github.io/cgspace-notes/2017-05/</guid>
<description>2017-05-01 ICARDA apparently started working on CG Core on their MEL repository They have done a few cg.* fields, but not very consistent and even copy some of CGSpace items: https://mel.cgiar.org/xmlui/handle/20.500.11766/6911?show=full https://cgspace.cgiar.org/handle/10568/73683 2017-05-02 Atmire got back about the Workflow Statistics issue, and apparently it&amp;rsquo;s a bug in the CUA module so they will send us a pull request 2017-05-04 Sync DSpace Test with database and assetstore from CGSpace Re-deploy DSpace Test with Atmire&amp;rsquo;s CUA patch for workflow statistics, run system updates, and restart the server Now I can see the workflow statistics and am able to select users, but everything returns 0 items Megan says there are still some mapped items are not appearing since last week, so I forced a full index-discovery -b Need to remember to check if the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSpace Test) tomorrow: https://cgspace.</description> <description>2017-05-01 ICARDA apparently started working on CG Core on their MEL repository They have done a few cg.* fields, but not very consistent and even copy some of CGSpace items: https://mel.cgiar.org/xmlui/handle/20.500.11766/6911?show=full https://cgspace.cgiar.org/handle/10568/73683 2017-05-02 Atmire got back about the Workflow Statistics issue, and apparently it&#39;s a bug in the CUA module so they will send us a pull request 2017-05-04 Sync DSpace Test with database and assetstore from CGSpace Re-deploy DSpace Test with Atmire&#39;s CUA patch for workflow statistics, run system updates, and restart the server Now I can see the workflow statistics and am able to select users, but everything returns 0 items Megan says there are still some mapped items are not appearing since last week, so I forced a full index-discovery -b Need to remember to check if the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSpace Test) tomorrow: https://cgspace.</description>
</item> </item>
<item> <item>
@ -106,23 +99,18 @@
<pubDate>Sun, 02 Apr 2017 17:08:52 +0200</pubDate> <pubDate>Sun, 02 Apr 2017 17:08:52 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-04/</guid> <guid>https://alanorth.github.io/cgspace-notes/2017-04/</guid>
<description>&lt;h2 id=&#34;2017-04-02&#34;&gt;2017-04-02&lt;/h2&gt; <description>&lt;h2 id=&#34;20170402&#34;&gt;2017-04-02&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Merge one change to CCAFS flagships that I had forgotten to remove last month (&amp;ldquo;MANAGING CLIMATE RISK&amp;rdquo;): &lt;a href=&#34;https://github.com/ilri/DSpace/pull/317&#34;&gt;https://github.com/ilri/DSpace/pull/317&lt;/a&gt;&lt;/li&gt; &lt;li&gt;Merge one change to CCAFS flagships that I had forgotten to remove last month (&amp;ldquo;MANAGING CLIMATE RISK&amp;rdquo;): &lt;a href=&#34;https://github.com/ilri/DSpace/pull/317&#34;&gt;https://github.com/ilri/DSpace/pull/317&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Quick proof-of-concept hack to add &lt;code&gt;dc.rights&lt;/code&gt; to the input form, including some inline instructions/hints:&lt;/li&gt; &lt;li&gt;Quick proof-of-concept hack to add &lt;code&gt;dc.rights&lt;/code&gt; to the input form, including some inline instructions/hints:&lt;/li&gt;
&lt;/ul&gt; &lt;/ul&gt;
&lt;p&gt;&lt;img src=&#34;https://alanorth.github.io/cgspace-notes/cgspace-notes/2017/04/dc-rights.png&#34; alt=&#34;dc.rights in the submission form&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://alanorth.github.io/cgspace-notes/cgspace-notes/2017/04/dc-rights.png&#34; alt=&#34;dc.rights in the submission form&#34; /&gt;&lt;/p&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Remove redundant/duplicate text in the DSpace submission license&lt;/li&gt; &lt;li&gt;Remove redundant/duplicate text in the DSpace submission license&lt;/li&gt;
&lt;li&gt;Testing the CMYK patch on a collection with 650 items:&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Testing the CMYK patch on a collection with 650 items:&lt;/p&gt; &lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p &amp;quot;ImageMagick PDF Thumbnail&amp;quot; -v &amp;gt;&amp;amp; /tmp/filter-media-cmyk.txt &lt;pre&gt;&lt;code&gt;$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p &amp;quot;ImageMagick PDF Thumbnail&amp;quot; -v &amp;gt;&amp;amp; /tmp/filter-media-cmyk.txt
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt; &lt;/code&gt;&lt;/pre&gt;</description>
&lt;/ul&gt;</description>
</item> </item>
<item> <item>
@ -131,14 +119,11 @@
<pubDate>Wed, 01 Mar 2017 17:08:52 +0200</pubDate> <pubDate>Wed, 01 Mar 2017 17:08:52 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-03/</guid> <guid>https://alanorth.github.io/cgspace-notes/2017-03/</guid>
<description>&lt;h2 id=&#34;2017-03-01&#34;&gt;2017-03-01&lt;/h2&gt; <description>&lt;h2 id=&#34;20170301&#34;&gt;2017-03-01&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Run the 279 CIAT author corrections on CGSpace&lt;/li&gt; &lt;li&gt;Run the 279 CIAT author corrections on CGSpace&lt;/li&gt;
&lt;/ul&gt; &lt;/ul&gt;
&lt;h2 id=&#34;20170302&#34;&gt;2017-03-02&lt;/h2&gt;
&lt;h2 id=&#34;2017-03-02&#34;&gt;2017-03-02&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace&lt;/li&gt; &lt;li&gt;Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace&lt;/li&gt;
&lt;li&gt;CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles&lt;/li&gt; &lt;li&gt;CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles&lt;/li&gt;
@ -148,13 +133,11 @@
&lt;li&gt;Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI&lt;/li&gt; &lt;li&gt;Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI&lt;/li&gt;
&lt;li&gt;Filed an issue on DSpace issue tracker for the &lt;code&gt;filter-media&lt;/code&gt; bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: &lt;a href=&#34;https://jira.duraspace.org/browse/DS-3516&#34;&gt;DS-3516&lt;/a&gt;&lt;/li&gt; &lt;li&gt;Filed an issue on DSpace issue tracker for the &lt;code&gt;filter-media&lt;/code&gt; bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: &lt;a href=&#34;https://jira.duraspace.org/browse/DS-3516&#34;&gt;DS-3516&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Discovered that the ImageMagick &lt;code&gt;filter-media&lt;/code&gt; plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK&lt;/li&gt; &lt;li&gt;Discovered that the ImageMagick &lt;code&gt;filter-media&lt;/code&gt; plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK&lt;/li&gt;
&lt;li&gt;Interestingly, it seems DSpace 4.x&#39;s thumbnails were sRGB, but forcing regeneration using DSpace 5.x&#39;s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see &lt;a href=&#34;https://cgspace.cgiar.org/handle/10568/51999&#34;&gt;10568/51999&lt;/a&gt;):&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Interestingly, it seems DSpace 4.x&amp;rsquo;s thumbnails were sRGB, but forcing regeneration using DSpace 5.x&amp;rsquo;s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see &lt;a href=&#34;https://cgspace.cgiar.org/handle/10568/51999&#34;&gt;&lt;sup&gt;10568&lt;/sup&gt;&amp;frasl;&lt;sub&gt;51999&lt;/sub&gt;&lt;/a&gt;):&lt;/p&gt; &lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ identify ~/Desktop/alc_contrastes_desafios.jpg &lt;pre&gt;&lt;code&gt;$ identify ~/Desktop/alc_contrastes_desafios.jpg
/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000 /Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt; &lt;/code&gt;&lt;/pre&gt;</description>
&lt;/ul&gt;</description>
</item> </item>
<item> <item>
@ -163,25 +146,22 @@
<pubDate>Tue, 07 Feb 2017 07:04:52 -0800</pubDate> <pubDate>Tue, 07 Feb 2017 07:04:52 -0800</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-02/</guid> <guid>https://alanorth.github.io/cgspace-notes/2017-02/</guid>
<description>&lt;h2 id=&#34;2017-02-07&#34;&gt;2017-02-07&lt;/h2&gt; <description>&lt;h2 id=&#34;20170207&#34;&gt;2017-02-07&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;&lt;p&gt;An item was mapped twice erroneously again, so I had to remove one of the mappings manually:&lt;/p&gt; &lt;li&gt;An item was mapped twice erroneously again, so I had to remove one of the mappings manually:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;dspace=# select * from collection2item where item_id = &#39;80278&#39;; &lt;pre&gt;&lt;code&gt;dspace=# select * from collection2item where item_id = &#39;80278&#39;;
id | collection_id | item_id id | collection_id | item_id
-------+---------------+--------- -------+---------------+---------
92551 | 313 | 80278 92551 | 313 | 80278
92550 | 313 | 80278 92550 | 313 | 80278
90774 | 1051 | 80278 90774 | 1051 | 80278
(3 rows) (3 rows)
dspace=# delete from collection2item where id = 92551 and item_id = 80278; dspace=# delete from collection2item where id = 92551 and item_id = 80278;
DELETE 1 DELETE 1
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt; &lt;/code&gt;&lt;/pre&gt;&lt;ul&gt;
&lt;li&gt;Create issue on GitHub to track the addition of CCAFS Phase II project tags (&lt;a href=&#34;https://github.com/ilri/DSpace/issues/301&#34;&gt;#301&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create issue on GitHub to track the addition of CCAFS Phase II project tags (&lt;a href=&#34;https://github.com/ilri/DSpace/issues/301&#34;&gt;#301&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt; &lt;li&gt;Looks like we&#39;ll be using &lt;code&gt;cg.identifier.ccafsprojectpii&lt;/code&gt; as the field name&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Looks like we&amp;rsquo;ll be using &lt;code&gt;cg.identifier.ccafsprojectpii&lt;/code&gt; as the field name&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;</description> &lt;/ul&gt;</description>
</item> </item>
@ -191,12 +171,11 @@ DELETE 1
<pubDate>Mon, 02 Jan 2017 10:43:00 +0300</pubDate> <pubDate>Mon, 02 Jan 2017 10:43:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-01/</guid> <guid>https://alanorth.github.io/cgspace-notes/2017-01/</guid>
<description>&lt;h2 id=&#34;2017-01-02&#34;&gt;2017-01-02&lt;/h2&gt; <description>&lt;h2 id=&#34;20170102&#34;&gt;2017-01-02&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error&lt;/li&gt; &lt;li&gt;I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error&lt;/li&gt;
&lt;li&gt;I tested on DSpace Test as well and it doesn&amp;rsquo;t work there either&lt;/li&gt; &lt;li&gt;I tested on DSpace Test as well and it doesn&#39;t work there either&lt;/li&gt;
&lt;li&gt;I asked on the dspace-tech mailing list because it seems to be broken, and actually now I&amp;rsquo;m not sure if we&amp;rsquo;ve ever had the sharding task run successfully over all these years&lt;/li&gt; &lt;li&gt;I asked on the dspace-tech mailing list because it seems to be broken, and actually now I&#39;m not sure if we&#39;ve ever had the sharding task run successfully over all these years&lt;/li&gt;
&lt;/ul&gt;</description> &lt;/ul&gt;</description>
</item> </item>
@ -206,25 +185,20 @@ DELETE 1
<pubDate>Fri, 02 Dec 2016 10:43:00 +0300</pubDate> <pubDate>Fri, 02 Dec 2016 10:43:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-12/</guid> <guid>https://alanorth.github.io/cgspace-notes/2016-12/</guid>
<description>&lt;h2 id=&#34;2016-12-02&#34;&gt;2016-12-02&lt;/h2&gt; <description>&lt;h2 id=&#34;20161202&#34;&gt;2016-12-02&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;CGSpace was down for five hours in the morning while I was sleeping&lt;/li&gt; &lt;li&gt;CGSpace was down for five hours in the morning while I was sleeping&lt;/li&gt;
&lt;li&gt;While looking in the logs for errors, I see tons of warnings about Atmire MQM:&lt;/li&gt;
&lt;li&gt;&lt;p&gt;While looking in the logs for errors, I see tons of warnings about Atmire MQM:&lt;/p&gt; &lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID=&amp;quot;TX157907838689377964651674089851855413607&amp;quot;) &lt;pre&gt;&lt;code&gt;2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID=&amp;quot;TX157907838689377964651674089851855413607&amp;quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail=&amp;quot;dc.title&amp;quot;, transactionID=&amp;quot;TX157907838689377964651674089851855413607&amp;quot;) 2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail=&amp;quot;dc.title&amp;quot;, transactionID=&amp;quot;TX157907838689377964651674089851855413607&amp;quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, ObjectType=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail=&amp;quot;THUMBNAIL&amp;quot;, transactionID=&amp;quot;TX157907838689377964651674089851855413607&amp;quot;) 2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, ObjectType=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail=&amp;quot;THUMBNAIL&amp;quot;, transactionID=&amp;quot;TX157907838689377964651674089851855413607&amp;quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, ObjectType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail=&amp;quot;-1&amp;quot;, transactionID=&amp;quot;TX157907838689377964651674089851855413607&amp;quot;) 2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, ObjectType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail=&amp;quot;-1&amp;quot;, transactionID=&amp;quot;TX157907838689377964651674089851855413607&amp;quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID=&amp;quot;TX157907838689377964651674089851855413607&amp;quot;) 2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID=&amp;quot;TX157907838689377964651674089851855413607&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt; &lt;/code&gt;&lt;/pre&gt;&lt;ul&gt;
&lt;li&gt;I see thousands of them in the logs for the last few months, so it&#39;s not related to the DSpace 5.5 upgrade&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I see thousands of them in the logs for the last few months, so it&amp;rsquo;s not related to the DSpace 5.5 upgrade&lt;/p&gt;&lt;/li&gt; &lt;li&gt;I&#39;ve raised a ticket with Atmire to ask&lt;/li&gt;
&lt;li&gt;Another worrying error from dspace.log is:&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I&amp;rsquo;ve raised a ticket with Atmire to ask&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Another worrying error from dspace.log is:&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;</description> &lt;/ul&gt;</description>
</item> </item>
@ -234,13 +208,11 @@ DELETE 1
<pubDate>Tue, 01 Nov 2016 09:21:00 +0300</pubDate> <pubDate>Tue, 01 Nov 2016 09:21:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-11/</guid> <guid>https://alanorth.github.io/cgspace-notes/2016-11/</guid>
<description>&lt;h2 id=&#34;2016-11-01&#34;&gt;2016-11-01&lt;/h2&gt; <description>&lt;h2 id=&#34;20161101&#34;&gt;2016-11-01&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Add &lt;code&gt;dc.type&lt;/code&gt; to the output options for Atmire&amp;rsquo;s Listings and Reports module (&lt;a href=&#34;https://github.com/ilri/DSpace/pull/286&#34;&gt;#286&lt;/a&gt;)&lt;/li&gt; &lt;li&gt;Add &lt;code&gt;dc.type&lt;/code&gt; to the output options for Atmire&#39;s Listings and Reports module (&lt;a href=&#34;https://github.com/ilri/DSpace/pull/286&#34;&gt;#286&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt; &lt;/ul&gt;
&lt;p&gt;&lt;img src=&#34;https://alanorth.github.io/cgspace-notes/cgspace-notes/2016/11/listings-and-reports.png&#34; alt=&#34;Listings and Reports with output type&#34;&gt;&lt;/p&gt;</description>
&lt;p&gt;&lt;img src=&#34;https://alanorth.github.io/cgspace-notes/cgspace-notes/2016/11/listings-and-reports.png&#34; alt=&#34;Listings and Reports with output type&#34; /&gt;&lt;/p&gt;</description>
</item> </item>
<item> <item>
@ -249,22 +221,19 @@ DELETE 1
<pubDate>Mon, 03 Oct 2016 15:53:00 +0300</pubDate> <pubDate>Mon, 03 Oct 2016 15:53:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-10/</guid> <guid>https://alanorth.github.io/cgspace-notes/2016-10/</guid>
<description>&lt;h2 id=&#34;2016-10-03&#34;&gt;2016-10-03&lt;/h2&gt; <description>&lt;h2 id=&#34;20161003&#34;&gt;2016-10-03&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Testing adding &lt;a href=&#34;https://wiki.duraspace.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing&#34;&gt;ORCIDs to a CSV&lt;/a&gt; file for a single item to see if the author orders get messed up&lt;/li&gt; &lt;li&gt;Testing adding &lt;a href=&#34;https://wiki.duraspace.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing&#34;&gt;ORCIDs to a CSV&lt;/a&gt; file for a single item to see if the author orders get messed up&lt;/li&gt;
&lt;li&gt;Need to test the following scenarios to see how author order is affected: &lt;li&gt;Need to test the following scenarios to see how author order is affected:
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;ORCIDs only&lt;/li&gt; &lt;li&gt;ORCIDs only&lt;/li&gt;
&lt;li&gt;ORCIDs plus normal authors&lt;/li&gt; &lt;li&gt;ORCIDs plus normal authors&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt; &lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I exported a random item&amp;rsquo;s metadata as CSV, deleted &lt;em&gt;all columns&lt;/em&gt; except id and collection, and made a new column called &lt;code&gt;ORCID:dc.contributor.author&lt;/code&gt; with the following random ORCIDs from the ORCID registry:&lt;/p&gt; &lt;li&gt;I exported a random item&#39;s metadata as CSV, deleted &lt;em&gt;all columns&lt;/em&gt; except id and collection, and made a new column called &lt;code&gt;ORCID:dc.contributor.author&lt;/code&gt; with the following random ORCIDs from the ORCID registry:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X &lt;pre&gt;&lt;code&gt;0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt; &lt;/code&gt;&lt;/pre&gt;</description>
&lt;/ul&gt;</description>
</item> </item>
<item> <item>
@ -273,18 +242,15 @@ DELETE 1
<pubDate>Thu, 01 Sep 2016 15:53:00 +0300</pubDate> <pubDate>Thu, 01 Sep 2016 15:53:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-09/</guid> <guid>https://alanorth.github.io/cgspace-notes/2016-09/</guid>
<description>&lt;h2 id=&#34;2016-09-01&#34;&gt;2016-09-01&lt;/h2&gt; <description>&lt;h2 id=&#34;20160901&#34;&gt;2016-09-01&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors&lt;/li&gt; &lt;li&gt;Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors&lt;/li&gt;
&lt;li&gt;Discuss how the migration of CGIAR&amp;rsquo;s Active Directory to a flat structure will break our LDAP groups in DSpace&lt;/li&gt; &lt;li&gt;Discuss how the migration of CGIAR&#39;s Active Directory to a flat structure will break our LDAP groups in DSpace&lt;/li&gt;
&lt;li&gt;We had been using &lt;code&gt;DC=ILRI&lt;/code&gt; to determine whether a user was ILRI or not&lt;/li&gt; &lt;li&gt;We had been using &lt;code&gt;DC=ILRI&lt;/code&gt; to determine whether a user was ILRI or not&lt;/li&gt;
&lt;li&gt;It looks like we might be able to use OUs now, instead of DCs:&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It looks like we might be able to use OUs now, instead of DCs:&lt;/p&gt; &lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b &amp;quot;dc=cgiarad,dc=org&amp;quot; -D &amp;quot;admigration1@cgiarad.org&amp;quot; -W &amp;quot;(sAMAccountName=admigration1)&amp;quot; &lt;pre&gt;&lt;code&gt;$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b &amp;quot;dc=cgiarad,dc=org&amp;quot; -D &amp;quot;admigration1@cgiarad.org&amp;quot; -W &amp;quot;(sAMAccountName=admigration1)&amp;quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt; &lt;/code&gt;&lt;/pre&gt;</description>
&lt;/ul&gt;</description>
</item> </item>
<item> <item>
@ -293,22 +259,19 @@ DELETE 1
<pubDate>Mon, 01 Aug 2016 15:53:00 +0300</pubDate> <pubDate>Mon, 01 Aug 2016 15:53:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-08/</guid> <guid>https://alanorth.github.io/cgspace-notes/2016-08/</guid>
<description>&lt;h2 id=&#34;2016-08-01&#34;&gt;2016-08-01&lt;/h2&gt; <description>&lt;h2 id=&#34;20160801&#34;&gt;2016-08-01&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Add updated distribution license from Sisay (&lt;a href=&#34;https://github.com/ilri/DSpace/issues/259&#34;&gt;#259&lt;/a&gt;)&lt;/li&gt; &lt;li&gt;Add updated distribution license from Sisay (&lt;a href=&#34;https://github.com/ilri/DSpace/issues/259&#34;&gt;#259&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Play with upgrading Mirage 2 dependencies in &lt;code&gt;bower.json&lt;/code&gt; because most are several versions out of date&lt;/li&gt; &lt;li&gt;Play with upgrading Mirage 2 dependencies in &lt;code&gt;bower.json&lt;/code&gt; because most are several versions out of date&lt;/li&gt;
&lt;li&gt;Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more&lt;/li&gt; &lt;li&gt;Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more&lt;/li&gt;
&lt;li&gt;bower stuff is a dead end, waste of time, too many issues&lt;/li&gt; &lt;li&gt;bower stuff is a dead end, waste of time, too many issues&lt;/li&gt;
&lt;li&gt;Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of &lt;code&gt;fonts&lt;/code&gt;)&lt;/li&gt; &lt;li&gt;Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of &lt;code&gt;fonts&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Start working on DSpace 5.1 → 5.5 port:&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Start working on DSpace 5.1 → 5.5 port:&lt;/p&gt; &lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ git checkout -b 55new 5_x-prod &lt;pre&gt;&lt;code&gt;$ git checkout -b 55new 5_x-prod
$ git reset --hard ilri/5_x-prod $ git reset --hard ilri/5_x-prod
$ git rebase -i dspace-5.5 $ git rebase -i dspace-5.5
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt; &lt;/code&gt;&lt;/pre&gt;</description>
&lt;/ul&gt;</description>
</item> </item>
<item> <item>
@ -317,22 +280,19 @@ $ git rebase -i dspace-5.5
<pubDate>Fri, 01 Jul 2016 10:53:00 +0300</pubDate> <pubDate>Fri, 01 Jul 2016 10:53:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-07/</guid> <guid>https://alanorth.github.io/cgspace-notes/2016-07/</guid>
<description>&lt;h2 id=&#34;2016-07-01&#34;&gt;2016-07-01&lt;/h2&gt; <description>&lt;h2 id=&#34;20160701&#34;&gt;2016-07-01&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Add &lt;code&gt;dc.description.sponsorship&lt;/code&gt; to Discovery sidebar facets and make investors clickable in item view (&lt;a href=&#34;https://github.com/ilri/DSpace/issues/232&#34;&gt;#232&lt;/a&gt;)&lt;/li&gt; &lt;li&gt;Add &lt;code&gt;dc.description.sponsorship&lt;/code&gt; to Discovery sidebar facets and make investors clickable in item view (&lt;a href=&#34;https://github.com/ilri/DSpace/issues/232&#34;&gt;#232&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;I think this query should find and replace all authors that have &amp;ldquo;,&amp;rdquo; at the end of their names:&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I think this query should find and replace all authors that have &amp;ldquo;,&amp;rdquo; at the end of their names:&lt;/p&gt; &lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, &#39;(^.+?),$&#39;, &#39;\1&#39;) where metadata_field_id=3 and resource_type_id=2 and text_value ~ &#39;^.+?,$&#39;; &lt;pre&gt;&lt;code&gt;dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, &#39;(^.+?),$&#39;, &#39;\1&#39;) where metadata_field_id=3 and resource_type_id=2 and text_value ~ &#39;^.+?,$&#39;;
UPDATE 95 UPDATE 95
dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ &#39;^.+?,$&#39;; dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ &#39;^.+?,$&#39;;
text_value text_value
------------ ------------
(0 rows) (0 rows)
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt; &lt;/code&gt;&lt;/pre&gt;&lt;ul&gt;
&lt;li&gt;In this case the select query was showing 95 results before the update&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In this case the select query was showing 95 results before the update&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;</description> &lt;/ul&gt;</description>
</item> </item>
@ -342,11 +302,10 @@ text_value
<pubDate>Wed, 01 Jun 2016 10:53:00 +0300</pubDate> <pubDate>Wed, 01 Jun 2016 10:53:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-06/</guid> <guid>https://alanorth.github.io/cgspace-notes/2016-06/</guid>
<description>&lt;h2 id=&#34;2016-06-01&#34;&gt;2016-06-01&lt;/h2&gt; <description>&lt;h2 id=&#34;20160601&#34;&gt;2016-06-01&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Experimenting with IFPRI OAI (we want to harvest their publications)&lt;/li&gt; &lt;li&gt;Experimenting with IFPRI OAI (we want to harvest their publications)&lt;/li&gt;
&lt;li&gt;After reading the &lt;a href=&#34;https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html&#34;&gt;ContentDM documentation&lt;/a&gt; I found IFPRI&amp;rsquo;s OAI endpoint: &lt;a href=&#34;http://ebrary.ifpri.org/oai/oai.php&#34;&gt;http://ebrary.ifpri.org/oai/oai.php&lt;/a&gt;&lt;/li&gt; &lt;li&gt;After reading the &lt;a href=&#34;https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html&#34;&gt;ContentDM documentation&lt;/a&gt; I found IFPRI&#39;s OAI endpoint: &lt;a href=&#34;http://ebrary.ifpri.org/oai/oai.php&#34;&gt;http://ebrary.ifpri.org/oai/oai.php&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;After reading the &lt;a href=&#34;https://www.openarchives.org/OAI/openarchivesprotocol.html&#34;&gt;OAI documentation&lt;/a&gt; and testing with an &lt;a href=&#34;http://validator.oaipmh.com/&#34;&gt;OAI validator&lt;/a&gt; I found out how to get their publications&lt;/li&gt; &lt;li&gt;After reading the &lt;a href=&#34;https://www.openarchives.org/OAI/openarchivesprotocol.html&#34;&gt;OAI documentation&lt;/a&gt; and testing with an &lt;a href=&#34;http://validator.oaipmh.com/&#34;&gt;OAI validator&lt;/a&gt; I found out how to get their publications&lt;/li&gt;
&lt;li&gt;This is their publications set: &lt;a href=&#34;http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;amp;from=2016-01-01&amp;amp;set=p15738coll2&amp;amp;metadataPrefix=oai_dc&#34;&gt;http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;amp;from=2016-01-01&amp;amp;set=p15738coll2&amp;amp;metadataPrefix=oai_dc&lt;/a&gt;&lt;/li&gt; &lt;li&gt;This is their publications set: &lt;a href=&#34;http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;amp;from=2016-01-01&amp;amp;set=p15738coll2&amp;amp;metadataPrefix=oai_dc&#34;&gt;http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;amp;from=2016-01-01&amp;amp;set=p15738coll2&amp;amp;metadataPrefix=oai_dc&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;You can see the others by using the OAI &lt;code&gt;ListSets&lt;/code&gt; verb: &lt;a href=&#34;http://ebrary.ifpri.org/oai/oai.php?verb=ListSets&#34;&gt;http://ebrary.ifpri.org/oai/oai.php?verb=ListSets&lt;/a&gt;&lt;/li&gt; &lt;li&gt;You can see the others by using the OAI &lt;code&gt;ListSets&lt;/code&gt; verb: &lt;a href=&#34;http://ebrary.ifpri.org/oai/oai.php?verb=ListSets&#34;&gt;http://ebrary.ifpri.org/oai/oai.php?verb=ListSets&lt;/a&gt;&lt;/li&gt;
@ -360,18 +319,15 @@ text_value
<pubDate>Sun, 01 May 2016 23:06:00 +0300</pubDate> <pubDate>Sun, 01 May 2016 23:06:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-05/</guid> <guid>https://alanorth.github.io/cgspace-notes/2016-05/</guid>
<description>&lt;h2 id=&#34;2016-05-01&#34;&gt;2016-05-01&lt;/h2&gt; <description>&lt;h2 id=&#34;20160501&#34;&gt;2016-05-01&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Since yesterday there have been 10,000 REST errors and the site has been unstable again&lt;/li&gt; &lt;li&gt;Since yesterday there have been 10,000 REST errors and the site has been unstable again&lt;/li&gt;
&lt;li&gt;I have blocked access to the API now&lt;/li&gt; &lt;li&gt;I have blocked access to the API now&lt;/li&gt;
&lt;li&gt;There are 3,000 IPs accessing the REST API in a 24-hour period!&lt;/li&gt;
&lt;li&gt;&lt;p&gt;There are 3,000 IPs accessing the REST API in a 24-hour period!&lt;/p&gt; &lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;# awk &#39;{print $1}&#39; /var/log/nginx/rest.log | uniq | wc -l &lt;pre&gt;&lt;code&gt;# awk &#39;{print $1}&#39; /var/log/nginx/rest.log | uniq | wc -l
3168 3168
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt; &lt;/code&gt;&lt;/pre&gt;</description>
&lt;/ul&gt;</description>
</item> </item>
<item> <item>
@ -380,13 +336,12 @@ text_value
<pubDate>Mon, 04 Apr 2016 11:06:00 +0300</pubDate> <pubDate>Mon, 04 Apr 2016 11:06:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-04/</guid> <guid>https://alanorth.github.io/cgspace-notes/2016-04/</guid>
<description>&lt;h2 id=&#34;2016-04-04&#34;&gt;2016-04-04&lt;/h2&gt; <description>&lt;h2 id=&#34;20160404&#34;&gt;2016-04-04&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit&lt;/li&gt; &lt;li&gt;Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit&lt;/li&gt;
&lt;li&gt;We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc&lt;/li&gt; &lt;li&gt;We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc&lt;/li&gt;
&lt;li&gt;After running DSpace for over five years I&amp;rsquo;ve never needed to look in any other log file than dspace.log, let alone one from last year!&lt;/li&gt; &lt;li&gt;After running DSpace for over five years I&#39;ve never needed to look in any other log file than dspace.log, let alone one from last year!&lt;/li&gt;
&lt;li&gt;This will save us a few gigs of backup space we&amp;rsquo;re paying for on S3&lt;/li&gt; &lt;li&gt;This will save us a few gigs of backup space we&#39;re paying for on S3&lt;/li&gt;
&lt;li&gt;Also, I noticed the &lt;code&gt;checker&lt;/code&gt; log has some errors we should pay attention to:&lt;/li&gt; &lt;li&gt;Also, I noticed the &lt;code&gt;checker&lt;/code&gt; log has some errors we should pay attention to:&lt;/li&gt;
&lt;/ul&gt;</description> &lt;/ul&gt;</description>
</item> </item>
@ -397,11 +352,10 @@ text_value
<pubDate>Wed, 02 Mar 2016 16:50:00 +0300</pubDate> <pubDate>Wed, 02 Mar 2016 16:50:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-03/</guid> <guid>https://alanorth.github.io/cgspace-notes/2016-03/</guid>
<description>&lt;h2 id=&#34;2016-03-02&#34;&gt;2016-03-02&lt;/h2&gt; <description>&lt;h2 id=&#34;20160302&#34;&gt;2016-03-02&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Looking at issues with author authorities on CGSpace&lt;/li&gt; &lt;li&gt;Looking at issues with author authorities on CGSpace&lt;/li&gt;
&lt;li&gt;For some reason we still have the &lt;code&gt;index-lucene-update&lt;/code&gt; cron job active on CGSpace, but I&amp;rsquo;m pretty sure we don&amp;rsquo;t need it as of the latest few versions of Atmire&amp;rsquo;s Listings and Reports module&lt;/li&gt; &lt;li&gt;For some reason we still have the &lt;code&gt;index-lucene-update&lt;/code&gt; cron job active on CGSpace, but I&#39;m pretty sure we don&#39;t need it as of the latest few versions of Atmire&#39;s Listings and Reports module&lt;/li&gt;
&lt;li&gt;Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server&lt;/li&gt; &lt;li&gt;Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server&lt;/li&gt;
&lt;/ul&gt;</description> &lt;/ul&gt;</description>
</item> </item>
@ -412,16 +366,13 @@ text_value
<pubDate>Fri, 05 Feb 2016 13:18:00 +0300</pubDate> <pubDate>Fri, 05 Feb 2016 13:18:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-02/</guid> <guid>https://alanorth.github.io/cgspace-notes/2016-02/</guid>
<description>&lt;h2 id=&#34;2016-02-05&#34;&gt;2016-02-05&lt;/h2&gt; <description>&lt;h2 id=&#34;20160205&#34;&gt;2016-02-05&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Looking at some DAGRIS data for Abenet Yabowork&lt;/li&gt; &lt;li&gt;Looking at some DAGRIS data for Abenet Yabowork&lt;/li&gt;
&lt;li&gt;Lots of issues with spaces, newlines, etc causing the import to fail&lt;/li&gt; &lt;li&gt;Lots of issues with spaces, newlines, etc causing the import to fail&lt;/li&gt;
&lt;li&gt;I noticed we have a very &lt;em&gt;interesting&lt;/em&gt; list of countries on CGSpace:&lt;/li&gt; &lt;li&gt;I noticed we have a very &lt;em&gt;interesting&lt;/em&gt; list of countries on CGSpace:&lt;/li&gt;
&lt;/ul&gt; &lt;/ul&gt;
&lt;p&gt;&lt;img src=&#34;https://alanorth.github.io/cgspace-notes/cgspace-notes/2016/02/cgspace-countries.png&#34; alt=&#34;CGSpace country list&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://alanorth.github.io/cgspace-notes/cgspace-notes/2016/02/cgspace-countries.png&#34; alt=&#34;CGSpace country list&#34; /&gt;&lt;/p&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Not only are there 49,000 countries, we have some blanks (25)&amp;hellip;&lt;/li&gt; &lt;li&gt;Not only are there 49,000 countries, we have some blanks (25)&amp;hellip;&lt;/li&gt;
&lt;li&gt;Also, lots of things like &amp;ldquo;COTE D`LVOIRE&amp;rdquo; and &amp;ldquo;COTE D IVOIRE&amp;rdquo;&lt;/li&gt; &lt;li&gt;Also, lots of things like &amp;ldquo;COTE D`LVOIRE&amp;rdquo; and &amp;ldquo;COTE D IVOIRE&amp;rdquo;&lt;/li&gt;
@ -434,8 +385,7 @@ text_value
<pubDate>Wed, 13 Jan 2016 13:18:00 +0300</pubDate> <pubDate>Wed, 13 Jan 2016 13:18:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-01/</guid> <guid>https://alanorth.github.io/cgspace-notes/2016-01/</guid>
<description>&lt;h2 id=&#34;2016-01-13&#34;&gt;2016-01-13&lt;/h2&gt; <description>&lt;h2 id=&#34;20160113&#34;&gt;2016-01-13&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;Move ILRI collection &lt;code&gt;10568/12503&lt;/code&gt; from &lt;code&gt;10568/27869&lt;/code&gt; to &lt;code&gt;10568/27629&lt;/code&gt; using the &lt;a href=&#34;https://gist.github.com/alanorth/392c4660e8b022d99dfa&#34;&gt;move_collections.sh&lt;/a&gt; script I wrote last year.&lt;/li&gt; &lt;li&gt;Move ILRI collection &lt;code&gt;10568/12503&lt;/code&gt; from &lt;code&gt;10568/27869&lt;/code&gt; to &lt;code&gt;10568/27629&lt;/code&gt; using the &lt;a href=&#34;https://gist.github.com/alanorth/392c4660e8b022d99dfa&#34;&gt;move_collections.sh&lt;/a&gt; script I wrote last year.&lt;/li&gt;
&lt;li&gt;I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.&lt;/li&gt; &lt;li&gt;I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.&lt;/li&gt;
@ -449,18 +399,16 @@ text_value
<pubDate>Wed, 02 Dec 2015 13:18:00 +0300</pubDate> <pubDate>Wed, 02 Dec 2015 13:18:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2015-12/</guid> <guid>https://alanorth.github.io/cgspace-notes/2015-12/</guid>
<description>&lt;h2 id=&#34;2015-12-02&#34;&gt;2015-12-02&lt;/h2&gt; <description>&lt;h2 id=&#34;20151202&#34;&gt;2015-12-02&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;&lt;p&gt;Replace &lt;code&gt;lzop&lt;/code&gt; with &lt;code&gt;xz&lt;/code&gt; in log compression cron jobs on DSpace Test—it uses less space:&lt;/p&gt; &lt;li&gt;Replace &lt;code&gt;lzop&lt;/code&gt; with &lt;code&gt;xz&lt;/code&gt; in log compression cron jobs on DSpace Test—it uses less space:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;# cd /home/dspacetest.cgiar.org/log &lt;pre&gt;&lt;code&gt;# cd /home/dspacetest.cgiar.org/log
# ls -lh dspace.log.2015-11-18* # ls -lh dspace.log.2015-11-18*
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18 -rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo -rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz -rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt; &lt;/code&gt;&lt;/pre&gt;</description>
&lt;/ul&gt;</description>
</item> </item>
<item> <item>
@ -469,18 +417,15 @@ text_value
<pubDate>Mon, 23 Nov 2015 17:00:57 +0300</pubDate> <pubDate>Mon, 23 Nov 2015 17:00:57 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2015-11/</guid> <guid>https://alanorth.github.io/cgspace-notes/2015-11/</guid>
<description>&lt;h2 id=&#34;2015-11-22&#34;&gt;2015-11-22&lt;/h2&gt; <description>&lt;h2 id=&#34;20151122&#34;&gt;2015-11-22&lt;/h2&gt;
&lt;ul&gt; &lt;ul&gt;
&lt;li&gt;CGSpace went down&lt;/li&gt; &lt;li&gt;CGSpace went down&lt;/li&gt;
&lt;li&gt;Looks like DSpace exhausted its PostgreSQL connection pool&lt;/li&gt; &lt;li&gt;Looks like DSpace exhausted its PostgreSQL connection pool&lt;/li&gt;
&lt;li&gt;Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:&lt;/p&gt; &lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep idle | grep -c cgspace &lt;pre&gt;&lt;code&gt;$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep idle | grep -c cgspace
78 78
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt; &lt;/code&gt;&lt;/pre&gt;</description>
&lt;/ul&gt;</description>
</item> </item>
</channel> </channel>

@ -9,13 +9,12 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/tags/notes/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/tags/notes/" />
<meta property="og:updated_time" content="2017-09-07T16:54:52+07:00" /> <meta property="og:updated_time" content="2017-09-07T16:54:52+07:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Notes"/> <meta name="twitter:title" content="Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/> <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -85,13 +84,11 @@
</p> </p>
</header> </header>
<h2 id="2016-11-01">2016-11-01</h2> <h2 id="20161101">2016-11-01</h2>
<ul> <ul>
<li>Add <code>dc.type</code> to the output options for Atmire&rsquo;s Listings and Reports module (<a href="https://github.com/ilri/DSpace/pull/286">#286</a>)</li> <li>Add <code>dc.type</code> to the output options for Atmire's Listings and Reports module (<a href="https://github.com/ilri/DSpace/pull/286">#286</a>)</li>
</ul> </ul>
<p><img src="/cgspace-notes/2016/11/listings-and-reports.png" alt="Listings and Reports with output type"></p>
<p><img src="/cgspace-notes/2016/11/listings-and-reports.png" alt="Listings and Reports with output type" /></p>
<a href='https://alanorth.github.io/cgspace-notes/2016-11/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-11/'>Read more →</a>
</article> </article>
@ -109,22 +106,19 @@
</p> </p>
</header> </header>
<h2 id="2016-10-03">2016-10-03</h2> <h2 id="20161003">2016-10-03</h2>
<ul> <ul>
<li>Testing adding <a href="https://wiki.duraspace.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing">ORCIDs to a CSV</a> file for a single item to see if the author orders get messed up</li> <li>Testing adding <a href="https://wiki.duraspace.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing">ORCIDs to a CSV</a> file for a single item to see if the author orders get messed up</li>
<li>Need to test the following scenarios to see how author order is affected: <li>Need to test the following scenarios to see how author order is affected:
<ul> <ul>
<li>ORCIDs only</li> <li>ORCIDs only</li>
<li>ORCIDs plus normal authors</li> <li>ORCIDs plus normal authors</li>
</ul></li>
<li><p>I exported a random item&rsquo;s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new column called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</p>
<pre><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
</code></pre></li>
</ul> </ul>
</li>
<li>I exported a random item's metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new column called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</li>
</ul>
<pre><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2016-10/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-10/'>Read more →</a>
</article> </article>
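The 2016-10-03 entry above tests whether author order survives a CSV import of `||`-separated ORCIDs. A minimal sketch of how such a cell could be split in order and sanity-checked before import follows. The function names are hypothetical and not part of DSpace; the check character uses the ISO 7064 MOD 11-2 scheme that ORCID iDs follow.

```python
import re

# Four groups of four characters; the last character may be the check value "X".
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def split_multivalue(cell, separator="||"):
    """Split a multi-value CSV cell, preserving the original order."""
    return [value.strip() for value in cell.split(separator) if value.strip()]


def orcid_check_character(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Return True if the identifier has the right shape and check character."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]


# The cell value used in the notes above.
cell = "0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X"
orcids = split_multivalue(cell)
```

Because `split_multivalue` returns a list, the position of each ORCID can be compared against the author order DSpace shows after the import.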
@ -142,18 +136,15 @@
</p> </p>
</header> </header>
<h2 id="2016-09-01">2016-09-01</h2> <h2 id="20160901">2016-09-01</h2>
<ul> <ul>
<li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li> <li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li>
<li>Discuss how the migration of CGIAR&rsquo;s Active Directory to a flat structure will break our LDAP groups in DSpace</li> <li>Discuss how the migration of CGIAR's Active Directory to a flat structure will break our LDAP groups in DSpace</li>
<li>We had been using <code>DC=ILRI</code> to determine whether a user was ILRI or not</li> <li>We had been using <code>DC=ILRI</code> to determine whether a user was ILRI or not</li>
<li>It looks like we might be able to use OUs now, instead of DCs:</li>
<li><p>It looks like we might be able to use OUs now, instead of DCs:</p>
<pre><code>$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b &quot;dc=cgiarad,dc=org&quot; -D &quot;admigration1@cgiarad.org&quot; -W &quot;(sAMAccountName=admigration1)&quot;
</code></pre></li>
</ul> </ul>
<pre><code>$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b &quot;dc=cgiarad,dc=org&quot; -D &quot;admigration1@cgiarad.org&quot; -W &quot;(sAMAccountName=admigration1)&quot;
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2016-09/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-09/'>Read more →</a>
</article> </article>
@ -171,22 +162,19 @@
</p> </p>
</header> </header>
<h2 id="2016-08-01">2016-08-01</h2> <h2 id="20160801">2016-08-01</h2>
<ul> <ul>
<li>Add updated distribution license from Sisay (<a href="https://github.com/ilri/DSpace/issues/259">#259</a>)</li> <li>Add updated distribution license from Sisay (<a href="https://github.com/ilri/DSpace/issues/259">#259</a>)</li>
<li>Play with upgrading Mirage 2 dependencies in <code>bower.json</code> because most are several versions out of date</li> <li>Play with upgrading Mirage 2 dependencies in <code>bower.json</code> because most are several versions out of date</li>
<li>Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more</li> <li>Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more</li>
<li>bower stuff is a dead end, waste of time, too many issues</li> <li>bower stuff is a dead end, waste of time, too many issues</li>
<li>Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of <code>fonts</code>)</li> <li>Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of <code>fonts</code>)</li>
<li>Start working on DSpace 5.1 → 5.5 port:</li>
<li><p>Start working on DSpace 5.1 → 5.5 port:</p> </ul>
<pre><code>$ git checkout -b 55new 5_x-prod <pre><code>$ git checkout -b 55new 5_x-prod
$ git reset --hard ilri/5_x-prod $ git reset --hard ilri/5_x-prod
$ git rebase -i dspace-5.5 $ git rebase -i dspace-5.5
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-08/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-08/'>Read more →</a>
</article> </article>
@ -204,22 +192,19 @@ $ git rebase -i dspace-5.5
</p> </p>
</header> </header>
<h2 id="2016-07-01">2016-07-01</h2> <h2 id="20160701">2016-07-01</h2>
<ul> <ul>
<li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li> <li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li>
<li>I think this query should find and replace all authors that have &ldquo;,&rdquo; at the end of their names:</li>
<li><p>I think this query should find and replace all authors that have &ldquo;,&rdquo; at the end of their names:</p> </ul>
<pre><code>dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$'; <pre><code>dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
UPDATE 95 UPDATE 95
dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$'; dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
text_value text_value
------------ ------------
(0 rows) (0 rows)
</code></pre></li> </code></pre><ul>
<li>In this case the select query was showing 95 results before the update</li>
<li><p>In this case the select query was showing 95 results before the update</p></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-07/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-07/'>Read more →</a>
</article> </article>
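The `regexp_replace` call in the 2016-07-01 entry above can be tried on sample values before touching the database. This is a rough Python equivalent using the same pattern; the helper name and sample author names are made up for illustration, and Python's `re` is not guaranteed to match PostgreSQL's regex engine in every edge case.

```python
import re

# Same pattern as the SQL: lazily capture everything before one trailing comma.
TRAILING_COMMA = re.compile(r"(^.+?),$")


def strip_trailing_comma(name):
    """Remove a single trailing comma; leave other values unchanged."""
    return TRAILING_COMMA.sub(r"\1", name)


# Hypothetical sample values, not taken from the database.
samples = ["Orth, Alan,", "Orth, Alan", ","]
cleaned = [strip_trailing_comma(name) for name in samples]
```

A value that is only a comma does not match, because `.+?` needs at least one character before the trailing comma.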
@ -238,11 +223,10 @@ text_value
</p> </p>
</header> </header>
<h2 id="2016-06-01">2016-06-01</h2> <h2 id="20160601">2016-06-01</h2>
<ul> <ul>
<li>Experimenting with IFPRI OAI (we want to harvest their publications)</li> <li>Experimenting with IFPRI OAI (we want to harvest their publications)</li>
<li>After reading the <a href="https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html">ContentDM documentation</a> I found IFPRI&rsquo;s OAI endpoint: <a href="http://ebrary.ifpri.org/oai/oai.php">http://ebrary.ifpri.org/oai/oai.php</a></li> <li>After reading the <a href="https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html">ContentDM documentation</a> I found IFPRI's OAI endpoint: <a href="http://ebrary.ifpri.org/oai/oai.php">http://ebrary.ifpri.org/oai/oai.php</a></li>
<li>After reading the <a href="https://www.openarchives.org/OAI/openarchivesprotocol.html">OAI documentation</a> and testing with an <a href="http://validator.oaipmh.com/">OAI validator</a> I found out how to get their publications</li> <li>After reading the <a href="https://www.openarchives.org/OAI/openarchivesprotocol.html">OAI documentation</a> and testing with an <a href="http://validator.oaipmh.com/">OAI validator</a> I found out how to get their publications</li>
<li>This is their publications set: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc">http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc</a></li> <li>This is their publications set: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc">http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc</a></li>
<li>You can see the others by using the OAI <code>ListSets</code> verb: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListSets">http://ebrary.ifpri.org/oai/oai.php?verb=ListSets</a></li> <li>You can see the others by using the OAI <code>ListSets</code> verb: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListSets">http://ebrary.ifpri.org/oai/oai.php?verb=ListSets</a></li>
@ -265,18 +249,15 @@ text_value
</p> </p>
</header> </header>
<h2 id="2016-05-01">2016-05-01</h2> <h2 id="20160501">2016-05-01</h2>
<ul> <ul>
<li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li> <li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li>
<li>I have blocked access to the API now</li> <li>I have blocked access to the API now</li>
<li>There are 3,000 IPs accessing the REST API in a 24-hour period!</li>
<li><p>There are 3,000 IPs accessing the REST API in a 24-hour period!</p> </ul>
<pre><code># awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l <pre><code># awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l
3168 3168
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-05/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-05/'>Read more →</a>
</article> </article>
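One caveat on the count in the 2016-05-01 entry above: `uniq` only collapses adjacent duplicate lines, so without a preceding `sort` the figure may overstate the number of distinct IPs. The sketch below shows the difference on a tiny made-up log; the function names and sample lines are assumptions for illustration, not data from the server.

```python
from itertools import groupby


def first_field(line):
    """Return the first whitespace-separated field, like awk '{print $1}'."""
    return line.split()[0]


def count_adjacent_runs(values):
    """Count runs of identical adjacent values, which is what `uniq | wc -l` reports."""
    return sum(1 for _ in groupby(values))


def count_distinct(values):
    """Count distinct values, which is what `sort -u | wc -l` reports."""
    return len(set(values))


# Hypothetical log lines: the first IP returns after another client's request.
sample_log = [
    "192.0.2.1 - - GET /rest/items",
    "192.0.2.2 - - GET /rest/items",
    "192.0.2.1 - - GET /rest/collections",
]
ips = [first_field(line) for line in sample_log]
runs = count_adjacent_runs(ips)
distinct = count_distinct(ips)
```

Here `runs` is 3 while `distinct` is 2, so the two pipelines would disagree on any log where clients interleave.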
@ -294,13 +275,12 @@ text_value
</p> </p>
</header> </header>
<h2 id="2016-04-04">2016-04-04</h2> <h2 id="20160404">2016-04-04</h2>
<ul> <ul>
<li>Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit</li> <li>Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit</li>
<li>We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc</li> <li>We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc</li>
<li>After running DSpace for over five years I&rsquo;ve never needed to look in any other log file than dspace.log, let alone one from last year!</li> <li>After running DSpace for over five years I've never needed to look in any other log file than dspace.log, let alone one from last year!</li>
<li>This will save us a few gigs of backup space we&rsquo;re paying for on S3</li> <li>This will save us a few gigs of backup space we're paying for on S3</li>
<li>Also, I noticed the <code>checker</code> log has some errors we should pay attention to:</li> <li>Also, I noticed the <code>checker</code> log has some errors we should pay attention to:</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-04/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-04/'>Read more →</a>
@ -320,11 +300,10 @@ text_value
</p> </p>
</header> </header>
<h2 id="2016-03-02">2016-03-02</h2> <h2 id="20160302">2016-03-02</h2>
<ul> <ul>
<li>Looking at issues with author authorities on CGSpace</li> <li>Looking at issues with author authorities on CGSpace</li>
<li>For some reason we still have the <code>index-lucene-update</code> cron job active on CGSpace, but I&rsquo;m pretty sure we don&rsquo;t need it as of the latest few versions of Atmire&rsquo;s Listings and Reports module</li> <li>For some reason we still have the <code>index-lucene-update</code> cron job active on CGSpace, but I'm pretty sure we don't need it as of the latest few versions of Atmire's Listings and Reports module</li>
<li>Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server</li> <li>Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-03/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-03/'>Read more →</a>
@ -344,16 +323,13 @@ text_value
</p> </p>
</header> </header>
<h2 id="2016-02-05">2016-02-05</h2> <h2 id="20160205">2016-02-05</h2>
<ul> <ul>
<li>Looking at some DAGRIS data for Abenet Yabowork</li> <li>Looking at some DAGRIS data for Abenet Yabowork</li>
<li>Lots of issues with spaces, newlines, etc causing the import to fail</li> <li>Lots of issues with spaces, newlines, etc causing the import to fail</li>
<li>I noticed we have a very <em>interesting</em> list of countries on CGSpace:</li> <li>I noticed we have a very <em>interesting</em> list of countries on CGSpace:</li>
</ul> </ul>
<p><img src="/cgspace-notes/2016/02/cgspace-countries.png" alt="CGSpace country list"></p>
<p><img src="/cgspace-notes/2016/02/cgspace-countries.png" alt="CGSpace country list" /></p>
<ul> <ul>
<li>Not only are there 49,000 countries, we have some blanks (25)&hellip;</li> <li>Not only are there 49,000 countries, we have some blanks (25)&hellip;</li>
<li>Also, lots of things like &ldquo;COTE D`LVOIRE&rdquo; and &ldquo;COTE D IVOIRE&rdquo;</li> <li>Also, lots of things like &ldquo;COTE D`LVOIRE&rdquo; and &ldquo;COTE D IVOIRE&rdquo;</li>

@ -9,13 +9,12 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/tags/notes/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/tags/notes/" />
<meta property="og:updated_time" content="2017-09-07T16:54:52+07:00" /> <meta property="og:updated_time" content="2017-09-07T16:54:52+07:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Notes"/> <meta name="twitter:title" content="Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/> <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -85,8 +84,7 @@
</p> </p>
</header> </header>
<h2 id="2016-01-13">2016-01-13</h2> <h2 id="20160113">2016-01-13</h2>
<ul> <ul>
<li>Move ILRI collection <code>10568/12503</code> from <code>10568/27869</code> to <code>10568/27629</code> using the <a href="https://gist.github.com/alanorth/392c4660e8b022d99dfa">move_collections.sh</a> script I wrote last year.</li> <li>Move ILRI collection <code>10568/12503</code> from <code>10568/27869</code> to <code>10568/27629</code> using the <a href="https://gist.github.com/alanorth/392c4660e8b022d99dfa">move_collections.sh</a> script I wrote last year.</li>
<li>I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.</li> <li>I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.</li>
@ -109,18 +107,16 @@
</p> </p>
</header> </header>
<h2 id="2015-12-02">2015-12-02</h2> <h2 id="20151202">2015-12-02</h2>
<ul> <ul>
<li><p>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</p> <li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li>
</ul>
<pre><code># cd /home/dspacetest.cgiar.org/log <pre><code># cd /home/dspacetest.cgiar.org/log
# ls -lh dspace.log.2015-11-18* # ls -lh dspace.log.2015-11-18*
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18 -rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo -rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz -rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2015-12/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2015-12/'>Read more →</a>
</article> </article>
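The size comparison above can be reproduced in spirit with a rough Python sketch. This is an illustration under stated assumptions, not the cron job itself: the log lines are synthetic, and `zlib` stands in for `lzop` because LZO is not in the Python standard library, so the ratios will differ from the listing above.

```python
import lzma
import zlib

# Synthetic log lines (made up for illustration, not real DSpace output).
lines = [
    f"2015-11-18 23:59:{i % 60:02d},000 INFO  org.dspace.example.Service @ request {i}\n"
    for i in range(5000)
]
raw = "".join(lines).encode("utf-8")

sizes = {
    "raw": len(raw),
    "zlib": len(zlib.compress(raw)),  # assumption: stand-in for a faster, lighter compressor
    "xz": len(lzma.compress(raw)),    # lzma.compress writes the .xz container by default
}
for name, size in sizes.items():
    print(f"{name}: {size} bytes ({size / sizes['raw']:.1%} of raw)")
```

The trade-off is usually CPU time against space, so it may be worth timing both compressors on a real day of logs before changing a cron job.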
@ -138,18 +134,15 @@
</p> </p>
</header> </header>
<h2 id="2015-11-22">2015-11-22</h2> <h2 id="20151122">2015-11-22</h2>
<ul> <ul>
<li>CGSpace went down</li> <li>CGSpace went down</li>
<li>Looks like DSpace exhausted its PostgreSQL connection pool</li> <li>Looks like DSpace exhausted its PostgreSQL connection pool</li>
<li>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</li>
<li><p>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</p> </ul>
<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace <pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
78 78
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2015-11/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2015-11/'>Read more →</a>
</article> </article>

@ -9,13 +9,12 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/tags/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/tags/" />
<meta property="og:updated_time" content="2019-10-28T13:27:35+02:00" /> <meta property="og:updated_time" content="2019-10-28T13:27:35+02:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Tags"/> <meta name="twitter:title" content="Tags"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/> <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -100,40 +99,34 @@
</p> </p>
</header> </header>
<h2 id="2019-02-01">2019-02-01</h2> <h2 id="20190201">2019-02-01</h2>
<ul> <ul>
<li>Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!</li> <li>Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!</li>
<li>The top IPs before, during, and after this latest alert tonight were:</li>
<li><p>The top IPs before, during, and after this latest alert tonight were:</p> </ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;01/Feb/2019:(17|18|19|20|21)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 <pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;01/Feb/2019:(17|18|19|20|21)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
245 207.46.13.5 245 207.46.13.5
332 54.70.40.11 332 54.70.40.11
385 5.143.231.38 385 5.143.231.38
405 207.46.13.173 405 207.46.13.173
405 207.46.13.75 405 207.46.13.75
1117 66.249.66.219 1117 66.249.66.219
1121 35.237.175.180 1121 35.237.175.180
1546 5.9.6.51 1546 5.9.6.51
2474 45.5.186.2 2474 45.5.186.2
5490 85.25.237.71 5490 85.25.237.71
</code></pre></li> </code></pre><ul>
<li><code>85.25.237.71</code> is the &ldquo;Linguee Bot&rdquo; that I first saw last month</li>
<li><p><code>85.25.237.71</code> is the &ldquo;Linguee Bot&rdquo; that I first saw last month</p></li> <li>The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase</li>
<li>There were just over 3 million accesses in the nginx logs last month:</li>
<li><p>The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase</p></li> </ul>
<li><p>There were just over 3 million accesses in the nginx logs last month:</p>
<pre><code># time zcat --force /var/log/nginx/* | grep -cE &quot;[0-9]{1,2}/Jan/2019&quot; <pre><code># time zcat --force /var/log/nginx/* | grep -cE &quot;[0-9]{1,2}/Jan/2019&quot;
3018243 3018243
real 0m19.873s real 0m19.873s
user 0m22.203s user 0m22.203s
sys 0m1.979s sys 0m1.979s
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-02/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-02/'>Read more →</a>
</article> </article>
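The `zcat | grep | awk | sort | uniq -c` pipeline above can be sketched in Python as follows. This is only an approximation under assumptions: the helper name `top_ips` is hypothetical, the sample lines use documentation-range addresses rather than real traffic, and the client IP is taken to be the first whitespace-separated field, as in the `awk '{print $1}'` step.

```python
import re
from collections import Counter


def top_ips(lines, time_pattern, n=10):
    """Count requests per client IP for lines whose text matches time_pattern."""
    matcher = re.compile(time_pattern)
    counts = Counter(
        line.split()[0] for line in lines if line.strip() and matcher.search(line)
    )
    # most_common() returns the busiest clients first (the shell pipeline lists them last).
    return counts.most_common(n)


# Hypothetical access-log lines; the last one falls outside the 17:00-21:59 window.
sample = [
    '192.0.2.1 - - [01/Feb/2019:17:00:01 +0000] "GET / HTTP/1.1" 200 512',
    '192.0.2.2 - - [01/Feb/2019:18:10:00 +0000] "GET /a HTTP/1.1" 200 512',
    '192.0.2.1 - - [01/Feb/2019:21:59:59 +0000] "GET /b HTTP/1.1" 200 512',
    '192.0.2.1 - - [01/Feb/2019:22:00:00 +0000] "GET /c HTTP/1.1" 200 512',
]
result = top_ips(sample, r"01/Feb/2019:(17|18|19|20|21)")
print(result)
```

To read rotated logs the way `zcat --force` does, the lines could come from `gzip.open(path, "rt")` for compressed files and plain `open(path)` for the rest.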
@ -151,26 +144,23 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2019-01-02">2019-01-02</h2> <h2 id="20190102">2019-01-02</h2>
<ul> <ul>
<li>Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning</li> <li>Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning</li>
<li>I don't see anything interesting in the web server logs around that time though:</li>
<li><p>I don&rsquo;t see anything interesting in the web server logs around that time though:</p>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;02/Jan/2019:0(1|2|3)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
92 40.77.167.4
99 210.7.29.100
120 38.126.157.45
177 35.237.175.180
177 40.77.167.32
216 66.249.75.219
225 18.203.76.93
261 46.101.86.248
357 207.46.13.1
903 54.70.40.11
</code></pre></li>
</ul> </ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;02/Jan/2019:0(1|2|3)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
92 40.77.167.4
99 210.7.29.100
120 38.126.157.45
177 35.237.175.180
177 40.77.167.32
216 66.249.75.219
225 18.203.76.93
261 46.101.86.248
357 207.46.13.1
903 54.70.40.11
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2019-01/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2019-01/'>Read more →</a>
</article> </article>
@ -188,16 +178,13 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2018-12-01">2018-12-01</h2> <h2 id="20181201">2018-12-01</h2>
<ul> <ul>
<li>Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK</li> <li>Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK</li>
<li>I manually installed OpenJDK, then removed Oracle JDK, then re-ran the <a href="http://github.com/ilri/rmg-ansible-public">Ansible playbook</a> to update all configuration files, etc</li> <li>I manually installed OpenJDK, then removed Oracle JDK, then re-ran the <a href="http://github.com/ilri/rmg-ansible-public">Ansible playbook</a> to update all configuration files, etc</li>
<li>Then I ran all system updates and restarted the server</li> <li>Then I ran all system updates and restarted the server</li>
</ul> </ul>
<h2 id="20181202">2018-12-02</h2>
<h2 id="2018-12-02">2018-12-02</h2>
<ul> <ul>
<li>I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another <a href="https://usn.ubuntu.com/3831-1/">Ghostscript vulnerability last week</a></li> <li>I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another <a href="https://usn.ubuntu.com/3831-1/">Ghostscript vulnerability last week</a></li>
</ul> </ul>
@ -218,15 +205,12 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2018-11-01">2018-11-01</h2> <h2 id="20181101">2018-11-01</h2>
<ul> <ul>
<li>Finalize AReS Phase I and Phase II ToRs</li> <li>Finalize AReS Phase I and Phase II ToRs</li>
<li>Send a note about my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to the dspace-tech mailing list</li> <li>Send a note about my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to the dspace-tech mailing list</li>
</ul> </ul>
<h2 id="20181103">2018-11-03</h2>
<h2 id="2018-11-03">2018-11-03</h2>
<ul> <ul>
<li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li> <li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li>
<li>Today these are the top 10 IPs:</li> <li>Today these are the top 10 IPs:</li>
@ -248,11 +232,10 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2018-10-01">2018-10-01</h2> <h2 id="20181001">2018-10-01</h2>
<ul> <ul>
<li>Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items</li> <li>Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items</li>
<li>I created a GitHub issue to track this <a href="https://github.com/ilri/DSpace/issues/389">#389</a>, because I&rsquo;m super busy in Nairobi right now</li> <li>I created a GitHub issue to track this <a href="https://github.com/ilri/DSpace/issues/389">#389</a>, because I'm super busy in Nairobi right now</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-10/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-10/'>Read more →</a>
</article> </article>
@ -271,13 +254,12 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2018-09-02">2018-09-02</h2> <h2 id="20180902">2018-09-02</h2>
<ul> <ul>
<li>New <a href="https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.5">PostgreSQL JDBC driver version 42.2.5</a></li> <li>New <a href="https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.5">PostgreSQL JDBC driver version 42.2.5</a></li>
<li>I&rsquo;ll update the DSpace role in our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a> and run the updated playbooks on CGSpace and DSpace Test</li> <li>I'll update the DSpace role in our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a> and run the updated playbooks on CGSpace and DSpace Test</li>
<li>Also, I&rsquo;ll re-run the <code>postgresql</code> tasks because the custom PostgreSQL variables are dynamic according to the system&rsquo;s RAM, and we never re-ran them after migrating to larger Linodes last month</li> <li>Also, I'll re-run the <code>postgresql</code> tasks because the custom PostgreSQL variables are dynamic according to the system's RAM, and we never re-ran them after migrating to larger Linodes last month</li>
<li>I&rsquo;m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I&rsquo;m getting those autowire errors in Tomcat 8.5.30 again:</li> <li>I'm testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I'm getting those autowire errors in Tomcat 8.5.30 again:</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-09/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-09/'>Read more →</a>
</article> </article>
@ -296,27 +278,20 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2018-08-01">2018-08-01</h2> <h2 id="20180801">2018-08-01</h2>
<ul> <ul>
<li><p>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</p> <li>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</li>
</ul>
<pre><code>[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child <pre><code>[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child
[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB [Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB
[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
</code></pre></li> </code></pre><ul>
<li>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</li>
<li><p>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</p></li> <li>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat's</li>
<li>I'm not sure why Tomcat didn't crash with an OutOfMemoryError&hellip;</li>
<li><p>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat&rsquo;s</p></li> <li>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</li>
<li>The server only has 8GB of RAM so we'll eventually need to upgrade to a larger one because we'll start starving the OS, PostgreSQL, and command line batch processes</li>
<li><p>I&rsquo;m not sure why Tomcat didn&rsquo;t crash with an OutOfMemoryError&hellip;</p></li> <li>I ran all system updates on DSpace Test and rebooted it</li>
<li><p>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</p></li>
<li><p>The server only has 8GB of RAM so we&rsquo;ll eventually need to upgrade to a larger one because we&rsquo;ll start starving the OS, PostgreSQL, and command line batch processes</p></li>
<li><p>I ran all system updates on DSpace Test and rebooted it</p></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-08/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-08/'>Read more →</a>
</article> </article>
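The `dmesg` lines above carry enough information to compare the killed process with the configured heap. A small sketch, assuming the kernel's `kB` figures are multiples of 1024 bytes and using a hypothetical helper name (`parse_oom_kill`):

```python
import re

# The "Killed process" line as quoted in the notes.
OOM_LINE = (
    "[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) "
    "total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB"
)


def parse_oom_kill(line):
    """Extract pid, process name and memory counters (in kB) from an OOM kill line."""
    head = re.search(r"Killed process (\d+) \(([^)]+)\)", line)
    if head is None:
        return None
    counters = {key: int(value) for key, value in re.findall(r"([\w-]+):(\d+)kB", line)}
    return {"pid": int(head.group(1)), "name": head.group(2), **counters}


info = parse_oom_kill(OOM_LINE)
anon_rss_mib = info["anon-rss"] / 1024  # assumption: kB here means KiB
heap_mib = 5120                         # the -Xmx value mentioned in the notes
print(f"pid {info['pid']} ({info['name']}): {anon_rss_mib:.0f} MiB resident, heap {heap_mib} MiB")
```

Under that assumption the resident size works out to roughly 5230 MiB, slightly above the 5120m heap. A JVM also uses memory outside the heap (thread stacks, metaspace, and so on), so raising the heap on an 8GB host narrows the margin for everything else, as the notes already suspect.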
@ -335,19 +310,16 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2018-07-01">2018-07-01</h2> <h2 id="20180701">2018-07-01</h2>
<ul> <ul>
<li><p>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</p> <li>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</li>
<pre><code>$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
</code></pre></li>
<li><p>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</p>
<pre><code>There is insufficient memory for the Java Runtime Environment to continue.
</code></pre></li>
</ul> </ul>
<pre><code>$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
</code></pre><ul>
<li>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</li>
</ul>
<pre><code>There is insufficient memory for the Java Runtime Environment to continue.
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2018-07/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-07/'>Read more →</a>
</article> </article>
@ -365,32 +337,27 @@ sys 0m1.979s
</p> </p>
</header> </header>
<h2 id="2018-06-04">2018-06-04</h2> <h2 id="20180604">2018-06-04</h2>
<ul> <ul>
<li>Test the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">DSpace 5.8 module upgrades from Atmire</a> (<a href="https://github.com/ilri/DSpace/pull/378">#378</a>) <li>Test the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">DSpace 5.8 module upgrades from Atmire</a> (<a href="https://github.com/ilri/DSpace/pull/378">#378</a>)
<ul> <ul>
<li>There seems to be a problem with the CUA and L&amp;R versions in <code>pom.xml</code> because they are using SNAPSHOT and it doesn&rsquo;t build</li> <li>There seems to be a problem with the CUA and L&amp;R versions in <code>pom.xml</code> because they are using SNAPSHOT and it doesn't build</li>
</ul></li> </ul>
</li>
<li>I added the new CCAFS Phase II Project Tag <code>PII-FP1_PACCA2</code> and merged it into the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/379">#379</a>)</li> <li>I added the new CCAFS Phase II Project Tag <code>PII-FP1_PACCA2</code> and merged it into the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/379">#379</a>)</li>
<li>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</li>
<li><p>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</p> </ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3 -n <pre><code>$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3 -n
</code></pre></li> </code></pre><ul>
<li>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="/cgspace-notes/2018-03/">March, 2018</a></li>
<li><p>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="/cgspace-notes/2018-03/">March, 2018</a></p></li> <li>Time to index ~70,000 items on CGSpace:</li>
</ul>
<li><p>Time to index ~70,000 items on CGSpace:</p>
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b <pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
real 74m42.646s real 74m42.646s
user 8m5.056s user 8m5.056s
sys 2m7.289s sys 2m7.289s
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-06/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-06/'>Read more →</a>
</article> </article>
@ -408,15 +375,14 @@ sys 2m7.289s
</p> </p>
</header> </header>
<h2 id="2018-05-01">2018-05-01</h2> <h2 id="20180501">2018-05-01</h2>
<ul> <ul>
<li>I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface: <li>I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface:
<ul> <ul>
<li><a href="http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E">http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E</a></li> <li>http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E</li>
<li><a href="http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E">http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E</a></li> <li>http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E</li>
</ul></li> </ul>
</li>
<li>Then I reduced the JVM heap size from 6144 back to 5120m</li> <li>Then I reduced the JVM heap size from 6144 back to 5120m</li>
<li>Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> to support hosts choosing which distribution they want to use</li> <li>Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> to support hosts choosing which distribution they want to use</li>
</ul> </ul>
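The two Solr URLs above are ordinary update requests with an XML command percent-encoded into `stream.body`. A minimal sketch of how they can be built, where the helper name `solr_update_url` is hypothetical and the base URL is copied from the notes:

```python
from urllib.parse import quote

# Base URL as it appears in the notes (Solr reached through a local port).
SOLR_UPDATE = "http://localhost:3000/solr/statistics/update"


def solr_update_url(xml_body, base=SOLR_UPDATE):
    """Build an update URL that carries an XML command in stream.body."""
    # Keep '*', ':' and '/' literal so the result matches the URLs in the notes.
    return f"{base}?stream.body={quote(xml_body, safe='*:/')}"


delete_all = solr_update_url("<delete><query>*:*</query></delete>")
commit = solr_update_url("<commit/>")
print(delete_all)
print(commit)
```

The delete and the commit are separate requests; the deletion only becomes visible to searches after the commit.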

@ -9,13 +9,12 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/tags/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/tags/" />
<meta property="og:updated_time" content="2019-10-28T13:27:35+02:00" /> <meta property="og:updated_time" content="2019-10-28T13:27:35+02:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Tags"/> <meta name="twitter:title" content="Tags"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/> <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -100,10 +99,9 @@
</p> </p>
</header> </header>
<h2 id="2018-04-01">2018-04-01</h2> <h2 id="20180401">2018-04-01</h2>
<ul> <ul>
<li>I tried to test something on DSpace Test but noticed that it&rsquo;s down since god knows when</li> <li>I tried to test something on DSpace Test but noticed that it's down since god knows when</li>
<li>Catalina logs at least show some memory errors yesterday:</li> <li>Catalina logs at least show some memory errors yesterday:</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-04/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-04/'>Read more →</a>
@ -123,8 +121,7 @@
</p> </p>
</header> </header>
<h2 id="2018-03-02">2018-03-02</h2> <h2 id="20180302">2018-03-02</h2>
<ul> <ul>
<li>Export a CSV of the IITA community metadata for Martin Mueller</li> <li>Export a CSV of the IITA community metadata for Martin Mueller</li>
</ul> </ul>
@ -145,13 +142,12 @@
</p> </p>
</header> </header>
<h2 id="2018-02-01">2018-02-01</h2> <h2 id="20180201">2018-02-01</h2>
<ul> <ul>
<li>Peter gave feedback on the <code>dc.rights</code> proof of concept that I had sent him last week</li> <li>Peter gave feedback on the <code>dc.rights</code> proof of concept that I had sent him last week</li>
<li>We don&rsquo;t need to distinguish between internal and external works, so that makes it just a simple list</li> <li>We don't need to distinguish between internal and external works, so that makes it just a simple list</li>
<li>Yesterday I figured out how to monitor DSpace sessions using JMX</li> <li>Yesterday I figured out how to monitor DSpace sessions using JMX</li>
<li>I copied the logic in the <code>jmx_tomcat_dbpools</code> provided by Ubuntu&rsquo;s <code>munin-plugins-java</code> package and used the stuff I discovered about JMX <a href="/cgspace-notes/2018-01/">in 2018-01</a></li> <li>I copied the logic in the <code>jmx_tomcat_dbpools</code> provided by Ubuntu's <code>munin-plugins-java</code> package and used the stuff I discovered about JMX <a href="/cgspace-notes/2018-01/">in 2018-01</a></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-02/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-02/'>Read more →</a>
</article> </article>
@ -170,33 +166,26 @@
</p> </p>
</header> </header>
<h2 id="2018-01-02">2018-01-02</h2> <h2 id="20180102">2018-01-02</h2>
<ul> <ul>
<li>Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time</li> <li>Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time</li>
<li>I didn&rsquo;t get any load alerts from Linode and the REST and XMLUI logs don&rsquo;t show anything out of the ordinary</li> <li>I didn't get any load alerts from Linode and the REST and XMLUI logs don't show anything out of the ordinary</li>
<li>The nginx logs show HTTP 200s until <code>02/Jan/2018:11:27:17 +0000</code> when Uptime Robot got an HTTP 500</li> <li>The nginx logs show HTTP 200s until <code>02/Jan/2018:11:27:17 +0000</code> when Uptime Robot got an HTTP 500</li>
<li>In dspace.log around that time I see many errors like &ldquo;Client closed the connection before file download was complete&rdquo;</li> <li>In dspace.log around that time I see many errors like &ldquo;Client closed the connection before file download was complete&rdquo;</li>
<li>And just before that I see this:</li>
<li><p>And just before that I see this:</p> </ul>
<pre><code>Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000]. <pre><code>Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000].
</code></pre></li> </code></pre><ul>
<li>Ah hah! So the pool was actually empty!</li>
<li><p>Ah hah! So the pool was actually empty!</p></li> <li>I need to increase that, let's try to bump it up from 50 to 75</li>
<li>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don't know what the hell Uptime Robot saw</li>
<li><p>I need to increase that, let&rsquo;s try to bump it up from 50 to 75</p></li> <li>I notice this error quite a few times in dspace.log:</li>
</ul>
<li><p>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don&rsquo;t know what the hell Uptime Robot saw</p></li>
<li><p>I notice this error quite a few times in dspace.log:</p>
<pre><code>2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets <pre><code>2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered &quot; &quot;]&quot; &quot;] &quot;&quot; at line 1, column 32. org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered &quot; &quot;]&quot; &quot;] &quot;&quot; at line 1, column 32.
</code></pre></li> </code></pre><ul>
<li>And there are many of these errors every day for the past month:</li>
<li><p>And there are many of these errors every day for the past month:</p> </ul>
<pre><code>$ grep -c &quot;Error while searching for sidebar facets&quot; dspace.log.* <pre><code>$ grep -c &quot;Error while searching for sidebar facets&quot; dspace.log.*
dspace.log.2017-11-21:4 dspace.log.2017-11-21:4
dspace.log.2017-11-22:1 dspace.log.2017-11-22:1
@ -241,9 +230,8 @@ dspace.log.2017-12-30:89
dspace.log.2017-12-31:53 dspace.log.2017-12-31:53
dspace.log.2018-01-01:45 dspace.log.2018-01-01:45
dspace.log.2018-01-02:34 dspace.log.2018-01-02:34
</code></pre></li> </code></pre><ul>
<li>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let's Encrypt if it's just a handful of domains</li>
<li><p>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let&rsquo;s Encrypt if it&rsquo;s just a handful of domains</p></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-01/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-01/'>Read more →</a>
</article> </article>
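The per-file counts that `grep -c` reports above can also be collected programmatically, which makes it easier to see when the errors started increasing. A minimal sketch, assuming the log lines are already grouped by file name; `count_matching_lines` and the sample data are illustrative and not part of DSpace:

```python
def count_matching_lines(logs, needle):
    """Return {file name: number of lines containing needle}, like `grep -c`."""
    return {
        name: sum(1 for line in lines if needle in line)
        for name, lines in logs.items()
    }


# Hypothetical sample data standing in for the real dspace.log.* files.
sample_logs = {
    "dspace.log.2018-01-01": [
        "ERROR ... Error while searching for sidebar facets",
        "INFO ... something else",
    ],
    "dspace.log.2018-01-02": [
        "ERROR ... Error while searching for sidebar facets",
        "ERROR ... Error while searching for sidebar facets",
    ],
}

counts = count_matching_lines(sample_logs, "Error while searching for sidebar facets")
for name in sorted(counts):
    # Same "file:count" layout that grep -c prints for multiple files.
    print(f"{name}:{counts[name]}")
```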
@ -262,8 +250,7 @@ dspace.log.2018-01-02:34
</p> </p>
</header> </header>
<h2 id="2017-12-01">2017-12-01</h2> <h2 id="20171201">2017-12-01</h2>
<ul> <ul>
<li>Uptime Robot noticed that CGSpace went down</li> <li>Uptime Robot noticed that CGSpace went down</li>
<li>The logs say &ldquo;Timeout waiting for idle object&rdquo;</li> <li>The logs say &ldquo;Timeout waiting for idle object&rdquo;</li>
@ -287,27 +274,22 @@ dspace.log.2018-01-02:34
</p> </p>
</header> </header>
<h2 id="2017-11-01">2017-11-01</h2> <h2 id="20171101">2017-11-01</h2>
<ul> <ul>
<li>The CORE developers responded to say they are looking into their bot not respecting our robots.txt</li> <li>The CORE developers responded to say they are looking into their bot not respecting our robots.txt</li>
</ul> </ul>
<h2 id="20171102">2017-11-02</h2>
<h2 id="2017-11-02">2017-11-02</h2>
<ul> <ul>
<li><p>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</p> <li>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</li>
</ul>
<pre><code># grep -c &quot;CORE&quot; /var/log/nginx/access.log <pre><code># grep -c &quot;CORE&quot; /var/log/nginx/access.log
0 0
</code></pre></li> </code></pre><ul>
<li>Generate list of authors on CGSpace for Peter to go through and correct:</li>
<li><p>Generate list of authors on CGSpace for Peter to go through and correct:</p> </ul>
<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv; <pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv;
COPY 54701 COPY 54701
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-11/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-11/'>Read more →</a>
</article> </article>
@ -325,17 +307,14 @@ COPY 54701
</p> </p>
</header> </header>
<h2 id="2017-10-01">2017-10-01</h2> <h2 id="20171001">2017-10-01</h2>
<ul> <ul>
<li><p>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</p> <li>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</li>
</ul>
<pre><code>http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336 <pre><code>http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
</code></pre></li> </code></pre><ul>
<li>There appears to be a pattern but I'll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</li>
<li><p>There appears to be a pattern but I&rsquo;ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</p></li> <li>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</li>
<li><p>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</p></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-10/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-10/'>Read more →</a>
</article> </article>
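Before cleaning these up in SQL or OpenRefine, it may help to see how a multi-handle value breaks apart. A minimal sketch, assuming the `||` separator shown in the example value; `split_handles` is a hypothetical helper, and deciding which handle to keep is deliberately left open:

```python
def split_handles(value, separator="||"):
    """Split a multi-value metadata string into individual handle URLs."""
    return [part.strip() for part in value.split(separator) if part.strip()]


# Example value taken from the notes above.
value = "http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336"
handles = split_handles(value)

if len(handles) > 1:
    # Items like this one need manual or scripted review.
    print(f"{len(handles)} handles found: {handles}")
```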
@ -374,16 +353,13 @@ COPY 54701
</p> </p>
</header> </header>
<h2 id="2017-09-06">2017-09-06</h2> <h2 id="20170906">2017-09-06</h2>
<ul> <ul>
<li>Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours</li> <li>Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours</li>
</ul> </ul>
<h2 id="20170907">2017-09-07</h2>
<h2 id="2017-09-07">2017-09-07</h2>
<ul> <ul>
<li>Ask Sisay to clean up the WLE approvers a bit, as Marianne&rsquo;s user account is both in the approvers step as well as the group</li> <li>Ask Sisay to clean up the WLE approvers a bit, as Marianne's user account is both in the approvers step as well as the group</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-09/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-09/'>Read more →</a>
</article> </article>
@ -402,22 +378,21 @@ COPY 54701
</p> </p>
</header> </header>
<h2 id="2017-08-01">2017-08-01</h2> <h2 id="20170801">2017-08-01</h2>
<ul> <ul>
<li>Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours</li> <li>Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours</li>
<li>I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)</li> <li>I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)</li>
<li>The good thing is that, according to <code>dspace.log.2017-08-01</code>, they are all using the same Tomcat session</li> <li>The good thing is that, according to <code>dspace.log.2017-08-01</code>, they are all using the same Tomcat session</li>
<li>This means our Tomcat Crawler Session Valve is working</li> <li>This means our Tomcat Crawler Session Valve is working</li>
<li>But many of the bots are browsing dynamic URLs like: <li>But many of the bots are browsing dynamic URLs like:
<ul> <ul>
<li>/handle/10568/3353/discover</li> <li>/handle/10568/3353/discover</li>
<li>/handle/10568/16510/browse</li> <li>/handle/10568/16510/browse</li>
</ul></li> </ul>
</li>
<li>The <code>robots.txt</code> only blocks the top-level <code>/discover</code> and <code>/browse</code> URLs&hellip; we will need to find a way to forbid them from accessing these!</li> <li>The <code>robots.txt</code> only blocks the top-level <code>/discover</code> and <code>/browse</code> URLs&hellip; we will need to find a way to forbid them from accessing these!</li>
<li>Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li> <li>Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
<li>It turns out that we&rsquo;re already adding the <code>X-Robots-Tag &quot;none&quot;</code> HTTP header, but this only forbids the search engine from <em>indexing</em> the page, not crawling it!</li> <li>It turns out that we're already adding the <code>X-Robots-Tag &quot;none&quot;</code> HTTP header, but this only forbids the search engine from <em>indexing</em> the page, not crawling it!</li>
<li>Also, the bot has to successfully browse the page first so it can receive the HTTP header&hellip;</li> <li>Also, the bot has to successfully browse the page first so it can receive the HTTP header&hellip;</li>
<li>We might actually have to <em>block</em> these requests with HTTP 403 depending on the user agent</li> <li>We might actually have to <em>block</em> these requests with HTTP 403 depending on the user agent</li>
<li>Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415</li> <li>Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415</li>
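The idea of blocking bots on dynamic URLs with HTTP 403 could look roughly like the following. This is only a sketch of the decision logic, not actual web server configuration; the user-agent substrings and the `status_for` helper are assumptions for illustration:

```python
import re

# Dynamic URLs nested under a handle, e.g. /handle/10568/3353/discover
DYNAMIC_HANDLE_URL = re.compile(r"^/handle/[\d.]+/\d+/(discover|browse)\b")

# Assumed user-agent substrings for the crawlers named in the notes.
BOT_AGENT = re.compile(r"googlebot|baiduspider|bingbot|slurp", re.IGNORECASE)


def status_for(path, user_agent):
    """Return 403 for known bots requesting dynamic handle URLs, else 200."""
    if DYNAMIC_HANDLE_URL.match(path) and BOT_AGENT.search(user_agent):
        return 403
    return 200


print(status_for("/handle/10568/3353/discover", "Mozilla/5.0 (compatible; Googlebot/2.1)"))
print(status_for("/handle/10568/3353", "Mozilla/5.0 (compatible; Googlebot/2.1)"))
```

Unlike the `X-Robots-Tag` header, a check like this takes effect before the page is rendered, so the bot never reaches the expensive Discovery and browse code paths.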

View File

@ -9,13 +9,12 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/tags/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/tags/" />
<meta property="og:updated_time" content="2019-10-28T13:27:35+02:00" /> <meta property="og:updated_time" content="2019-10-28T13:27:35+02:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Tags"/> <meta name="twitter:title" content="Tags"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/> <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -100,18 +99,15 @@
</p> </p>
</header> </header>
<h2 id="2017-07-01">2017-07-01</h2> <h2 id="20170701">2017-07-01</h2>
<ul> <ul>
<li>Run system updates and reboot DSpace Test</li> <li>Run system updates and reboot DSpace Test</li>
</ul> </ul>
<h2 id="20170704">2017-07-04</h2>
<h2 id="2017-07-04">2017-07-04</h2>
<ul> <ul>
<li>Merge changes for WLE Phase II theme rename (<a href="https://github.com/ilri/DSpace/pull/329">#329</a>)</li> <li>Merge changes for WLE Phase II theme rename (<a href="https://github.com/ilri/DSpace/pull/329">#329</a>)</li>
<li>Looking at extracting the metadata registries from ICARDA&rsquo;s MEL DSpace database so we can compare fields with CGSpace</li> <li>Looking at extracting the metadata registries from ICARDA's MEL DSpace database so we can compare fields with CGSpace</li>
<li>We can use PostgreSQL&rsquo;s extended output format (<code>-x</code>) plus <code>sed</code> to format the output into quasi XML:</li> <li>We can use PostgreSQL's extended output format (<code>-x</code>) plus <code>sed</code> to format the output into quasi XML:</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-07/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-07/'>Read more →</a>
</article> </article>
@ -130,7 +126,7 @@
</p> </p>
</header> </header>
2017-06-01 After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes The cg.identifier.wletheme field will be used for both Phase I and Phase II Research Themes Then we&rsquo;ll create a new sub-community for Phase II and create collections for the research themes there The current &ldquo;Research Themes&rdquo; community will be renamed to &ldquo;WLE Phase I Research Themes&rdquo; Tagged all items in the current Phase I collections with their appropriate themes Create pull request to add Phase II research themes to the submission form: #328 Add cg. 2017-06-01 After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes The cg.identifier.wletheme field will be used for both Phase I and Phase II Research Themes Then we'll create a new sub-community for Phase II and create collections for the research themes there The current &ldquo;Research Themes&rdquo; community will be renamed to &ldquo;WLE Phase I Research Themes&rdquo; Tagged all items in the current Phase I collections with their appropriate themes Create pull request to add Phase II research themes to the submission form: #328 Add cg.
<a href='https://alanorth.github.io/cgspace-notes/2017-06/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-06/'>Read more →</a>
</article> </article>
@ -148,7 +144,7 @@
</p> </p>
</header> </header>
 2017-05-01 ICARDA apparently started working on CG Core on their MEL repository They have done a few cg.* fields, but not very consistently, and they even copied some CGSpace items: https://mel.cgiar.org/xmlui/handle/20.500.11766/6911?show=full https://cgspace.cgiar.org/handle/10568/73683 2017-05-02 Atmire got back about the Workflow Statistics issue, and apparently it&rsquo;s a bug in the CUA module so they will send us a pull request 2017-05-04 Sync DSpace Test with database and assetstore from CGSpace Re-deploy DSpace Test with Atmire&rsquo;s CUA patch for workflow statistics, run system updates, and restart the server Now I can see the workflow statistics and am able to select users, but everything returns 0 items Megan says there are still some mapped items that are not appearing since last week, so I forced a full index-discovery -b Need to remember to check if the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSpace Test) tomorrow: https://cgspace. 2017-05-01 ICARDA apparently started working on CG Core on their MEL repository They have done a few cg.* fields, but not very consistently, and they even copied some CGSpace items: https://mel.cgiar.org/xmlui/handle/20.500.11766/6911?show=full https://cgspace.cgiar.org/handle/10568/73683 2017-05-02 Atmire got back about the Workflow Statistics issue, and apparently it's a bug in the CUA module so they will send us a pull request 2017-05-04 Sync DSpace Test with database and assetstore from CGSpace Re-deploy DSpace Test with Atmire's CUA patch for workflow statistics, run system updates, and restart the server Now I can see the workflow statistics and am able to select users, but everything returns 0 items Megan says there are still some mapped items that are not appearing since last week, so I forced a full index-discovery -b Need to remember to check if the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSpace Test) tomorrow: https://cgspace.
<a href='https://alanorth.github.io/cgspace-notes/2017-05/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-05/'>Read more →</a>
</article> </article>
@ -166,23 +162,18 @@
</p> </p>
</header> </header>
<h2 id="2017-04-02">2017-04-02</h2> <h2 id="20170402">2017-04-02</h2>
<ul> <ul>
<li>Merge one change to CCAFS flagships that I had forgotten to remove last month (&ldquo;MANAGING CLIMATE RISK&rdquo;): <a href="https://github.com/ilri/DSpace/pull/317">https://github.com/ilri/DSpace/pull/317</a></li> <li>Merge one change to CCAFS flagships that I had forgotten to remove last month (&ldquo;MANAGING CLIMATE RISK&rdquo;): <a href="https://github.com/ilri/DSpace/pull/317">https://github.com/ilri/DSpace/pull/317</a></li>
<li>Quick proof-of-concept hack to add <code>dc.rights</code> to the input form, including some inline instructions/hints:</li> <li>Quick proof-of-concept hack to add <code>dc.rights</code> to the input form, including some inline instructions/hints:</li>
</ul> </ul>
<p><img src="/cgspace-notes/2017/04/dc-rights.png" alt="dc.rights in the submission form"></p>
<p><img src="/cgspace-notes/2017/04/dc-rights.png" alt="dc.rights in the submission form" /></p>
<ul> <ul>
<li>Remove redundant/duplicate text in the DSpace submission license</li> <li>Remove redundant/duplicate text in the DSpace submission license</li>
<li>Testing the CMYK patch on a collection with 650 items:</li>
<li><p>Testing the CMYK patch on a collection with 650 items:</p>
<pre><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p &quot;ImageMagick PDF Thumbnail&quot; -v &gt;&amp; /tmp/filter-media-cmyk.txt
</code></pre></li>
</ul> </ul>
<pre><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p &quot;ImageMagick PDF Thumbnail&quot; -v &gt;&amp; /tmp/filter-media-cmyk.txt
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2017-04/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-04/'>Read more →</a>
</article> </article>
@ -200,14 +191,11 @@
</p> </p>
</header> </header>
<h2 id="2017-03-01">2017-03-01</h2> <h2 id="20170301">2017-03-01</h2>
<ul> <ul>
<li>Run the 279 CIAT author corrections on CGSpace</li> <li>Run the 279 CIAT author corrections on CGSpace</li>
</ul> </ul>
<h2 id="20170302">2017-03-02</h2>
<h2 id="2017-03-02">2017-03-02</h2>
<ul> <ul>
<li>Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace</li> <li>Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace</li>
<li>CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles</li> <li>CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles</li>
@ -217,13 +205,11 @@
<li>Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI</li> <li>Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI</li>
<li>Filed an issue on DSpace issue tracker for the <code>filter-media</code> bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: <a href="https://jira.duraspace.org/browse/DS-3516">DS-3516</a></li> <li>Filed an issue on DSpace issue tracker for the <code>filter-media</code> bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: <a href="https://jira.duraspace.org/browse/DS-3516">DS-3516</a></li>
 <li>Discovered that the ImageMagick <code>filter-media</code> plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK</li> <li>Discovered that the ImageMagick <code>filter-media</code> plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK</li>
<li>Interestingly, it seems DSpace 4.x's thumbnails were sRGB, but forcing regeneration using DSpace 5.x's ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999">10568/51999</a>):</li>
<li><p>Interestingly, it seems DSpace 4.x&rsquo;s thumbnails were sRGB, but forcing regeneration using DSpace 5.x&rsquo;s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999"><sup>10568</sup>&frasl;<sub>51999</sub></a>):</p> </ul>
<pre><code>$ identify ~/Desktop/alc_contrastes_desafios.jpg <pre><code>$ identify ~/Desktop/alc_contrastes_desafios.jpg
/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000 /Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-03/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-03/'>Read more →</a>
</article> </article>
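Checking many thumbnails by eye is tedious, so the colorspace can be pulled out of each `identify` output line instead. A minimal sketch, assuming the default `identify` output layout shown above, where the colorspace follows the bit depth; `colorspace_from_identify` is a hypothetical helper:

```python
import re


def colorspace_from_identify(line):
    """Return the token after 'N-bit' in an identify output line, or None."""
    match = re.search(r"\b\d+-bit (\S+)", line)
    return match.group(1) if match else None


# Output line copied from the notes above.
line = (
    "/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 "
    "464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000"
)
colorspace = colorspace_from_identify(line)
print(colorspace)
```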
@ -241,25 +227,22 @@
</p> </p>
</header> </header>
<h2 id="2017-02-07">2017-02-07</h2> <h2 id="20170207">2017-02-07</h2>
<ul> <ul>
<li><p>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</p> <li>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</li>
</ul>
<pre><code>dspace=# select * from collection2item where item_id = '80278'; <pre><code>dspace=# select * from collection2item where item_id = '80278';
id | collection_id | item_id id | collection_id | item_id
-------+---------------+--------- -------+---------------+---------
92551 | 313 | 80278 92551 | 313 | 80278
92550 | 313 | 80278 92550 | 313 | 80278
90774 | 1051 | 80278 90774 | 1051 | 80278
(3 rows) (3 rows)
dspace=# delete from collection2item where id = 92551 and item_id = 80278; dspace=# delete from collection2item where id = 92551 and item_id = 80278;
DELETE 1 DELETE 1
</code></pre></li> </code></pre><ul>
<li>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</li>
<li><p>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</p></li> <li>Looks like we'll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</li>
<li><p>Looks like we&rsquo;ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</p></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-02/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-02/'>Read more →</a>
</article> </article>
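Since this double mapping has happened more than once, it could be worth scanning for every repeated `(collection_id, item_id)` pair rather than fixing items one at a time. A minimal sketch over rows shaped like the query output above; `find_duplicate_mappings` is a hypothetical helper, and choosing which mapping id to delete remains a manual decision:

```python
from collections import defaultdict


def find_duplicate_mappings(rows):
    """Group mapping ids by (collection_id, item_id); keep pairs seen more than once."""
    seen = defaultdict(list)
    for mapping_id, collection_id, item_id in rows:
        seen[(collection_id, item_id)].append(mapping_id)
    return {pair: ids for pair, ids in seen.items() if len(ids) > 1}


# Rows copied from the query output above: (id, collection_id, item_id)
rows = [(92551, 313, 80278), (92550, 313, 80278), (90774, 1051, 80278)]
duplicates = find_duplicate_mappings(rows)
print(duplicates)
```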
@ -278,12 +261,11 @@ DELETE 1
</p> </p>
</header> </header>
<h2 id="2017-01-02">2017-01-02</h2> <h2 id="20170102">2017-01-02</h2>
<ul> <ul>
<li>I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error</li> <li>I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error</li>
<li>I tested on DSpace Test as well and it doesn&rsquo;t work there either</li> <li>I tested on DSpace Test as well and it doesn't work there either</li>
<li>I asked on the dspace-tech mailing list because it seems to be broken, and actually now I&rsquo;m not sure if we&rsquo;ve ever had the sharding task run successfully over all these years</li> <li>I asked on the dspace-tech mailing list because it seems to be broken, and actually now I'm not sure if we've ever had the sharding task run successfully over all these years</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-01/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2017-01/'>Read more →</a>
</article> </article>
@ -302,25 +284,20 @@ DELETE 1
</p> </p>
</header> </header>
<h2 id="2016-12-02">2016-12-02</h2> <h2 id="20161202">2016-12-02</h2>
<ul> <ul>
<li>CGSpace was down for five hours in the morning while I was sleeping</li> <li>CGSpace was down for five hours in the morning while I was sleeping</li>
<li>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</li>
<li><p>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</p> </ul>
<pre><code>2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;) <pre><code>2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail=&quot;dc.title&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;) 2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail=&quot;dc.title&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, Object Type=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail=&quot;THUMBNAIL&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;) 2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, Object Type=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail=&quot;THUMBNAIL&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail=&quot;-1&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;) 2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail=&quot;-1&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;) 2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
</code></pre></li> </code></pre><ul>
<li>I see thousands of them in the logs for the last few months, so it's not related to the DSpace 5.5 upgrade</li>
<li><p>I see thousands of them in the logs for the last few months, so it&rsquo;s not related to the DSpace 5.5 upgrade</p></li> <li>I've raised a ticket with Atmire to ask</li>
<li>Another worrying error from dspace.log is:</li>
<li><p>I&rsquo;ve raised a ticket with Atmire to ask</p></li>
<li><p>Another worrying error from dspace.log is:</p></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-12/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-12/'>Read more →</a>
</article> </article>
@ -339,13 +316,11 @@ DELETE 1
</p> </p>
</header> </header>
<h2 id="2016-11-01">2016-11-01</h2> <h2 id="20161101">2016-11-01</h2>
<ul> <ul>
<li>Add <code>dc.type</code> to the output options for Atmire&rsquo;s Listings and Reports module (<a href="https://github.com/ilri/DSpace/pull/286">#286</a>)</li> <li>Add <code>dc.type</code> to the output options for Atmire's Listings and Reports module (<a href="https://github.com/ilri/DSpace/pull/286">#286</a>)</li>
</ul> </ul>
<p><img src="/cgspace-notes/2016/11/listings-and-reports.png" alt="Listings and Reports with output type"></p>
<p><img src="/cgspace-notes/2016/11/listings-and-reports.png" alt="Listings and Reports with output type" /></p>
<a href='https://alanorth.github.io/cgspace-notes/2016-11/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-11/'>Read more →</a>
</article> </article>
@ -363,22 +338,19 @@ DELETE 1
</p> </p>
</header> </header>
<h2 id="2016-10-03">2016-10-03</h2> <h2 id="20161003">2016-10-03</h2>
<ul> <ul>
<li>Testing adding <a href="https://wiki.duraspace.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing">ORCIDs to a CSV</a> file for a single item to see if the author orders get messed up</li> <li>Testing adding <a href="https://wiki.duraspace.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing">ORCIDs to a CSV</a> file for a single item to see if the author orders get messed up</li>
<li>Need to test the following scenarios to see how author order is affected: <li>Need to test the following scenarios to see how author order is affected:
<ul> <ul>
<li>ORCIDs only</li> <li>ORCIDs only</li>
<li>ORCIDs plus normal authors</li> <li>ORCIDs plus normal authors</li>
</ul></li>
 <li><p>I exported a random item&rsquo;s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new column called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</p>
<pre><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
</code></pre></li>
</ul> </ul>
</li>
 <li>I exported a random item's metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new column called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</li>
</ul>
<pre><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2016-10/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-10/'>Read more →</a>
</article> </article>
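The test CSV described above can be sketched as follows. The column name `ORCID:dc.contributor.author` and the `||` separator come from the notes; the `id` and `collection` values are placeholders, not the real item that was exported:

```python
import csv
import io

# ORCIDs copied from the notes above, in the order under test.
orcids = ["0000-0002-6115-0956", "0000-0002-3812-8793", "0000-0001-7462-405X"]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["id", "collection", "ORCID:dc.contributor.author"])
# Hypothetical id and collection handle, for illustration only.
writer.writerow(["1234", "10568/0000", "||".join(orcids)])

csv_text = buffer.getvalue()
print(csv_text)
```

Keeping the ORCIDs in a list makes it simple to reorder them between test runs and compare the resulting author order in DSpace.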

View File

@ -9,13 +9,12 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/tags/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/tags/" />
<meta property="og:updated_time" content="2019-10-28T13:27:35+02:00" /> <meta property="og:updated_time" content="2019-10-28T13:27:35+02:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Tags"/> <meta name="twitter:title" content="Tags"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/> <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -100,18 +99,15 @@
</p> </p>
</header> </header>
<h2 id="2016-09-01">2016-09-01</h2> <h2 id="20160901">2016-09-01</h2>
<ul> <ul>
<li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li> <li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li>
<li>Discuss how the migration of CGIAR&rsquo;s Active Directory to a flat structure will break our LDAP groups in DSpace</li> <li>Discuss how the migration of CGIAR's Active Directory to a flat structure will break our LDAP groups in DSpace</li>
<li>We had been using <code>DC=ILRI</code> to determine whether a user was ILRI or not</li> <li>We had been using <code>DC=ILRI</code> to determine whether a user was ILRI or not</li>
<li>It looks like we might be able to use OUs now, instead of DCs:</li>
<li><p>It looks like we might be able to use OUs now, instead of DCs:</p>
<pre><code>$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b &quot;dc=cgiarad,dc=org&quot; -D &quot;admigration1@cgiarad.org&quot; -W &quot;(sAMAccountName=admigration1)&quot;
</code></pre></li>
</ul> </ul>
<pre><code>$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b &quot;dc=cgiarad,dc=org&quot; -D &quot;admigration1@cgiarad.org&quot; -W &quot;(sAMAccountName=admigration1)&quot;
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2016-09/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-09/'>Read more →</a>
</article> </article>
@ -129,22 +125,19 @@
</p> </p>
</header> </header>
<h2 id="2016-08-01">2016-08-01</h2> <h2 id="20160801">2016-08-01</h2>
<ul> <ul>
<li>Add updated distribution license from Sisay (<a href="https://github.com/ilri/DSpace/issues/259">#259</a>)</li> <li>Add updated distribution license from Sisay (<a href="https://github.com/ilri/DSpace/issues/259">#259</a>)</li>
<li>Play with upgrading Mirage 2 dependencies in <code>bower.json</code> because most are several versions out of date</li> <li>Play with upgrading Mirage 2 dependencies in <code>bower.json</code> because most are several versions out of date</li>
<li>Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more</li> <li>Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more</li>
<li>bower stuff is a dead end, waste of time, too many issues</li> <li>bower stuff is a dead end, waste of time, too many issues</li>
<li>Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of <code>fonts</code>)</li> <li>Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of <code>fonts</code>)</li>
<li>Start working on DSpace 5.1 → 5.5 port:</li>
<li><p>Start working on DSpace 5.1 → 5.5 port:</p> </ul>
<pre><code>$ git checkout -b 55new 5_x-prod <pre><code>$ git checkout -b 55new 5_x-prod
$ git reset --hard ilri/5_x-prod $ git reset --hard ilri/5_x-prod
$ git rebase -i dspace-5.5 $ git rebase -i dspace-5.5
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-08/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-08/'>Read more →</a>
</article> </article>
@ -162,22 +155,19 @@ $ git rebase -i dspace-5.5
</p> </p>
</header> </header>
<h2 id="2016-07-01">2016-07-01</h2> <h2 id="20160701">2016-07-01</h2>
<ul> <ul>
<li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li> <li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li>
<li>I think this query should find and replace all authors that have &ldquo;,&rdquo; at the end of their names:</li>
<li><p>I think this query should find and replace all authors that have &ldquo;,&rdquo; at the end of their names:</p> </ul>
<pre><code>dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$'; <pre><code>dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
UPDATE 95 UPDATE 95
dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$'; dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
text_value text_value
------------ ------------
(0 rows) (0 rows)
</code></pre></li> </code></pre><ul>
<li>In this case the select query was showing 95 results before the update</li>
<li><p>In this case the select query was showing 95 results before the update</p></li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-07/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-07/'>Read more →</a>
</article> </article>
@ -196,11 +186,10 @@ text_value
</p> </p>
</header> </header>
<h2 id="2016-06-01">2016-06-01</h2> <h2 id="20160601">2016-06-01</h2>
<ul> <ul>
<li>Experimenting with IFPRI OAI (we want to harvest their publications)</li> <li>Experimenting with IFPRI OAI (we want to harvest their publications)</li>
<li>After reading the <a href="https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html">ContentDM documentation</a> I found IFPRI&rsquo;s OAI endpoint: <a href="http://ebrary.ifpri.org/oai/oai.php">http://ebrary.ifpri.org/oai/oai.php</a></li> <li>After reading the <a href="https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html">ContentDM documentation</a> I found IFPRI's OAI endpoint: <a href="http://ebrary.ifpri.org/oai/oai.php">http://ebrary.ifpri.org/oai/oai.php</a></li>
<li>After reading the <a href="https://www.openarchives.org/OAI/openarchivesprotocol.html">OAI documentation</a> and testing with an <a href="http://validator.oaipmh.com/">OAI validator</a> I found out how to get their publications</li> <li>After reading the <a href="https://www.openarchives.org/OAI/openarchivesprotocol.html">OAI documentation</a> and testing with an <a href="http://validator.oaipmh.com/">OAI validator</a> I found out how to get their publications</li>
<li>This is their publications set: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc">http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc</a></li> <li>This is their publications set: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc">http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc</a></li>
<li>You can see the others by using the OAI <code>ListSets</code> verb: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListSets">http://ebrary.ifpri.org/oai/oai.php?verb=ListSets</a></li> <li>You can see the others by using the OAI <code>ListSets</code> verb: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListSets">http://ebrary.ifpri.org/oai/oai.php?verb=ListSets</a></li>
@ -223,18 +212,15 @@ text_value
</p> </p>
</header> </header>
<h2 id="2016-05-01">2016-05-01</h2> <h2 id="20160501">2016-05-01</h2>
<ul> <ul>
<li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li> <li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li>
<li>I have blocked access to the API now</li> <li>I have blocked access to the API now</li>
<li>There are 3,000 IPs accessing the REST API in a 24-hour period!</li>
<li><p>There are 3,000 IPs accessing the REST API in a 24-hour period!</p> </ul>
<pre><code># awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l <pre><code># awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l
3168 3168
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-05/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-05/'>Read more →</a>
</article> </article>
@ -252,13 +238,12 @@ text_value
</p> </p>
</header> </header>
<h2 id="2016-04-04">2016-04-04</h2> <h2 id="20160404">2016-04-04</h2>
<ul> <ul>
<li>Looking at log file use on CGSpace and noticed that we need to work on our cron setup a bit</li> <li>Looking at log file use on CGSpace and noticed that we need to work on our cron setup a bit</li>
<li>We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc</li> <li>We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc</li>
<li>After running DSpace for over five years I&rsquo;ve never needed to look in any other log file than dspace.log, let alone one from last year!</li> <li>After running DSpace for over five years I've never needed to look in any other log file than dspace.log, let alone one from last year!</li>
<li>This will save us a few gigs of backup space we&rsquo;re paying for on S3</li> <li>This will save us a few gigs of backup space we're paying for on S3</li>
<li>Also, I noticed the <code>checker</code> log has some errors we should pay attention to:</li> <li>Also, I noticed the <code>checker</code> log has some errors we should pay attention to:</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-04/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-04/'>Read more →</a>
@ -278,11 +263,10 @@ text_value
</p> </p>
</header> </header>
<h2 id="2016-03-02">2016-03-02</h2> <h2 id="20160302">2016-03-02</h2>
<ul> <ul>
<li>Looking at issues with author authorities on CGSpace</li> <li>Looking at issues with author authorities on CGSpace</li>
<li>For some reason we still have the <code>index-lucene-update</code> cron job active on CGSpace, but I&rsquo;m pretty sure we don&rsquo;t need it as of the latest few versions of Atmire&rsquo;s Listings and Reports module</li> <li>For some reason we still have the <code>index-lucene-update</code> cron job active on CGSpace, but I'm pretty sure we don't need it as of the latest few versions of Atmire's Listings and Reports module</li>
<li>Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server</li> <li>Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server</li>
</ul> </ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-03/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2016-03/'>Read more →</a>
@ -302,16 +286,13 @@ text_value
</p> </p>
</header> </header>
<h2 id="2016-02-05">2016-02-05</h2> <h2 id="20160205">2016-02-05</h2>
<ul> <ul>
<li>Looking at some DAGRIS data for Abenet Yabowork</li> <li>Looking at some DAGRIS data for Abenet Yabowork</li>
<li>Lots of issues with spaces, newlines, etc causing the import to fail</li> <li>Lots of issues with spaces, newlines, etc causing the import to fail</li>
<li>I noticed we have a very <em>interesting</em> list of countries on CGSpace:</li> <li>I noticed we have a very <em>interesting</em> list of countries on CGSpace:</li>
</ul> </ul>
<p><img src="/cgspace-notes/2016/02/cgspace-countries.png" alt="CGSpace country list"></p>
<p><img src="/cgspace-notes/2016/02/cgspace-countries.png" alt="CGSpace country list" /></p>
<ul> <ul>
<li>Not only are there 49,000 countries, we have some blanks (25)&hellip;</li> <li>Not only are there 49,000 countries, we have some blanks (25)&hellip;</li>
<li>Also, lots of things like &ldquo;COTE D`LVOIRE&rdquo; and &ldquo;COTE D IVOIRE&rdquo;</li> <li>Also, lots of things like &ldquo;COTE D`LVOIRE&rdquo; and &ldquo;COTE D IVOIRE&rdquo;</li>
@ -333,8 +314,7 @@ text_value
</p> </p>
</header> </header>
<h2 id="2016-01-13">2016-01-13</h2> <h2 id="20160113">2016-01-13</h2>
<ul> <ul>
<li>Move ILRI collection <code>10568/12503</code> from <code>10568/27869</code> to <code>10568/27629</code> using the <a href="https://gist.github.com/alanorth/392c4660e8b022d99dfa">move_collections.sh</a> script I wrote last year.</li> <li>Move ILRI collection <code>10568/12503</code> from <code>10568/27869</code> to <code>10568/27629</code> using the <a href="https://gist.github.com/alanorth/392c4660e8b022d99dfa">move_collections.sh</a> script I wrote last year.</li>
<li>I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.</li> <li>I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.</li>
@ -357,18 +337,16 @@ text_value
</p> </p>
</header> </header>
<h2 id="2015-12-02">2015-12-02</h2> <h2 id="20151202">2015-12-02</h2>
<ul> <ul>
<li><p>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</p> <li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li>
</ul>
<pre><code># cd /home/dspacetest.cgiar.org/log <pre><code># cd /home/dspacetest.cgiar.org/log
# ls -lh dspace.log.2015-11-18* # ls -lh dspace.log.2015-11-18*
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18 -rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo -rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz -rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2015-12/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2015-12/'>Read more →</a>
</article> </article>

View File

@ -9,13 +9,12 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/tags/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/tags/" />
<meta property="og:updated_time" content="2019-10-28T13:27:35+02:00" /> <meta property="og:updated_time" content="2019-10-28T13:27:35+02:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Tags"/> <meta name="twitter:title" content="Tags"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/> <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.59.1" /> <meta name="generator" content="Hugo 0.60.0" />
@ -100,18 +99,15 @@
</p> </p>
</header> </header>
<h2 id="2015-11-22">2015-11-22</h2> <h2 id="20151122">2015-11-22</h2>
<ul> <ul>
<li>CGSpace went down</li> <li>CGSpace went down</li>
<li>Looks like DSpace exhausted its PostgreSQL connection pool</li> <li>Looks like DSpace exhausted its PostgreSQL connection pool</li>
<li>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</li>
<li><p>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</p> </ul>
<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace <pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
78 78
</code></pre></li> </code></pre>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2015-11/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2015-11/'>Read more →</a>
</article> </article>